Contract Creation and Message Call - Ethereum Yellow Paper Walkthrough (5/7)

Posted Feb 12, 2020

By Lucas Saldanha

Welcome back to another post in the Ethereum Yellow Paper Walkthrough series! At the end of the last post, I promised we’d take a closer look at contract creation and message call. That’s what this post covers.

After reading this post, you should know how a new contract account is created, where its address comes from, and what happens when a transaction (or another contract) calls into an existing account. This post covers sections 7 and 8 of the Yellow Paper.

If you missed any of the previous posts, here they are:

(DISCLAIMER: this post is based on the Byzantium version of the Yellow Paper, version 7e819ec from 20th October 2019)

Introduction

In the previous post we learned that, after validating a transaction, an Ethereum node executes it. Depending on the transaction shape, this execution will do one of three things:

Transfer Ether between two accounts.
Send a message call to a contract (e.g. invoking a method).
Deploy a new contract (a contract creation).

The first is the simplest and doesn’t require any EVM code. The other two are more interesting and the Yellow Paper dedicates an entire section to each of them. Let’s go through them one at a time.

Contract Creation

A contract creation transaction is a transaction whose to field is empty. When the node sees one of these, it knows that the goal is to create a new contract account in the world state.

Section 7 of the Yellow Paper defines the function Λ (capital lambda) that performs the contract creation. Let’s break it down in plain English.

Where does the contract address come from?

The first thing we need is an address for the new contract. We can’t ask the user to pick one because then anyone could squat on addresses they don’t own. The Yellow Paper specifies that the new contract address is derived deterministically from two values: the sender’s address and the sender’s nonce (the nonce before it gets incremented by the transaction).

In short, the address is the rightmost 20 bytes of Keccak-256(RLP(sender, nonce)):

address = KEC( RLP( sender, sender_nonce ) )[12:]

There are a few interesting implications here:

Since the sender nonce is incremented on every transaction, two contract creations from the same account produce different addresses.
The address can be computed before the transaction is included on chain. This is useful, for example, when you want to fund or whitelist a contract that hasn’t been deployed yet.
If somehow the computed address already exists in the world state (with code or a non-zero nonce), creation fails. This is extremely unlikely with a sane sender, but the rule is there to keep the protocol safe.
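The derivation above can be sketched in a few lines of Python. Python's hashlib does not ship the original Keccak-256 (its sha3_256 uses a different padding byte), so this sketch carries a small, unoptimised Keccak-256 written in the style of the compact reference implementations. The helper names (rlp_encode, contract_address) are made up for this post, and the RLP encoder only handles the cases needed here: byte strings, non-negative integers and lists.

```python
MASK64 = (1 << 64) - 1


def _rol64(a, n):
    n %= 64
    return ((a << n) | (a >> (64 - n))) & MASK64


def _keccak_f1600(lanes):
    """Keccak-f[1600] permutation on a 5x5 grid of 64-bit lanes."""
    r = 1
    for _ in range(24):
        # theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota (round constant generated by a small LFSR)
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Original Keccak-256: rate 136 bytes, padding byte 0x01 (SHA3-256 uses 0x06)."""
    rate = 136
    padded = bytearray(data)
    padded.append(0x01)
    padded.extend(bytes(-len(padded) % rate))
    padded[-1] ^= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for off in range(0, len(padded), rate):
        for i in range(rate // 8):
            word = int.from_bytes(padded[off + 8 * i: off + 8 * i + 8], "little")
            lanes[i % 5][i // 5] ^= word
        lanes = _keccak_f1600(lanes)
    return b"".join(lanes[i][0].to_bytes(8, "little") for i in range(4))


def rlp_encode(item) -> bytes:
    """Minimal RLP: bytes, non-negative ints (big-endian, no leading zeros), lists."""
    if isinstance(item, int):
        item = item.to_bytes((item.bit_length() + 7) // 8, "big")  # 0 -> b""
    if isinstance(item, (bytes, bytearray)):
        if len(item) == 1 and item[0] < 0x80:
            return bytes(item)
        return _length_prefix(len(item), 0x80) + bytes(item)
    payload = b"".join(rlp_encode(x) for x in item)
    return _length_prefix(len(payload), 0xC0) + payload


def _length_prefix(length: int, offset: int) -> bytes:
    if length < 56:
        return bytes([offset + length])
    size = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(size)]) + size


def contract_address(sender: bytes, sender_nonce: int) -> bytes:
    """Rightmost 20 bytes of KEC(RLP([sender, nonce])), nonce taken before the increment."""
    return keccak256(rlp_encode([sender, sender_nonce]))[12:]
```

Because the nonce is part of the hashed input, contract_address(sender, 0) and contract_address(sender, 1) differ, and both can be computed offline before any transaction is sent.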

(The Constantinople fork introduced the CREATE2 opcode, which uses a different formula based on a user-provided salt. We won’t cover it here since it is not part of the Byzantium Yellow Paper.)

The new account state

Once we have the address, the node initialises the new account state. The interesting fields are:

nonce is set to 1 (in pre-Spurious Dragon forks this used to be 0, but the current Yellow Paper sets it to 1 to mitigate a class of attacks).
balance is set to the value field of the transaction, plus any pre-existing balance the address might already have. Yes, an address can have a non-zero balance before a contract is deployed there. Someone could have sent Ether to a computed address that hadn’t been used yet.
storageRoot is set to the hash of an empty trie.
codeHash is temporarily set to the hash of an empty string, but this will be overwritten once the init code finishes running.

The sender also pays for the contract creation: their balance is reduced by value, and their nonce is incremented.
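As a rough illustration of the bookkeeping above, here is a minimal sketch. It assumes the world state is a plain dict from address to account; the Account class and init_contract_account are hypothetical names, not anything defined by the Yellow Paper. The two hash constants are the well-known Keccak-256 of the empty string and the root of an empty trie.

```python
from dataclasses import dataclass

# Keccak-256 of the empty byte string, and the root hash of an empty trie.
EMPTY_CODE_HASH = bytes.fromhex(
    "c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470")
EMPTY_TRIE_ROOT = bytes.fromhex(
    "56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421")


@dataclass
class Account:
    nonce: int = 0
    balance: int = 0
    storage_root: bytes = EMPTY_TRIE_ROOT
    code_hash: bytes = EMPTY_CODE_HASH


def init_contract_account(state: dict, sender: bytes, new_address: bytes, value: int) -> Account:
    """Set up the new account before the init code runs (sketch only).

    The sender's nonce bump is assumed to have been applied already,
    when the transaction was validated.
    """
    payer = state[sender]
    if payer.balance < value:
        raise ValueError("sender cannot afford the endowment")
    # The address may already hold Ether that someone sent to it in advance.
    existing_balance = state.get(new_address, Account()).balance
    payer.balance -= value
    state[new_address] = Account(nonce=1, balance=existing_balance + value)
    return state[new_address]
```

The code_hash field stays at the empty-string hash here; it is only replaced once the init code has finished running.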

Running the init code

So far we’ve only set up the account. The actual code that will live at the new address is not the init code. The init code is a kind of constructor: it runs once, can do any setup the contract needs (initialise storage, store a few values), and the bytes it returns are what end up stored as the contract’s code.

The Yellow Paper describes this with the EVM execution function Ξ (capital xi) that we’ll get to know in detail in the next post. For now, all we need to know is:

The EVM runs the init code in a fresh execution environment with the new contract’s address as the recipient.
If execution succeeds, the returned bytes become the contract’s code, and the codeHash field is set to the Keccak-256 hash of those bytes.
Storing the code itself costs Gcodedeposit gas per byte (200 gas per byte at the time of writing). If there isn’t enough gas left to pay for it, the creation fails with an out-of-gas exception. (Before the Homestead fork, the contract would instead be created without code, leaving a zombie account holding the value.)
If execution fails (out of gas, invalid opcode, etc.), all state changes are rolled back and all of the gas is consumed (it goes to the miner). The contract is not created. As with message calls, an explicit REVERT is a special case: state is rolled back, but any remaining gas is returned to the sender.
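The possible outcomes of running the init code can be summarised in a small decision function. This is a sketch under stated assumptions: settle_creation and the outcome labels ("success", "revert", "halt") are invented for illustration, and the deposit rule follows the post-Homestead behaviour, where failing to pay for the code counts as running out of gas.

```python
G_CODEDEPOSIT = 200  # gas charged per byte of code stored


def settle_creation(outcome: str, returned: bytes, gas_left: int):
    """Return (deployed_code_or_None, gas_left_for_the_sender).

    outcome is a hypothetical label for how the init code ended:
    "success", "revert" (explicit REVERT) or "halt" (exceptional halt).
    """
    if outcome == "revert":
        # State is rolled back, but unused gas is handed back.
        return None, gas_left
    if outcome != "success":
        # Exceptional halt: state is rolled back and all gas is consumed.
        return None, 0
    deposit = G_CODEDEPOSIT * len(returned)
    if deposit > gas_left:
        # Cannot pay for the code: treated as out of gas.
        return None, 0
    return returned, gas_left - deposit
```

For example, returning 10 bytes of code with 5000 gas left costs 2000 gas for the deposit and leaves 3000.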

The “rollback on failure” point is worth pausing on. From the outside, a failed contract creation looks like the contract was never deployed: any state changes the init code attempted are gone. The only things that survive are the sender’s incremented nonce and the gas spent (which is paid to the miner).

(Figure: steps performed by the protocol when executing a contract creation transaction.)

EIP-170 and the code size limit

One last detail before we move on: contract code size is capped at 24,576 bytes (this was introduced by EIP-170 in the Spurious Dragon fork). If the init code returns more bytes than this, creation fails. The cap is there to prevent denial-of-service attacks based on extremely large contracts being read repeatedly from disk.
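To put the cap in perspective, a quick calculation shows what the largest allowed contract costs in code deposit alone. The names below are illustrative; the two numbers are the ones quoted in this post.

```python
MAX_CODE_SIZE = 24_576   # bytes, the EIP-170 limit
G_CODEDEPOSIT = 200      # gas per byte of code stored


def code_size_ok(code: bytes) -> bool:
    """True if the bytes returned by the init code fit under the EIP-170 cap."""
    return len(code) <= MAX_CODE_SIZE


# Code deposit for a contract that is exactly at the limit.
MAX_DEPOSIT_GAS = MAX_CODE_SIZE * G_CODEDEPOSIT  # 4_915_200 gas
```

That figure covers only the deposit. The gas spent running the init code and the base cost of the transaction come on top of it.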

Message Call

We just saw how new code gets deployed. A message call is the other half of the picture: it runs the code that already lives at an existing address (or, for an Ether transfer to an EOA, just moves value).

Section 8 of the Yellow Paper defines the function Θ (capital theta) that performs a message call. The function takes a lot of inputs and it can look intimidating, so let’s break it down.

Inputs of a message call

A message call needs to know:

The sender: who is sending this message. For a top-level transaction, this is the transaction signer. For a nested call (a contract calling another contract), this is the calling contract.
The transaction originator: the original signer of the transaction. This is what you read with tx.origin in Solidity. It does not change as the call stack deepens.
The recipient: the account receiving the message.
The account whose code is executed: usually this is the recipient, but with opcodes like CALLCODE or DELEGATECALL it can differ. This is what lets a contract execute code from another contract while staying in its own storage context.
The available gas: how much gas the call can spend before halting.
The gas price: set by the original transaction.
The value to transfer with the call.
The apparent value: almost always the same as the value, except for DELEGATECALL where it inherits the value of the parent call.
The input data: the bytes passed to the called function (in Solidity, the ABI-encoded arguments).
The current call depth: how deep we are in the call stack.
A flag indicating whether the call is allowed to modify state. STATICCALL sets this to false.

The Yellow Paper bundles most of these into the “execution environment” I that the EVM sees while running. More on I next time.
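One way to keep these inputs straight is to write them down as a record and see how each call opcode fills it in for a nested call. The sketch below is my own illustration: the Message class, its field names and child_message are hypothetical, and gas accounting is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: bytes            # who the callee sees as the caller
    origin: bytes            # original transaction signer (tx.origin)
    recipient: bytes         # account whose storage and balance are in context
    code_address: bytes      # account whose code is executed
    gas: int
    gas_price: int
    value: int               # Ether actually transferred
    apparent_value: int      # what the callee sees as the value sent
    data: bytes
    depth: int
    can_modify_state: bool


def child_message(parent: Message, opcode: str, target: bytes,
                  gas: int, value: int, data: bytes) -> Message:
    """Build the message for a nested call made by the code running under `parent`."""
    me = parent.recipient  # the account currently executing
    common = dict(origin=parent.origin, gas_price=parent.gas_price, gas=gas,
                  data=data, depth=parent.depth + 1, code_address=target)
    if opcode == "CALL":
        return Message(sender=me, recipient=target, value=value, apparent_value=value,
                       can_modify_state=parent.can_modify_state, **common)
    if opcode == "STATICCALL":
        return Message(sender=me, recipient=target, value=0, apparent_value=0,
                       can_modify_state=False, **common)
    if opcode == "CALLCODE":
        # Runs the target's code, but in the caller's own storage context.
        return Message(sender=me, recipient=me, value=value, apparent_value=value,
                       can_modify_state=parent.can_modify_state, **common)
    if opcode == "DELEGATECALL":
        # Keeps the parent's sender and apparent value; no Ether moves.
        return Message(sender=parent.sender, recipient=me, value=0,
                       apparent_value=parent.apparent_value,
                       can_modify_state=parent.can_modify_state, **common)
    raise ValueError(f"not a call opcode: {opcode}")
```

Note that origin and gas_price are copied unchanged in every branch, which matches the point above that they are fixed by the original transaction.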

What happens during a message call

The high-level steps for a message call are:

Transfer the value from the sender to the recipient (if any). If the sender doesn’t have enough balance, the call fails before any code runs.
Look up the code at the account that owns the executed code.
If the recipient is a precompiled contract (addresses 1 to 8 in Byzantium), run the precompile. Otherwise, run the EVM with the looked-up code.
Collect the output (return data), the remaining gas, the accumulated sub-state (logs, refund counter, accounts marked for self-destruction) and a status (success or failure).
If the call failed, roll back any state changes it made. The remaining gas is not refunded to the parent if the call failed with an exceptional halt (e.g. out of gas, invalid opcode), but it is refunded on a REVERT.

The call depth is capped at 1024. Any call that would push the stack past this limit fails immediately. In practice, this puts a hard ceiling on recursive contract patterns.
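The steps above, including the depth check, can be sketched as a single function. Everything here is a simplification I am assuming for illustration: the state is a dict with "balance" and "code" maps keyed by integer addresses, and run is a hypothetical executor standing in for both the EVM and the precompiles. It returns a status ("success", "revert" or "halt"), the output bytes and the gas left.

```python
import copy
from dataclasses import dataclass

MAX_DEPTH = 1024
PRECOMPILES = range(1, 9)  # addresses 1 to 8 in Byzantium


@dataclass
class Result:
    success: bool
    output: bytes
    gas_left: int


def message_call(state, sender, recipient, code_address, value, gas, depth, run):
    """Sketch of a message call; `run(state, target, gas)` is a stand-in executor."""
    balances = state["balance"]
    if depth > MAX_DEPTH or balances.get(sender, 0) < value:
        return Result(False, b"", gas)  # fails before any code runs

    snapshot = copy.deepcopy(state)  # naive checkpoint for rollback

    # 1. Transfer the value.
    balances[sender] = balances.get(sender, 0) - value
    balances[recipient] = balances.get(recipient, 0) + value

    # 2-3. Pick a precompile or look up the code, then execute.
    if code_address in PRECOMPILES:
        target = code_address
    else:
        target = state["code"].get(code_address, b"")
    status, output, gas_left = run(state, target, gas)

    # 4-5. Collect the result, rolling back on failure.
    if status == "success":
        return Result(True, output, gas_left)
    state.clear()
    state.update(snapshot)
    if status == "revert":
        return Result(False, output, gas_left)  # unused gas is handed back
    return Result(False, b"", 0)                # exceptional halt: gas is gone
```

A real client would use journaling rather than deep-copying the whole state, and would also carry the accumulated sub-state (logs, refunds, self-destruct set) through the call; both are omitted to keep the sketch short.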

Precompiled contracts

Some operations are common enough and expensive enough that implementing them in pure EVM bytecode would be prohibitive. To support these, the Yellow Paper reserves a handful of addresses for “precompiled” contracts. In Byzantium, these are addresses 1 through 8:

Address	Precompile	What it does
0x01	`ECRECOVER`	Recover an Ethereum address from an ECDSA signature
0x02	`SHA256`	Compute the SHA-256 hash of the input
0x03	`RIPEMD160`	Compute the RIPEMD-160 hash of the input
0x04	`IDENTITY`	Return the input unchanged (useful as a “memcpy”)
0x05	`MODEXP`	Modular exponentiation (used by some signature schemes)
0x06	`BN128_ADD`	Addition on the alt-bn128 elliptic curve
0x07	`BN128_MUL`	Scalar multiplication on alt-bn128
0x08	`BN128_PAIRING`	Pairing check on alt-bn128 (used by zk-SNARK verifiers)

When you call one of these addresses, the EVM doesn’t actually run bytecode. Instead, the client implementation runs the corresponding native function. Each precompile has its own gas cost formula (described in Appendix E of the Yellow Paper).

If you ever wondered how Solidity’s ecrecover is implemented, this is the answer: it is a call to the precompile at address 0x01.
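To make the idea of “native function plus gas formula” concrete, here is a minimal sketch of two of the simpler precompiles, SHA256 (0x02) and IDENTITY (0x04), using the per-word gas formulas from Appendix E. The dispatch table and run_precompile are hypothetical names of mine, not client code.

```python
import hashlib


def _words(data: bytes) -> int:
    """Number of 32-byte words needed to hold the input, rounded up."""
    return (len(data) + 31) // 32


# address -> (gas cost formula, native implementation)
PRECOMPILE_TABLE = {
    0x02: (lambda d: 60 + 12 * _words(d), lambda d: hashlib.sha256(d).digest()),
    0x04: (lambda d: 15 + 3 * _words(d), lambda d: bytes(d)),
}


def run_precompile(address: int, data: bytes, gas: int):
    """Return (success, output, gas_left) for a call to a precompile address."""
    gas_cost, native = PRECOMPILE_TABLE[address]
    cost = gas_cost(data)
    if cost > gas:
        return False, b"", 0  # out of gas: the call fails and the gas is consumed
    return True, native(data), gas - cost
```

Hashing the three bytes b"abc" through the SHA256 precompile fits in one word, so it costs 60 + 12 = 72 gas.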

Calls within calls

A message call can trigger more message calls. When a contract uses one of the call opcodes (CALL, CALLCODE, DELEGATECALL, STATICCALL), the EVM is suspended, a new sub-context is created using Θ, and execution resumes when the sub-call returns. The sub-call has its own gas allowance, which is a portion of the parent’s remaining gas: at most 63/64 of it can be forwarded, which is known as the “63/64 rule”. It also has its own world state snapshot.

If the sub-call fails, only its changes are rolled back. The parent call can inspect the return status, decide what to do, and continue executing. This is what makes patterns like “try a call, fall back if it fails” possible.
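The gas side of a nested call is easy to show with a little arithmetic. This sketch assumes the rule as introduced by EIP-150 (the parent always keeps one 64th of its remaining gas) and ignores the cost of the call opcode itself; the function names are mine.

```python
def max_forwardable_gas(gas_left: int) -> int:
    """All but one 64th of the remaining gas may be passed to a sub-call."""
    return gas_left - gas_left // 64


def child_gas(requested: int, gas_left: int) -> int:
    """Gas the sub-call actually receives: the request, capped by the 63/64 rule."""
    return min(requested, max_forwardable_gas(gas_left))
```

With 6400 gas remaining, a contract can forward at most 6300, so it always keeps a little gas to handle the sub-call’s result, including a failure.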

Conclusion

In this post, we covered sections 7 and 8 of the Yellow Paper. We learned that:

Contract creation computes a new address from the sender and the sender’s nonce, initialises a new account state, runs the init code in the EVM, and stores the returned bytes as the contract’s code.
Message call is the way Ethereum runs code at an existing address. It transfers value, looks up the appropriate code, and either runs a precompile or executes the EVM with all the right context.

Both operations rely on a piece of machinery we haven’t really explained yet: the EVM itself. In the next post, we’ll look under the hood at section 9 of the Yellow Paper (the Execution Model). We’ll see what the stack, memory and storage look like, what the execution environment contains, and how the EVM cycle actually works.

As always, if you find anything wrong or unclear, please leave a comment. See you in the next one!

This post is licensed under CC BY 4.0 by the author.