Data Structures in the EVM

Understanding the data structures within the EVM is crucial for anyone aspiring to become a smart contract security researcher or perform code audits. These data structures include stack, memory, storage, and calldata, each serving a unique purpose in smart contract execution.

Stack

The stack in the EVM is a last-in, first-out (LIFO) structure used to hold temporary values. It supports operations such as pushing a new value onto the stack, duplicating the topmost value, swapping the top two values, and popping the topmost value off. The stack has a maximum size of 1024 elements, and attempting to push more than this limit results in an exception. Understanding the stack’s behavior is essential for auditing smart contracts, especially when reviewing the execution flow and temporary variable handling.

Memory

Memory in the EVM is a linear and expandable byte array. It is used to store temporary data that a smart contract function needs while executing. Memory is volatile and its contents are erased between external function calls. However, within a single transaction execution, memory remains intact across internal function calls. Memory expansion incurs gas costs, so efficient use of memory is important for optimizing smart contract performance.

Storage

Storage is a key-value store where each key and value is 32 bytes wide. It is the most expensive data location in terms of gas cost, but it is also the most durable, as data stored in storage persists between transactions. Each smart contract deployed on the Ethereum blockchain has its own storage space. Security researchers pay close attention to storage manipulation, as improper access control or vulnerabilities in the logic manipulating storage can lead to critical security issues, such as unauthorized fund access or contract takeover.

Calldata

Calldata is an immutable and non-persistent area where function arguments are stored. It is used to hold the data sent to the blockchain along with a transaction call, including the function identifier and parameters. Calldata is especially important for external functions, which are designed to be called by other contracts or transactions. Unlike memory, accessing calldata is cheaper in terms of gas, making it a preferred choice for passing large amounts of data to functions.

Implications for Security Research and Audits

Understanding these data structures is fundamental for conducting thorough security audits of smart contracts. A security researcher must be able to:

  • Analyze how a contract manages data across stack, memory, storage, and calldata.
  • Identify potential vulnerabilities resulting from improper data handling, such as stack underflows/overflows, out-of-gas errors due to inefficient memory use, or unauthorized storage modifications.
  • Assess the security implications of data location choices on gas costs and potential attack vectors, like reentrancy or denial of service (DoS) attacks.

In auditing smart contracts, a deep dive into the contract’s bytecode may be necessary to understand low-level data handling fully. This requires familiarity with the EVM’s instruction set and the ability to read and interpret assembly language.

Conclusion

Mastering the data structures in the EVM is a critical skill for smart contract security researchers. It allows them to identify vulnerabilities that could compromise the security, efficiency, and reliability of smart contracts. As the Ethereum ecosystem continues to evolve, staying updated with the latest EVM specifications and understanding the intricacies of its execution environment will remain essential for anyone involved in smart contract development, security research, and audits.