EVM – Understanding Confusion Around mload Opcode in EVM and Yul

evm, yul

This code returns the keccak256 hash of the function input. It is taken from this question:

function assemblyKeccak (bytes memory _input) public pure returns (bytes32 x) {
    assembly {
        x := keccak256(add(_input, 0x20), mload(_input))
    }
}

My question is:

The mload opcode usually takes a memory address and loads what is stored there (for example, mload(0x40) loads the free memory pointer). In this case, it takes a bytes value as input. What is it doing here, and what is it loading?

add adds its two inputs and returns the result. How is it possible to add _input and 32?

Best Answer

There are two points you should clearly understand in the assembly context:

1. All variables are value types in assembly

There is no such thing as a "reference type" in assembly. For instance, _input in the assembly context is the address of the byte array, not the byte array itself as in pure Solidity. Treating that value as an address (i.e., a pointer) or as anything else is purely a matter of interpretation; the language does not enforce it.
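To make this concrete, here is a minimal sketch that returns the raw value _input holds inside an assembly block. The contract name PointerDemo and the function name rawPointer are hypothetical, chosen only for illustration, and the exact number returned depends on where the compiler placed the array in memory.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract; names are illustrative only.
contract PointerDemo {
    // Returns the value that `_input` evaluates to inside assembly,
    // i.e. the memory address of the array, not its contents.
    function rawPointer(bytes memory _input) public pure returns (uint256 ptr) {
        assembly {
            // Nothing is dereferenced here: `_input` is just a number.
            ptr := _input
        }
    }
}
```

Whatever bytes you pass in, the result is a memory address rather than anything derived from the data itself.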

2. Memory arrays have the following layout

The length of the array is stored at its address: in this case, the value of _input is the address where you will find the length of the array, encoded in 32 bytes. The memory addresses that follow contain the data, in as many 32-byte words as necessary.

You can read more about it in the documentation.
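As a rough sketch of this layout, the function below reads the word stored at the array's address and compares it with the length Solidity reports. LayoutDemo and lengthWord are hypothetical names introduced here for illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract; names are illustrative only.
contract LayoutDemo {
    // Reads the 32-byte word stored at the array's address and checks
    // that it equals the length reported by Solidity.
    function lengthWord(bytes memory _input)
        public
        pure
        returns (uint256 len, bool matches)
    {
        assembly {
            // The first word at `_input` is the length field.
            len := mload(_input)
        }
        matches = (len == _input.length);
    }
}
```

For any input, matches should be true, since both values come from the same length word.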

For a visual explanation, this would be the actual memory layout of the _input array if it were composed of 33 bytes each with value 0x01.

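[Image: memory layout of _input]

_input + 0x00: length word, value 0x21 (33)
_input + 0x20: data bytes 0 to 31, all 0x01
_input + 0x40: data byte 32 (0x01), followed by 31 bytes that are not part of the data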

So:

mload(_input) loads the 32-byte value stored at the address _input. The value of _input is an address, and the value at memory[_input] is the length of the _input Solidity array.

add(_input, 0x20) takes the address _input and adds 0x20 (32) to skip the 32-byte length field. The result is the address at which the data actually starts in memory. Think of it as the address of _input[0] if you want.
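A small sketch of that pointer arithmetic, assuming hypothetical names DataStartDemo and firstWord: it loads the first 32-byte word of the data by skipping the length field.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract; names are illustrative only.
contract DataStartDemo {
    // Loads the 32-byte word that starts at the first data byte.
    function firstWord(bytes memory _input) public pure returns (bytes32 word) {
        assembly {
            // Skip the 32-byte length field, then load one word of data.
            word := mload(add(_input, 0x20))
        }
    }
}
```

For the 33-byte example above, this returns a word made of 32 bytes of 0x01. If the input is shorter than 32 bytes, the trailing bytes of the returned word lie beyond the array's data and should not be relied on.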

keccak256 (SHA3) requires both the offset of the data to hash and its length. The offset is the memory address where the data starts (add(_input, 0x20)), and its length is mload(_input), as we have just seen.

I hope that answers your question. Don't hesitate to ask for clarification if anything is unclear.
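One way to convince yourself of this is to compare the assembly version with Solidity's built-in keccak256. The sketch below reuses assemblyKeccak from the question; the contract name KeccakDemo and the helper sameAsBuiltin are hypothetical additions.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract; only assemblyKeccak comes from the question.
contract KeccakDemo {
    // Hashes mload(_input) bytes starting at add(_input, 0x20).
    function assemblyKeccak(bytes memory _input) public pure returns (bytes32 x) {
        assembly {
            x := keccak256(add(_input, 0x20), mload(_input))
        }
    }

    // Compares the assembly result with the high-level built-in.
    function sameAsBuiltin(bytes memory _input) public pure returns (bool) {
        return assemblyKeccak(_input) == keccak256(_input);
    }
}
```

sameAsBuiltin should return true for any input, because both paths hash exactly the array's data bytes and nothing else.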