EVM – Understanding Confusion Around mload Opcode in EVM and Yul

evm yul

This code returns the keccak256 hash of the function input. It is taken from this question:

function assemblyKeccak (bytes memory _input) public pure returns (bytes32 x) {
    assembly {
        x := keccak256(add(_input, 0x20), mload(_input))
    }
}

My questions are:

The mload opcode usually takes a memory address and loads what is stored there (for example the free memory pointer, mload(0x40)). In this case it takes a bytes value as input. What is it doing here, and what is it loading?

add adds its two inputs and returns the result. How is it possible to add 32 to _input?

Best Answer

There are two points that you should clearly understand in an assembly context:

1. All variables are value types in assembly

There is no such thing as a "reference type" in assembly. For instance, in an assembly context _input is the address of the byte array, not the byte array itself as in pure Solidity. Whether that value is treated as an address (i.e., a pointer) or as anything else is purely a matter of interpretation; the language does not enforce it.

2. Memory arrays have the following layout

The length of the array is stored at its address (in this case, the value of _input is simply the address where you will find the length of the array, encoded in 32 bytes). The memory addresses that follow contain the data, in as many 32-byte words as necessary.

You can read more about it in the documentation.

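For a concrete picture, this would be the actual memory layout of the _input array if it were composed of 33 bytes, each with value 0x01: the word at address _input holds the length 0x21 (33); the word at _input + 0x20 holds the first 32 data bytes (0x0101…01); and the word at _input + 0x40 holds the last data byte, 0x01, followed by 31 bytes of padding.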
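The layout of a 33-byte array can also be checked from Solidity itself. The following is a minimal sketch, not part of the original answer: the contract name LayoutSketch and the function name layout are hypothetical, and a 0.8.x compiler is assumed.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical helper contract, for illustration only.
contract LayoutSketch {
    // Builds a 33-byte array filled with 0x01 and reads back the three
    // 32-byte words that make up its in-memory representation.
    function layout()
        external
        pure
        returns (uint256 len, bytes32 word0, bytes32 word1)
    {
        bytes memory arr = new bytes(33); // zero-initialised by the compiler
        for (uint256 i = 0; i < 33; i++) {
            arr[i] = 0x01;
        }
        assembly {
            // In assembly, `arr` is just a number: the address of the length word.
            len := mload(arr)               // length field (33 = 0x21)
            word0 := mload(add(arr, 0x20))  // data bytes 0..31
            word1 := mload(add(arr, 0x40))  // data byte 32, then padding
        }
    }
}
```

Calling layout() should return 33 for len, 32 bytes of 0x01 for word0, and a word whose first byte is 0x01 for word1.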

So:

mload(_input) loads the 32-byte value contained at the address _input (the value of _input is an address, and the value of memory[_input] is the length of the _input Solidity array).

add(_input, 0x20) takes the address _input and adds 0x20 (32) to skip the 32-byte length field. The result is the address at which the data actually starts in memory. Think of it as the address of _input[0] if you want.

keccak256 (SHA3) requires both the offset of the data to hash and its length. The offset is just the memory address where the data starts (add(_input, 0x20)), and its length is mload(_input), as we have just seen.
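One way to convince yourself of this is to compare the assembly version with Solidity's built-in keccak256, which hashes the same data bytes. This is a minimal sketch under the assumption of a 0.8.x compiler; the contract name KeccakCheck and the function matchesBuiltin are hypothetical names introduced here.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical helper contract, for illustration only.
contract KeccakCheck {
    // Same function as in the question.
    function assemblyKeccak(bytes memory _input) public pure returns (bytes32 x) {
        assembly {
            // offset = address of the first data byte, length = value of the length word
            x := keccak256(add(_input, 0x20), mload(_input))
        }
    }

    // Returns true when the assembly hash equals the built-in hash of the same bytes.
    function matchesBuiltin(bytes memory _input) external pure returns (bool) {
        return assemblyKeccak(_input) == keccak256(_input);
    }
}
```

matchesBuiltin should return true for any input, including an empty one, because both expressions hash exactly the data bytes and skip the length word.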

I hope that answers your question. Don't hesitate to ask for clarification if anything is unclear.

Related Solutions

Solidity Log Events – How to Split Calldata Bytes to Multiple Log Events?

Arbitrary Split approach

Edit: Added this section to address arbitrary split destinations

By the time you are splitting, you should know the length of the target data. Below is an example implementation of copying the bytes to their destination inside solidity, which should be trivially extendable to N buckets.

pragma solidity ^0.4.15;

contract HelpLogs {

  event LogFirstHalf(bytes _data);
  event LogSecondHalf(bytes _data);

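  function logit(bytes data) external {
      // Split point: the first half gets floor(length / 2) bytes.
      uint midpoint = data.length / 2;

      // Copy the first half byte by byte into a fresh memory array.
      bytes memory data1 = new bytes(midpoint);
      for (uint i = 0; i < midpoint; i++) {
          data1[i] = data[i];
      }

      // Copy the remaining bytes (one more than the first half if length is odd).
      bytes memory data2 = new bytes(data.length - midpoint);
      for (i = 0; i < data.length - midpoint; i++) {
          data2[i] = data[i + midpoint];
      }

      // Emit one log event per half.
      LogFirstHalf(data1);
      LogSecondHalf(data2);
  }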
}

Note that the gas usage is higher than it needs to be, because the copy works byte by byte. It would be cheaper to copy 32-byte words, with bitmasking for the final partial word. A good reference is memcpy from Arachnid's solidity-stringutils library.
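A word-wise copy could look roughly like the sketch below. It is written in the spirit of that memcpy but is not the library's code, and it makes some assumptions: a 0.8.x compiler, the input taken as bytes memory rather than read from calldata, and the hypothetical names SplitSketch and copyWords.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract SplitSketch {
    event LogFirstHalf(bytes _data);
    event LogSecondHalf(bytes _data);

    // Copies `len` bytes from memory address `src` to memory address `dest`,
    // one 32-byte word at a time, masking the final partial word.
    function copyWords(uint256 dest, uint256 src, uint256 len) private pure {
        for (; len >= 32; len -= 32) {
            assembly {
                mstore(dest, mload(src))
            }
            dest += 32;
            src += 32;
        }
        if (len > 0) {
            // Ones in the low (32 - len) bytes: the part of dest to keep.
            uint256 mask = (uint256(1) << (8 * (32 - len))) - 1;
            assembly {
                let srcPart := and(mload(src), not(mask))
                let destPart := and(mload(dest), mask)
                mstore(dest, or(destPart, srcPart))
            }
        }
    }

    function logit(bytes memory data) external {
        uint256 midpoint = data.length / 2;
        bytes memory data1 = new bytes(midpoint);
        bytes memory data2 = new bytes(data.length - midpoint);

        uint256 src;
        uint256 dst1;
        uint256 dst2;
        assembly {
            // Skip the 32-byte length word of each array.
            src := add(data, 0x20)
            dst1 := add(data1, 0x20)
            dst2 := add(data2, 0x20)
        }

        copyWords(dst1, src, midpoint);
        copyWords(dst2, src + midpoint, data.length - midpoint);

        emit LogFirstHalf(data1);
        emit LogSecondHalf(data2);
    }
}
```

The mask keeps whatever already sits in the destination word past the copied bytes, so a partial final word does not overwrite neighbouring memory.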

[Edit: old] Fixed Bucket Approach

You can split the data externally without the 21 kgas overhead. Send the pre-split data as two parameters to a single function call:

pragma solidity ^0.4.15;

contract HelpLogs {

  event LogFirstHalf(bytes _data);
  event LogSecondHalf(bytes _data);

  function logit(bytes dataPart1, bytes dataPart2) external {
    LogFirstHalf(dataPart1);
    LogSecondHalf(dataPart2);
  }
}

This will cost less gas than splitting inside the EVM.

Solidity Tutorial – Execute Raw Bytecode in EVM

Yes, you can do this. Here are the high-level steps:

1. Append to the end of your bytecode a few additional opcodes that copy the stack and storage contents into memory and return them together with the memory contents.
2. Prepend to your bytecode a simple constructor that just deploys your bytecode as a contract.
3. Deploy the modified bytecode and obtain the address of the deployed smart contract.
4. Call the newly deployed smart contract. This effectively executes the original bytecode and returns the stack, storage, and memory contents after the execution.
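The constructor, deployment, and call steps above can be sketched in Solidity as follows. This is only an illustration under stated assumptions: the contract name RawRunner and the function deployAndCall are hypothetical, a 0.8.x compiler is assumed, and the supplied runtime bytecode is assumed to already end with the opcodes that return whatever you want to inspect. The 12-byte prefix is a generic "copy the rest of the code to memory and return it" constructor.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract RawRunner {
    // Wraps `runtime` in a minimal constructor, deploys it, calls it with
    // empty calldata, and returns whatever the bytecode returned.
    function deployAndCall(bytes memory runtime) external returns (bytes memory result) {
        require(runtime.length <= type(uint16).max, "runtime too long");

        // Constructor prefix (12 bytes = 0x0c):
        //   PUSH2 len, DUP1, PUSH1 0x0c, PUSH1 0x00, CODECOPY, PUSH1 0x00, RETURN
        bytes memory initCode = abi.encodePacked(
            hex"61",
            uint16(runtime.length),
            hex"80600c6000396000f3",
            runtime
        );

        address deployed;
        assembly {
            // Skip the length word of initCode; its length is mload(initCode).
            deployed := create(0, add(initCode, 0x20), mload(initCode))
        }
        require(deployed != address(0), "deployment failed");

        bool ok;
        (ok, result) = deployed.call("");
        require(ok, "execution reverted");
    }
}
```

Note that create takes its init code as a memory offset and a length, which is the same add(ptr, 0x20) and mload(ptr) pattern used for keccak256 earlier on this page.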

The challenges here are how to determine the stack depth and how to find out which storage keys were modified. Could you make the executed bytecode follow some convention for these, such as leaving the stack depth and the storage keys on the stack after execution?

If you need to execute each bytecode only once, you may add SELFDESTRUCT opcode before RETURN in order to destroy the smart contract after the call.
