solidity – How to Decompile a Smart Contract

compilation, evm, opcode, solidity

On the blockchain I can inspect the code of a contract, and see the EVM opcodes. Is there a way to decompile this and convert it back to (Solidity) source code?

Best Answer

Decompiling back to the original source code is impossible because all variable names, type names, and even function names are removed during compilation. It might be technically possible to arrive at source code that is similar to the original, but that is very complicated, especially when the optimizer was used during compilation. I don't know of any tools that do more than convert bytecode to opcodes.
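To make the "bytecode to opcodes" step concrete, here is a minimal sketch of a linear disassembler in Python. It is not any particular tool's implementation: the function name `disassemble` and the deliberately partial `OPCODES` table are illustrative assumptions, and unknown bytes are printed as-is rather than guessed. The only structural rule it relies on is that opcodes 0x60 to 0x7f (PUSH1 to PUSH32) are followed by 1 to 32 bytes of immediate data.

```python
# Partial opcode table (illustrative only; a real tool would list every opcode).
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x04: "DIV", 0x10: "LT",
    0x35: "CALLDATALOAD", 0x36: "CALLDATASIZE", 0x52: "MSTORE",
    0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST", 0x90: "SWAP1",
}


def disassemble(bytecode_hex):
    """Return one text line per instruction, e.g. 'PUSH1 0x60'."""
    text = bytecode_hex[2:] if bytecode_hex.startswith("0x") else bytecode_hex
    code = bytes.fromhex(text)
    lines = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32: the next `width` bytes are data, not instructions.
            width = op - 0x5F
            operand = code[pc + 1:pc + 1 + width]  # may be short if code is truncated
            lines.append("PUSH%d 0x%s" % (width, operand.hex()))
            pc += 1 + width
        else:
            lines.append(OPCODES.get(op, "UNKNOWN_0x%02x" % op))
            pc += 1
    return lines


# First 48 bytes of a typical old-solc contract prologue.
SAMPLE = "0x606060405260043610610196576000357c" + "01" + "00" * 28 + "9004"

for line in disassemble(SAMPLE):
    print(line)
```

The output is a flat instruction list only. Recovering control flow, function boundaries, or types from it is the hard part that the answer refers to.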

Since contracts can access their own code and thus (ab)use it to store data, it is not always clear whether a given part of the code is actually executed as code or merely serves as data, and therefore whether it makes sense to try to decompile it. Whether a given piece of the code is reachable is undecidable in general.
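One small, decidable piece of the code-versus-data question is worth illustrating: a 0x5b byte only counts as a JUMPDEST if it sits at an instruction boundary, not inside the immediate data of a PUSH. The sketch below (the function name `valid_jumpdests` is just an illustrative choice) scans the code linearly and collects the offsets that are legal jump targets. It says nothing about whether those targets are ever reached, which is the undecidable part.

```python
def valid_jumpdests(code):
    """Return the set of offsets holding a JUMPDEST that is not PUSH data."""
    dests = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0x5B:  # JUMPDEST at an instruction boundary
            dests.add(pc)
        if 0x60 <= op <= 0x7F:
            # Skip the 1..32 immediate bytes of PUSH1..PUSH32.
            pc += op - 0x5F
        pc += 1
    return dests


# PUSH1 0x5b ; JUMPDEST
# The 0x5b at offset 1 is data, the one at offset 2 is a real jump target.
example = bytes.fromhex("605b5b")
print(sorted(valid_jumpdests(example)))
```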

Note that there is no dedicated area for storing data that is fixed at creation time (lookup tables, etc.). Such data could instead be kept in storage, but that would be far more expensive, so embedding it in the code is actually common.

Related Solutions

Solidity Contract Verification – How to Verify Blockchain Contract Matches Source Code

AFAIK the best way to do this at the moment is to compile the source code again with the exact same compiler version the author used (so this is something that needs to be disclosed) and to compare the bytecode.

So the comparison to make is between the compiled bytecode and the input data of the contract creation transaction.
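As a rough sketch of that comparison in Python: assume you already have two hex strings, the creation bytecode you compiled yourself and the input data of the creation transaction (how you fetch them is outside this sketch). With solc, ABI-encoded constructor arguments are appended after the creation bytecode, so a prefix match is the check that makes sense. The names `to_bytes` and `split_creation_input` are hypothetical helpers, not part of any library.

```python
def to_bytes(hex_string):
    """Decode a hex string with or without a leading '0x'."""
    text = hex_string[2:] if hex_string.startswith("0x") else hex_string
    return bytes.fromhex(text)


def split_creation_input(compiled_creation_hex, tx_input_hex):
    """Compare compiled creation bytecode with creation-tx input data.

    Returns the trailing bytes (the ABI-encoded constructor arguments,
    possibly empty) if the tx input starts with the compiled bytecode,
    or None if the two do not match.
    """
    compiled = to_bytes(compiled_creation_hex)
    tx_input = to_bytes(tx_input_hex)
    if not compiled or not tx_input.startswith(compiled):
        return None
    return tx_input[len(compiled):]


# Toy data: 2 bytes of "bytecode" followed by one 32-byte constructor argument.
args = split_creation_input("0x6001", "0x6001" + "00" * 31 + "2a")
print(None if args is None else args.hex())
```

A `None` result only tells you that the bytes differ. A different compiler version or different optimizer settings are common causes, so those need to match what the author used.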

Etherscan – Discrepancy Between Opcode View and Opcode Disassembler for USDT Contract

Your "opcode tool" link leads to a completely different address (0x9e1b57fc92eba6434251a8458811c32690f32c45). If you check opcodes for your original address, you'll see they're the same:

0xdac17f958d2ee523a2206206994597c13d831ec7 (code)

PUSH1 0x60
PUSH1 0x40
MSTORE
PUSH1 0x04
CALLDATASIZE
LT
PUSH2 0x0196
JUMPI
PUSH1 0x00
CALLDATALOAD
PUSH29 0x0100000000000000000000000000000000000000000000000000000000
SWAP1
DIV
...

0xdac17f958d2ee523a2206206994597c13d831ec7 (disassembler)

[1] PUSH1 0x60
[3] PUSH1 0x40
[4] MSTORE
[6] PUSH1 0x04
[7] CALLDATASIZE
[8] LT
[11] PUSH2 0x0196
[12] JUMPI
[14] PUSH1 0x00
[15] CALLDATALOAD
[45] PUSH29 0x0100000000000000000000000000000000000000000000000000000000
[46] SWAP1
[47] DIV 
...
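Comparing the two listings, the bracketed numbers in the disassembler view appear to be the zero-based offset of the last byte of each instruction. That is an inference from the numbers shown here, not documented behaviour of the tool. A small Python sketch (the function name `last_byte_offsets` is illustrative) reproduces them from the raw bytes, using only the rule that PUSH1 to PUSH32 (0x60 to 0x7f) carry 1 to 32 bytes of immediate data:

```python
def last_byte_offsets(code):
    """Return, for each instruction, the offset of its final byte."""
    offsets = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        # PUSH1..PUSH32 are followed by 1..32 data bytes; others by none.
        width = op - 0x5F if 0x60 <= op <= 0x7F else 0
        pc += 1 + width
        offsets.append(pc - 1)
    return offsets


# The 13 instructions shown in the listings above, as raw bytes.
SAMPLE = bytes.fromhex("606060405260043610610196576000357c" + "01" + "00" * 28 + "9004")

print(last_byte_offsets(SAMPLE))
# -> [1, 3, 4, 6, 7, 8, 11, 12, 14, 15, 45, 46, 47]
```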

By the way, if you know any docs on compiled smart contract structure it'd be nice to share, so I and everyone could understand it better.

Check out this multi-part article series from OpenZeppelin: Deconstructing a Solidity Contract —Part I: Introduction.

Note that this just describes the bytecode produced by the Solidity compiler. Currently the EVM does not enforce any structure, so different compilers could do it differently. All the EVM does is start executing the binary blob at position 0 and go wherever the jumps take it. Some parts of the binary might never be executed: you can, for example, append random junk to any valid bytecode and it will remain valid (and the junk won't be executed, because the existing code contains no jumps to it). solc uses this fact to create code/data sections (sub-assemblies/sub-objects) and to add a metadata hash at the end. For example, the runtime code to be deployed is a sub-assembly. If your contract deploys other contracts with new, the bytecode of each of these contracts also gets a separate sub-assembly.
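As an illustration of the trailing metadata: in solc's scheme the metadata is CBOR-encoded and appended to the bytecode, and the final two bytes hold the big-endian length of that CBOR blob. The sketch below splits it off in Python. The function name `split_solc_metadata` is hypothetical, the CBOR is not decoded, and because the EVM enforces none of this, the result is only a heuristic: bytecode from another compiler (or a very old solc) may end in bytes that merely look like a length.

```python
def split_solc_metadata(runtime_code):
    """Split bytecode into (code, cbor_metadata) using the 2-byte length suffix.

    Returns (runtime_code, b"") when the suffix cannot describe a
    metadata blob that fits inside the bytecode.
    """
    if len(runtime_code) < 2:
        return runtime_code, b""
    cbor_len = int.from_bytes(runtime_code[-2:], "big")
    if cbor_len == 0 or cbor_len + 2 > len(runtime_code):
        return runtime_code, b""
    cut = len(runtime_code) - 2 - cbor_len
    return runtime_code[:cut], runtime_code[cut:-2]


# Toy input: 2 bytes of code, 4 bytes standing in for CBOR, then length 0x0004.
toy = bytes.fromhex("6000" + "a1617801" + "0004")
code, metadata = split_solc_metadata(toy)
print(code.hex(), metadata.hex())
```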

This free-form structure has its downsides and is going to change in the future. See EIP-3540: EVM Object Format (EOF) v1, which is a new standard that will make it more rigid.
