Calldata Ambiguity – Analyzing the Likelihood of Hash Collisions in Solidity

Tags: calldata, hash, keccak

To call a function in a smart contract, a transaction (or a message call, depending on where it originates) is sent to the contract address with a data field containing the encoded function selector and arguments. The selector is the first 4 bytes of the keccak256 hash of the function signature. The ABI-encoded arguments are appended to it, with each static argument padded with zeros to a width of 32 bytes (the arguments themselves are not hashed).

If, for example, I want to call a function doSomething() that takes no arguments, the calldata would be just the first 4 bytes of the keccak256 hash of the signature:

bytes4(keccak256('doSomething()')) = 0x82692679
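To make the encoding concrete, here is a minimal Python sketch of how a selector and simple calldata could be computed by hand. It assumes nothing beyond the standard library, so it carries its own slow, pure-Python Keccak-256 (hashlib.sha3_256 uses different padding and would give different digests). The helper names selector and encode_call are hypothetical, and encode_call only covers uint256-style static arguments, not the full ABI.

```python
MASK = (1 << 64) - 1
RATE = 136  # bytes absorbed per permutation for a 256-bit digest

def _rol(v, n):
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((v << n) | (v >> (64 - n))) & MASK

def _keccak_f(lanes):
    """Keccak-f[1600] permutation on a 5x5 array of 64-bit lanes."""
    r = 1  # LFSR state that generates the round constants
    for _ in range(24):
        # theta: mix each column's parity into its neighbours
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate lanes and move them to new positions
        x, y = 1, 0
        cur = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            cur, lanes[x][y] = lanes[x][y], _rol(cur, (t + 1) * (t + 2) // 2)
        # chi: the non-linear step, applied row by row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR the round constant into lane (0, 0)
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes

def keccak256(data: bytes) -> bytes:
    """Keccak-256 with the original 0x01 ... 0x80 padding used by Ethereum."""
    padded = bytearray(data)
    padded.append(0x01)
    padded.extend(b"\x00" * (-len(padded) % RATE))
    padded[-1] |= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for off in range(0, len(padded), RATE):
        block = padded[off:off + RATE]
        for i in range(RATE // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = _keccak_f(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))

def selector(signature: str) -> bytes:
    """First 4 bytes of the hash of the canonical signature string."""
    return keccak256(signature.encode("ascii"))[:4]

def encode_call(signature: str, *uint_args: int) -> bytes:
    """Hypothetical helper: selector followed by each uint left-padded to 32 bytes."""
    return selector(signature) + b"".join(a.to_bytes(32, "big") for a in uint_args)

print(selector("doSomething()").hex())            # 4-byte selector, no arguments
print(encode_call("burn(uint256)", 1000).hex())   # selector + one 32-byte word
```

The second call produces 36 bytes: the 4-byte selector followed by the value 1000 as a 32-byte big-endian word. In practice a maintained library would be used instead of a hand-written hash.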

Since only the first 4 bytes are kept, isn't a collision now much more likely? How does this still preserve the properties of a cryptographic hash? I tried to produce two function signatures with the same first 4 bytes (and, unsurprisingly, could not), but my question is: how likely is it?

Best Answer

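This can actually happen. For any two given functions, the chance that their selectors collide by accident is very low: 1/2^32, or about 2.3 × 10^-10. An attacker who wants a collision, however, can find one easily, because a 4-byte selector leaves only 2^32 possible values to search by brute force. A widely cited example is burn(uint256) and collate_propagate_storage(bytes16), which both have the selector 0x42966c68. Within a single contract this is harmless, because the Solidity compiler refuses to compile a contract in which two functions share a selector. That is not the case with proxies, where the proxy and its implementation are compiled as separate contracts.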
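As a rough illustration of the numbers involved, the sketch below applies the standard birthday approximation to the 2^32 selector space. It assumes selectors behave like uniformly random 32-bit values, which is an idealization, and the function names are hypothetical.

```python
from math import expm1, log, sqrt

SELECTOR_SPACE = 2 ** 32  # number of distinct 4-byte selectors

def collision_probability(n: int) -> float:
    """Approximate chance that at least two of n random selectors collide."""
    # Birthday approximation: 1 - exp(-n(n-1) / (2N)); expm1 keeps precision for tiny values.
    return -expm1(-n * (n - 1) / (2 * SELECTOR_SPACE))

def functions_for_probability(p: float) -> int:
    """Roughly how many random selectors are needed for collision chance p."""
    # Inverts the approximation above, treating n(n-1) as n^2.
    return round(sqrt(2 * SELECTOR_SPACE * log(1 / (1 - p))))

# Accidental collisions among the functions of one contract.
for n in (2, 10, 100, 1000):
    print(n, collision_probability(n))

# Deliberate search: candidates needed before any two of them collide with 50% chance.
print(functions_for_probability(0.5))

# Hitting one specific, pre-chosen selector takes about this many tries on average.
print(SELECTOR_SPACE)
```

For two functions the result reduces to 1/2^32. The accidental case stays negligible for realistic contract sizes, while the deliberate case is small enough to brute-force, which is the asymmetry the answer describes.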

This was the idea behind one of the vulnerabilities found in earlier proxy patterns, known as signature clashing (more precisely, function selector clashing). If you want to read more about it, you can find more information here.