Solidity Contract Development – How to Store IPFS Hash Using bytes32

Tags: bytes, contract-development, ipfs, memory, solidity

The following Q&A (What datatype should I use for an IPFS address hash?) recommends using bytes to store an IPFS hash.

I was using the following example (https://github.com/AdrianClv/ethereum-ipfs/blob/master/NotSoSimpleStorage.sol), which uses string to store the IPFS hash. That costs around 110,000 gas, which seems pretty expensive.

[Q] Is it cheaper to store an IPFS hash as bytes instead of string? I observe that storing bytes costs almost the same as storing a string (around 110,000 gas). Since both datatypes seem expensive to store, should I use events to store the hashes instead?

Is there an example or tutorial on storing an IPFS hash using bytes?

Would this work:

myContract.insertHash("QmWmyoMoctfbAaiEs2G46gpeUmhqFRDW6KWo64y5r581Vz");

contract Example_bytes {
    bytes[] list;
    function insertHash(bytes ipfsHash) {
       list.push(ipfsHash); //costs around 110,000 gas. 
    }
}

contract Example_string {
    struct hashes{
         string hash;
    }

    hashes[] list;
    function insertHash(string ipfsHash) {
       list.push(hashes({hash: ipfsHash})); //costs around 110,000 gas. 
    }
}

Best Answer

Your example stores an IPFS identifier using its alphanumeric encoding (Qm...), which is the same Base58 encoding that Bitcoin uses. At its core, however, it represents a number (the hash). Stored in Base58 form, the identifier has to be a string because it includes letters, and what actually gets saved is the ASCII code of each character in the identifier. That means you need 46 bytes to store QmWmyoMoctfbAaiEs2G46gpeUmhqFRDW6KWo64y5r581Vz from your example.

However, that identifier can also be expressed in hexadecimal as 12207D5A99F603F231D53A4F39D1521F98D2E8BB279CF29BEBFD0687DC98458E7F89, which is only 34 bytes long (it takes 68 characters to write out, since every two hexadecimal characters represent one byte of data).

But both of those are longer than 32 bytes, which is the largest fixed-size byte array (bytes32), so they need a dynamically-sized byte array (bytes or string, both of which are expensive, as you noted).

But that IPFS hash is actually two concatenated pieces. It's a multihash identifier, so the first two bytes indicate the hash function being used and the digest length: 0x12 means sha2-256, and 0x20 means the digest is 32 bytes (256 bits) long. Identifiers that start with Qm, like the one in your example, always use this format, so you could just chop off the first two bytes. That leaves a 32-byte value, which fits in a bytes32 fixed-size byte array and saves storage. When retrieving it, either your contract re-attaches 0x1220 to the front, or your clients need to be smart enough to do that after reading the value.
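As a rough sketch of that approach, the contract below stores only the 32-byte digest and puts 0x1220 back in front when the value is read. The contract name ExampleBytes32 and the function getMultihash are made up for illustration, and the code assumes a 0.8.x compiler and that every identifier is a sha2-256 multihash:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract; assumes every stored identifier is a
// sha2-256 multihash, i.e. its hex form starts with 0x1220.
contract ExampleBytes32 {
    // Each entry is a bare 32-byte digest, so it fits in one storage slot.
    bytes32[] private list;

    // The caller passes the digest with the leading 0x1220 already removed.
    function insertHash(bytes32 digest) external {
        list.push(digest);
    }

    // Rebuilds the full 34-byte multihash by re-attaching 0x1220.
    function getMultihash(uint256 index) external view returns (bytes memory) {
        return abi.encodePacked(bytes2(0x1220), list[index]);
    }
}
```

For the identifier in the question, the client would decode the Base58 string off-chain and pass 0x7D5A99F603F231D53A4F39D1521F98D2E8BB279CF29BEBFD0687DC98458E7F89 to insertHash.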

If you want to make sure your code is future-proof, though, you probably want to save that hash function code and size, which you could combine with the hash as a struct:

struct Multihash {
  bytes32 hash;
  uint8 hash_function;
  uint8 size;
}

That will work with any multihash format, as long as size is less than or equal to 32 (any bigger and the actual payload won't fit in the hash property). This struct takes two storage slots (two 32-byte chunks), since the two uint8 fields can share one slot. You could also add up to 30 bytes of additional data to the struct without using any more storage slots.
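One possible way to use that struct is sketched below. The contract name ExampleMultihash and both function signatures are illustrative assumptions, not part of any library. The sketch targets a 0.8.x compiler and assumes that digests shorter than 32 bytes are left-aligned in hash and that the hash function code fits in a single byte:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract built around the Multihash struct.
contract ExampleMultihash {
    struct Multihash {
        bytes32 hash;         // digest, left-aligned if shorter than 32 bytes (assumption)
        uint8 hash_function;  // multihash function code, e.g. 0x12 for sha2-256
        uint8 size;           // digest length in bytes, e.g. 0x20
    }

    Multihash[] private list;

    function insertHash(bytes32 hash, uint8 hashFunction, uint8 size) external {
        // A longer digest would not fit in the bytes32 field.
        require(size <= 32, "digest does not fit in bytes32");
        list.push(Multihash(hash, hashFunction, size));
    }

    // Reassembles the multihash as: function code, length, then the digest bytes.
    function getMultihash(uint256 index) external view returns (bytes memory) {
        Multihash storage entry = list[index];
        bytes memory out = new bytes(2 + uint256(entry.size));
        out[0] = bytes1(entry.hash_function);
        out[1] = bytes1(entry.size);
        for (uint256 i = 0; i < entry.size; i++) {
            out[2 + i] = entry.hash[i];
        }
        return out;
    }
}
```

Compared with storing a bare bytes32, each entry here uses a second storage slot for the two uint8 fields, which is the price of keeping the hash function and size alongside the digest.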
