Your example shows storing an IPFS identifier using its alphanumeric encoding (Qm...), which is the same Base58 encoding that Bitcoin uses. However, what it represents at its core is a number (the hash). Storing the identifier in Base58 form requires a string, because it includes letters (and what actually gets saved is the ASCII code for each character in the identifier). That means you need 46 bytes to store QmWmyoMoctfbAaiEs2G46gpeUmhqFRDW6KWo64y5r581Vz from your example.
That same identifier can also be expressed in hexadecimal as 12207D5A99F603F231D53A4F39D1521F98D2E8BB279CF29BEBFD0687DC98458E7F89, which is only 34 bytes long (it takes 68 characters to write out in hexadecimal, since every two hex characters represent one byte of data).
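To make the size difference concrete, here is a minimal off-chain Python sketch that decodes the Base58 string from your example back into the raw bytes it represents, using the Bitcoin alphabet. The function name `b58decode` is my own; in a real project you would more likely use an existing Base58 library.

```python
# Bitcoin's Base58 alphabet (no 0, O, I or l), also used for "Qm..." IPFS hashes.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode a Base58 string into the raw bytes it represents."""
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)  # raises ValueError on invalid characters
    # Each leading '1' character stands for one leading zero byte.
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")


example = "QmWmyoMoctfbAaiEs2G46gpeUmhqFRDW6KWo64y5r581Vz"
raw = b58decode(example)

print(len(example.encode("ascii")))  # 46: bytes needed to store it as a string
print(len(raw))                      # 34: bytes needed for the raw multihash
print(raw[:2].hex())                 # 1220: the multihash prefix
```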
But both of those are larger than 32 bytes, the maximum size of a fixed-size byte array (bytes32), so either one needs a dynamically sized byte array to store it (bytes or string, both of which are expensive, as you noted).
However, that IPFS hash is actually two concatenated pieces. It's a multihash identifier, so the first two bytes indicate the hash function being used and the digest size: 0x12 is sha2-256, and 0x20 means the digest is 32 bytes (256 bits) long. Currently that's the only format IPFS uses, so you could just chop off the first two bytes. That leaves a 32-byte value, which is small enough to fit in a bytes32 fixed-size byte array, so you save some space there. When retrieving it, either your contract can re-attach 0x1220 to the front, or your clients need to be smart enough to do that after reading the value.
If you want your code to be future-proof, though, you probably want to save the hash function code and size too, which you can combine with the hash in a struct:
struct Multihash {
    bytes32 hash;
    uint8 hash_function;
    uint8 size;
}
That will work with any multihash format, as long as size is less than or equal to 32 (any bigger and the actual digest won't fit in the hash field). This struct takes two storage slots (two 32-byte words), since the two uint8 fields can be packed into one slot. You could also add up to 30 bytes of additional data to the struct without using any additional storage slots.
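On the client side, filling that struct means splitting the multihash into its three parts and padding the digest to 32 bytes. Here is a minimal Python sketch; the dataclass only mirrors the Solidity field names, the helper names are hypothetical, and it assumes the function code and size each fit in a single byte (true for 0x12 and 0x20).

```python
from dataclasses import dataclass


@dataclass
class Multihash:
    """Off-chain mirror of the Solidity struct (same field names)."""
    hash: bytes         # digest, right-padded with zero bytes to 32 bytes
    hash_function: int  # multihash function code, e.g. 0x12 for sha2-256
    size: int           # real digest length in bytes (must be <= 32)


def split_multihash(raw: bytes) -> Multihash:
    """Split <code><size><digest> into the struct's three fields.

    Assumes single-byte code and size values, as in 0x12 0x20.
    """
    if len(raw) < 2:
        raise ValueError("too short to be a multihash")
    hash_function, size, digest = raw[0], raw[1], raw[2:]
    if len(digest) != size:
        raise ValueError("size byte does not match the digest length")
    if size > 32:
        raise ValueError("digest is too long for a bytes32 field")
    return Multihash(digest.ljust(32, b"\x00"), hash_function, size)


def join_multihash(m: Multihash) -> bytes:
    """Rebuild the original bytes, using `size` to drop the padding."""
    return bytes([m.hash_function, m.size]) + m.hash[: m.size]


# Hypothetical 20-byte digest tagged with code 0x11 (sha1 in the multihash table).
raw = bytes([0x11, 0x14]) + bytes(range(20))
m = split_multihash(raw)
print(hex(m.hash_function), m.size, len(m.hash))  # 0x11 20 32
print(join_multihash(m) == raw)                   # True
```

Storing `size` is what makes the padding reversible: without it, a digest that happens to end in zero bytes would be indistinguishable from a shorter one.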
IPFS specifies that the hash should be a multihash. This consists of a byte indicating the hashing algorithm, followed by a byte giving the digest length, followed by the digest itself. The result is then base58-encoded.
The hashing algorithm is normally sha256, identified by 0x12, with a length of 0x20 (32 bytes). This combination always encodes to a 46-character string starting with "Qm".
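You can check the "Qm" claim with a small Python sketch that base58-encodes 0x12, 0x20 and a 32-byte digest. The digest here is sha256 of arbitrary bytes, so the output is a well-formed sha2-256 multihash but not the real IPFS hash of that text (IPFS hashes the DAG node, not the raw content). `b58encode` is a hypothetical helper, not a library call.

```python
import hashlib

# Bitcoin's Base58 alphabet, as used by IPFS for "Qm..." hashes.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Base58-encode bytes with the Bitcoin alphabet."""
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, rem = divmod(n, 58)
        digits.append(ALPHABET[rem])
    pad = len(data) - len(data.lstrip(b"\x00"))  # leading zero bytes become '1'
    return "1" * pad + "".join(reversed(digits))


# <code 0x12 = sha2-256><length 0x20 = 32><32-byte digest>
digest = hashlib.sha256(b"any 32-byte digest works here").digest()
encoded = b58encode(bytes([0x12, 0x20]) + digest)
print(encoded[:2], len(encoded))  # Qm 46
```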
In theory, other algorithms are legal in IPFS, so if you have code that needs to keep working in future and that you won't be able to change, you shouldn't assume the "Qm" prefix. You could detect some invalid inputs by checking whether the length byte matches the actual length of the digest. There are some libraries listed here that will do this for you, or you can look at them to see how they work.
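A rough version of that length check on already-decoded bytes might look like this Python sketch (hypothetical function name). Note the simplification: the multihash format actually encodes the code and length as unsigned varints, so this sketch only handles values below 0x80, which fit in one byte, and rejects anything else.

```python
def looks_like_multihash(raw: bytes) -> bool:
    """Rough sanity check: does the length byte match the digest length?

    Assumes the code and length each fit in one byte (values below 0x80);
    larger values would be multi-byte varints, which this sketch rejects.
    """
    if len(raw) < 2:
        return False
    code, length = raw[0], raw[1]
    if code >= 0x80 or length >= 0x80:
        return False
    return len(raw) - 2 == length


print(looks_like_multihash(bytes([0x12, 0x20]) + bytes(32)))  # True
print(looks_like_multihash(bytes([0x12, 0x20]) + bytes(31)))  # False: one byte short
```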
However, I don't think other algorithms are commonly used at the moment, so if you're more worried about input mistakes than future compatibility, you could probably just check for this format and update your code if people start using something else in significant numbers.
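If you go the strict route, a check for today's common format could look like this minimal Python sketch (hypothetical function name). It accepts only 46 Base58 characters starting with "Qm" that decode to the 0x12 0x20 prefix; not every "Qm" string of that length decodes to that prefix, which is why it decodes rather than trusting the first two characters. You would need to revisit it if other formats become common.

```python
# Bitcoin's Base58 alphabet, as used by IPFS for "Qm..." hashes.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def is_qm_hash(text: str) -> bool:
    """Accept only Base58(0x12 0x20 + 32-byte digest), i.e. the usual 'Qm...' form."""
    if len(text) != 46 or not text.startswith("Qm"):
        return False
    if any(ch not in ALPHABET for ch in text):
        return False
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)
    raw = n.to_bytes(34, "big")  # any 46-character 'Qm' string fits in 34 bytes
    return raw[:2] == b"\x12\x20"


print(is_qm_hash("QmWmyoMoctfbAaiEs2G46gpeUmhqFRDW6KWo64y5r581Vz"))  # True
print(is_qm_hash("Qm" + "1" * 44))                                  # False
```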