Solidity – How to Store and Retrieve Strings Over 32 Bytes Using Hashed Slots


I have a desired storage slot:

bytes32 public slot = keccak256(abi.encodePacked("mydesiredStorageslot")); 

I want to store a URL in this slot. Unless it is shortened, the URL can be longer than 32 bytes.

For example, assume the URL is www.google.com/something-long-which-needs-to-be-stored/blah-blah-blah/blah-blah-blah/1.json

I need code that can store this string at this slot and also retrieve it when called.
I have written the following code, but it does not work for strings longer than 32 bytes (my knowledge of assembly is not that great!). What is the best way to go about this?


contract Foo {

    bytes32 public constant location = keccak256("mystoragelocation");
    bytes32 public constant location2 = keccak256(abi.encodePacked(location));

    function storeString(string memory _string) public {
        bytes32 _location = location;
        bytes32 _location2 = location2;
        assembly {
            sstore(_location, mload(_string))              // length
            sstore(_location2, mload(add(_string, 0x20)))  // value (first 32 bytes only)
        }
    }

    function boo() public view returns (string memory p) {
        bytes32 _location = location;
        bytes32 _location2 = location2;
        assembly {
            p := mload(0x40)
            mstore(p, sload(_location))
            mstore(add(p, 0x20), sload(_location2))
            // set the pointer to free memory
            mstore(0x40, add(p, 0x40))
        }
    }
}
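To see concretely why this loses data, here is a rough Python model of what `storeString` above writes. It models the storage writes only and is not EVM code; the helper name `question_store` is made up for this illustration. The code copies the memory length word and exactly one 32-byte data word, so for the 91-byte URL from the question everything after byte 32 never reaches storage.

```python
WORD = 32  # EVM word size in bytes


def question_store(s: str) -> tuple[int, bytes]:
    """Hypothetical model of the question's storeString (not EVM code).

    Returns (value written to `location`, bytes written to `location2`).
    """
    data = s.encode("utf-8")
    length_word = len(data)                        # sstore(_location, mload(_string))
    first_word = data[:WORD].ljust(WORD, b"\x00")  # sstore(_location2, mload(add(_string, 0x20)))
    # Zero padding of short strings is an assumption of this model.
    return length_word, first_word                 # nothing else is stored


url = ("www.google.com/something-long-which-needs-to-be-stored/"
       "blah-blah-blah/blah-blah-blah/1.json")
length, stored = question_store(url)
print(length)                                      # 91
print(stored.decode("utf-8"))                      # www.google.com/something-long-wh
print(length - WORD, "bytes never reach storage")  # 59 bytes never reach storage
```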

Best Answer

First, you need to be aware of the storage layout of strings:

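If the string is at most 31 bytes long, it is stored entirely in the length storage slot: the lowest-order byte (the rightmost byte of the 32-byte word) holds the string length * 2, and the other 31 bytes hold the string data, left-aligned.

If the string is 32 bytes or longer, the length storage slot holds only the value string length * 2 + 1, and the data is stored in consecutive slots starting at keccak256(length_slot). Since length * 2 is always even and length * 2 + 1 is always odd, the lowest bit of the length slot tells you which layout is in use.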
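The two rules above can be modelled in a few lines of Python. This is a sketch of the encoding only, not EVM code: storage is a plain dict from slot number to a 256-bit integer, and `data_slot` is passed in as a parameter because in the EVM it is keccak256(length_slot), which Python's standard library does not provide (`hashlib.sha3_256` is the NIST SHA-3 variant, not Ethereum's Keccak-256). The function name `encode_string_storage` and the slot numbers 0 and 1000 are arbitrary placeholders.

```python
def encode_string_storage(s: str, length_slot: int, data_slot: int) -> dict[int, int]:
    """Model Solidity's string storage layout as {slot: 256-bit value}.

    data_slot stands in for keccak256(length_slot), which is not computed here.
    """
    data = s.encode("utf-8")
    n = len(data)
    storage: dict[int, int] = {}
    if n <= 31:
        # Short layout: data left-aligned in the high-order 31 bytes,
        # lowest-order byte = n * 2 (even, so the lowest bit is 0).
        word = int.from_bytes(data.ljust(32, b"\x00"), "big")
        storage[length_slot] = word | (n * 2)
    else:
        # Long layout: the length slot holds n * 2 + 1 (odd, lowest bit 1),
        # and the data fills consecutive slots starting at data_slot.
        storage[length_slot] = n * 2 + 1
        for offset in range(0, n, 32):
            chunk = data[offset:offset + 32].ljust(32, b"\x00")  # zero-pad the last chunk
            storage[data_slot + offset // 32] = int.from_bytes(chunk, "big")
    return storage


short_layout = encode_string_storage("hello", length_slot=0, data_slot=1000)
long_layout = encode_string_storage("x" * 40, length_slot=0, data_slot=1000)
print(len(short_layout), len(long_layout))  # 1 3
```

The 40-byte string needs the length slot plus two data slots, while "hello" fits entirely in the length slot.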

So you must handle both cases and translate them into assembly, which could look more or less like this for both reading and writing:

The code is commented, but don't hesitate to ask if some parts are a bit confusing.

EDIT: I modified the code so that the assembly no longer executes a return that ends the whole call. You (or anyone) can now call these functions from another function, and they still work when invoked directly in a transaction.

// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.0;

contract Foo {
    
    bytes32 public constant length = keccak256("mystoragelocation");
    bytes32 public constant data = keccak256(abi.encodePacked(length));

    function storeString(string memory _string) public {
        bytes32 _length = length;
        bytes32 _data = data;

        assembly {
            let stringLength := mload(_string)

            switch gt(stringLength, 0x1F)

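            // If string length <= 31 we store a short array
            // length storage slot layout:
            // highest-order 31 bytes : string data (left-aligned)
            // lowest-order byte      : length * 2
            // data storage slot is UNUSED in this case
            // Assumes the memory bytes after the string data in that word are zero;
            // mask them first if that is not guaranteed.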
            case 0x00 {
                sstore(_length, or(mload(add(_string, 0x20)), mul(stringLength, 2)))
            }

            // If string length > 31 we store a long array
            // length storage slot layout:
            // whole slot : length * 2 + 1
            // data storage layout:
            // each slot  : 32 bytes of string data
            // If more than 32 bytes are required for the string we write them
            // to the slot(s) following the data slot
            case 0x01 {
                // Store length * 2 + 1 at the length slot
                sstore(_length, add(mul(stringLength, 2), 1))

                // Then store the string content by blocks of 32 bytes
                for {let i:= 0} lt(mul(i, 0x20), stringLength) {i := add(i, 0x01)} {
                    sstore(add(_data, i), mload(add(_string, mul(add(i, 1), 0x20))))
                }
            }
        }

    }

    function readString() public view returns (string memory returnBuffer) {
        bytes32 _length = length;
        bytes32 _data = data;

        assembly {
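            let stringLength := sload(_length)

            // Allocate the returned string at the free memory pointer. Without this,
            // returnBuffer still points to the zero slot (0x60), which must never be written to.
            returnBuffer := mload(0x40)

            // Check which type of array we are dealing with (lowest bit of the length slot).
            // The returned string has to be read from STORAGE following the storage
            // layout of string, but rebuilt in MEMORY following the memory layout of string.
            switch and(stringLength, 0x01)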

            // Short array
            case 0x00 {
                let decodedStringLength := div(and(stringLength, 0xFF), 2)

                // Add length in first 32 byte slot 
                mstore(returnBuffer, decodedStringLength)
                mstore(add(returnBuffer, 0x20), and(stringLength, not(0xFF)))
                mstore(0x40, add(returnBuffer, 0x40))
            }

            // Long array
            case 0x01 {
                let decodedStringLength := div(stringLength, 2)
                let i := 0

                mstore(returnBuffer, decodedStringLength)
                
                // Write to memory as many blocks of 32 bytes as necessary taken from data storage variable slot + i
                for {} lt(mul(i, 0x20), decodedStringLength) {i := add(i, 0x01)} {
                    mstore(add(add(returnBuffer, 0x20), mul(i, 0x20)), sload(add(_data, i)))
                }

                mstore(0x40, add(returnBuffer, add(0x20, mul(i, 0x20))))
            }
        }
    }
}
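For checking values off-chain, here is a Python sketch that mirrors the two branches of `readString` above: the lowest bit of the length slot selects the layout, and the result is the memory image a Solidity string expects (a 32-byte length word followed by the data words). It is a model, not EVM code: storage is a dict, `data_slot` stands in for keccak256(length_slot), and the function name `decode_string_storage` is hypothetical.

```python
def decode_string_storage(storage: dict[int, int], length_slot: int,
                          data_slot: int) -> tuple[bytes, str]:
    """Model of readString: return (memory image, decoded string).

    storage maps slot -> 256-bit value; data_slot stands in for keccak256(length_slot).
    """
    slot_value = storage.get(length_slot, 0)   # sload(_length); unset slots read as 0
    if slot_value & 1 == 0:
        # Short array: lowest byte is n * 2, upper 31 bytes are the data.
        n = (slot_value & 0xFF) // 2           # div(and(stringLength, 0xFF), 2)
        words = [slot_value & ~0xFF]           # and(stringLength, not(0xFF))
    else:
        # Long array: slot value is n * 2 + 1, data spans ceil(n / 32) slots.
        n = slot_value // 2                    # div(stringLength, 2)
        words = []
        i = 0
        while i * 32 < n:                      # lt(mul(i, 0x20), decodedStringLength)
            words.append(storage.get(data_slot + i, 0))  # sload(add(_data, i))
            i += 1
    # Memory layout of a string: 32-byte length word, then the data words.
    memory = n.to_bytes(32, "big") + b"".join(w.to_bytes(32, "big") for w in words)
    return memory, memory[32:32 + n].decode("utf-8")


hello_slot = int.from_bytes(b"hello".ljust(32, b"\x00"), "big") | 10
memory, text = decode_string_storage({0: hello_slot}, length_slot=0, data_slot=1000)
print(text, len(memory))  # hello 64
```

The memory image is 64 bytes for any short string (one length word plus one data word), matching the free memory pointer update `add(returnBuffer, 0x40)` in the short-array branch.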

I hope that answers your question.

EDIT: Answering @Kunal Jadhav's comments.

Can you explain the case 0x00 code block of readString() in detail?

Sure, in that case we need to read a short array (i.e., up to 31 bytes of data and 1 byte of length packed together in a single slot) and then write it according to the memory layout of strings where length and data are separated.

Let's take your example:

"hello"="0x68656c6c6f00000000000000000000000000000000000000000000000000000a"

As this is a short string, the lowest-order byte (the last one) is length * 2 (0x0a is 10, i.e. 5 * 2), and all the other bytes hold the string data.
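The packed value can be reproduced off-chain. This Python snippet (a model of the 32-byte word, not EVM code) left-aligns the bytes of "hello" in a 32-byte word and ORs length * 2 into the lowest-order byte; it prints a 64-digit hex value that starts with 68656c6c6f and ends with 0a.

```python
data = b"hello"
word = int.from_bytes(data.ljust(32, b"\x00"), "big")  # data in the high-order bytes
word |= len(data) * 2                                   # lowest-order byte: 5 * 2 = 0x0a
print(f"0x{word:064x}")
```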

let decodedStringLength := div(and(stringLength, 0xFF), 2)

This takes the last byte of the storage slot with and(stringLength, 0xFF) and divides it by 2 to get the actual string length, since that is how the length must be written in memory.

The next line just stores that value as the first element of the string memory that we will return :

mstore(returnBuffer, decodedStringLength)

Next, we need to extract the data, which in this case is the 31 higher-order bytes. This is done with:

mstore(add(returnBuffer, 0x20), and(stringLength, not(0xFF)))

Breaking it down, add(returnBuffer, 0x20) skips the length field we wrote previously, so that the data is written where a memory string expects it. The data extraction is done with and(stringLength, not(0xFF)), which is equivalent to and(stringLength, 0xFFFF...FFFF00). This selects the upper 31 bytes of the slot and discards the last byte, which stores the length. The result is stored at returnBuffer + 0x20, completing the string memory layout.
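The two masks used in the short-array branch can be checked on the same value in Python. This is a model, not EVM code; `MASK_256` is a helper constant introduced here to emulate the 256-bit `not`.

```python
MASK_256 = (1 << 256) - 1  # helper to emulate 256-bit EVM words

# Packed short string as stored in the length slot: "hello" with 0x0a in the lowest byte.
slot = int.from_bytes(b"hello".ljust(32, b"\x00"), "big") | 0x0a

decoded_length = (slot & 0xFF) // 2       # div(and(stringLength, 0xFF), 2)
data_word = slot & (MASK_256 ^ 0xFF)      # and(stringLength, not(0xFF))

# Memory layout of a string: length word at returnBuffer, data at returnBuffer + 0x20.
memory = decoded_length.to_bytes(32, "big") + data_word.to_bytes(32, "big")
print(decoded_length)                     # 5
print(memory[32:32 + decoded_length])     # b'hello'
```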

One more question: in the storeString function, for case 0x00, I didn't understand why you need to load 32 bytes starting at address _string + 32. In MEMORY, do the first 32 bytes of a string generally hold only the length, and the next 32 bytes the value?

Yes. Once again, the memory layout is: 32 bytes for the length, then n * 32 bytes for the data. Case 0x00 is for short arrays, so for a short array in storage we need to write both the data and length * 2 in the same slot.

sstore(_length, or(mload(add(_string, 0x20)), mul(stringLength, 2)))

Here mload(add(_string, 0x20)) loads the data of the string in memory, which can be at most 31 bytes long since we are in case 0x00, and mul(stringLength, 2) computes length * 2, following the storage layout of short strings. or(mload(add(_string, 0x20)), mul(stringLength, 2)) combines those two values into a single 32-byte value whose lowest-order byte is length * 2 and whose other bytes are the string data.

The enclosing sstore writes this value to the length storage slot. I agree that writing data into the length field can be confusing, but this is how short strings are stored.

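As per your comment: add(mload(add(_string, 0x20)), mul(stringLength, 2)) would work too. I just prefer to use bitwise operators when working with bits and arithmetic operators when working with numbers. In this case both are equivalent: as long as the memory after the string is zero-padded, the lowest-order byte of the loaded word is zero, so adding length * 2 cannot carry into the data.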
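A small Python check of that equivalence and of the assumption behind it. This is a model, not EVM code; `pack_or`, `pack_add` and `mask_to_length` are illustrative helpers, not part of the contract above. `or` and `add` agree because the lowest-order byte of the loaded word is zero when the memory after the string is zero-padded. If that byte were dirty (here, a stray 0x01 assumed for illustration), both would corrupt the length byte, and masking the data to its first n bytes beforehand avoids that.

```python
def pack_or(data_word: int, n: int) -> int:
    return data_word | (n * 2)                  # or(mload(add(_string, 0x20)), mul(stringLength, 2))


def pack_add(data_word: int, n: int) -> int:
    return (data_word + n * 2) % (1 << 256)     # EVM add wraps modulo 2**256


def mask_to_length(data_word: int, n: int) -> int:
    """Keep only the first n (high-order) bytes of a 32-byte word."""
    shift = 256 - 8 * n
    return (data_word >> shift) << shift


clean = int.from_bytes(b"hello".ljust(32, b"\x00"), "big")
print(pack_or(clean, 5) == pack_add(clean, 5))       # True: equivalent on clean memory

dirty = clean | 0x01                                 # hypothetical stray byte after the string
print(pack_or(dirty, 5) & 0xFF)                      # 11: odd, would be read as a long array
print(pack_or(mask_to_length(dirty, 5), 5) & 0xFF)   # 10: masking restores 5 * 2
```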
