The Ethereum Foundation has the Unicorn token to encourage donations, and the unicorn emoji is represented by three bytes. How in the world did they do that?
I know that Solidity supports Unicode escapes, so something like \u2934 in a string is displayed in Mist as ⤴. But what about something like the puppy emoji 🐶?
One would think the escape sequence would be \u1F436, but instead Mist shows a character that isn't what I want: presumably the character with code point U+1F43, followed by a literal 6.
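Python's \uNNNN escape also reads exactly four hex digits, so it reproduces the same symptom. This is only an analogy: it assumes Solidity's lexer consumes four digits per escape in the same way, which fits what Mist shows and the commit linked below.

```python
# Python's four-digit escape reads exactly four hex digits, so the literal
# below becomes U+1F43 followed by the ordinary character "6".
short_escape = "\u1F436"
print(len(short_escape))              # 2
print(hex(ord(short_escape[0])))      # 0x1f43
print(short_escape[1])                # 6

# Code points above U+FFFF need Python's eight-digit \U escape instead.
puppy = "\U0001F436"
print(len(puppy), hex(ord(puppy)))    # 1 0x1f436
```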
So then I tried two code points (a UTF-16 surrogate pair): \uD83D\uDC36. Mist didn't show anything.
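The escapes \uD83D\uDC36 are the UTF-16 surrogate pair for U+1F436. A short sketch of the standard UTF-16 computation confirms the values; it says nothing about whether Solidity ever recombines such a pair, which is exactly what's in question here. The function names are hypothetical.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000          # a 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F436)        # the puppy emoji
print(f"\\u{high:04X}\\u{low:04X}")           # \uD83D\uDC36
print(hex(from_surrogate_pair(high, low)))    # 0x1f436
```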
The following commit makes me think this is impossible, because it seems to me that the for loop has i iterating through four characters, i.e. two bytes, for each escape sequence:
https://github.com/ethereum/solidity/pull/666/commits/aa4593cab3d60468e5ea4318012c5252ebbc7d13
And, as noted, surrogate pairs don't seem to work either, or at least they aren't displayed in Mist (yet Mist does show the Unicorn emoji).
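One possible explanation for the blank output, as a sketch: assume the compiler UTF-8-encodes each \uNNNN escape on its own and never combines a surrogate pair. That is an assumption about the linked commit, not its actual code, and encode_escape_independently is a hypothetical name. Under that assumption each surrogate becomes three bytes, and the result is not valid UTF-8, so a strict decoder rejects it.

```python
def encode_escape_independently(hex_digits: str) -> bytes:
    """Hypothetical model of an escape handler that UTF-8-encodes one
    four-digit \\uNNNN escape in isolation, without pairing surrogates."""
    code_point = int(hex_digits, 16)
    # "surrogatepass" lets Python emit the 3-byte form of a lone surrogate,
    # which strict UTF-8 forbids.
    return chr(code_point).encode("utf-8", "surrogatepass")


raw = encode_escape_independently("D83D") + encode_escape_independently("DC36")
print(raw.hex())  # eda0bdedb0b6

# Strict UTF-8 decoding rejects the surrogate bytes.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8:", err.reason)

# The correct encoding of U+1F436 is a single 4-byte sequence instead.
print("\U0001F436".encode("utf-8").hex())  # f09f90b6
```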
How in the world do I support an emoji consisting of 3+ bytes?
EDIT:
0xcaff found a bug in the way that Solidity decodes UTF-8 encoded bytes.
I've filed the issue on GitHub: https://github.com/ethereum/solidity/issues/2383
If someone has a fix, the bounty is theirs. Otherwise it will go to 0xcaff.
EDIT II:
The issue has been closed, and UTF-8 validation has been fixed (or at least improved) in this merged pull request: https://github.com/ethereum/solidity/pull/2386
Now you can use emojis in Solidity by writing their UTF-8 encoded bytes as a hex literal, for example (F0 9F 90 B6 is the UTF-8 encoding of 🐶, U+1F436):
string public constant working = hex"F09F90B6";
You can generate the UTF-8 bytes with sites like https://mothereff.in/utf-8.
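If you'd rather not rely on a website, the same bytes can be produced locally. A minimal Python sketch; solidity_hex_literal is a hypothetical helper that just formats a string's UTF-8 bytes the way the hex"..." literal above expects.

```python
def solidity_hex_literal(text: str) -> str:
    """Build a Solidity hex"..." literal from the UTF-8 bytes of `text`."""
    return 'hex"' + text.encode("utf-8").hex().upper() + '"'


print(solidity_hex_literal("\U0001F436"))   # hex"F09F90B6"  (puppy)
print(solidity_hex_literal("\U0001F984"))   # hex"F09FA684"  (unicorn)
```

Paste the output into a declaration such as the string public constant line above.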
Woohoo! 🦄🐶
Best Answer
Contrary to what some of the comments suggest, the unicorn symbol 🦄 (U+1F984) does appear in the contract's symbol name. You can check this by running the following in a web3 browser's console:
Let's investigate this string:
Let's talk about UTF-8, the character encoding Mist uses. According to FileFormat.info:
What we have here is the Unicode code point 129412 (0x1F984). It looks like Solidity only supports encoding code points between 0x0000 (0) and 0xFFFF (65535) using the \uNNNN syntax. Languages typically allow encoding code points above this range using surrogate pairs; Solidity doesn't seem to.
No worries, we should be able to just put the hex encoding of the correct UTF-8 byte sequence in the string, and the correct text should be rendered.
Contract:
Test (truffle):
Unfortunately it won't compile:
The error comes from here. It seems the validator chooses an incorrect value for count and stops too early (isolation; original). This seems to be a bug or missing feature in Solidity. Hope this helps.
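For comparison, a minimal sketch of a strict validator that derives the continuation-byte count from the lead byte, following RFC 3629. This is not Solidity's C++ code and not the fix merged upstream; utf8_continuation_count and is_valid_utf8 are hypothetical names. The point it illustrates: a lead byte in the range 0xF0 to 0xF4, which starts every four-byte emoji, must be followed by exactly three continuation bytes.

```python
def utf8_continuation_count(lead: int) -> int:
    """Continuation bytes required after `lead`, or -1 if it cannot start a sequence."""
    if lead <= 0x7F:
        return 0
    if 0xC2 <= lead <= 0xDF:
        return 1
    if 0xE0 <= lead <= 0xEF:
        return 2
    if 0xF0 <= lead <= 0xF4:   # four-byte sequences: U+10000..U+10FFFF, incl. most emoji
        return 3
    return -1                  # continuation byte, overlong 0xC0/0xC1, or 0xF5..0xFF


def is_valid_utf8(data: bytes) -> bool:
    """Strict check: rejects truncation, overlongs, surrogates and > U+10FFFF."""
    i = 0
    while i < len(data):
        lead = data[i]
        count = utf8_continuation_count(lead)
        if count < 0 or i + count >= len(data):
            return False
        # A few lead bytes restrict the range of the second byte.
        low, high = 0x80, 0xBF
        if lead == 0xE0:
            low = 0xA0    # forbid overlong 3-byte forms
        elif lead == 0xED:
            high = 0x9F   # forbid UTF-16 surrogates U+D800..U+DFFF
        elif lead == 0xF0:
            low = 0x90    # forbid overlong 4-byte forms
        elif lead == 0xF4:
            high = 0x8F   # forbid code points above U+10FFFF
        for k in range(1, count + 1):
            byte = data[i + k]
            if k == 1 and not low <= byte <= high:
                return False
            if k > 1 and not 0x80 <= byte <= 0xBF:
                return False
        i += count + 1
    return True


print(is_valid_utf8(bytes.fromhex("F09FA684")))   # True  (unicorn)
print(is_valid_utf8(bytes.fromhex("F09FA6")))     # False (truncated)
print(is_valid_utf8(bytes.fromhex("EDA0BD")))     # False (lone surrogate)
```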