[Ethereum] Is the size of the Ethereum blockchain a problem for full nodes in the coming months?

blockchain, go-ethereum

In August 2017 my chaindata directory was about 12GB. Now, in November, it has grown to 220GB – in line with expectations, given Ethereum's growing popularity.

I run my geth node on Windows 10 with this command line:

geth.exe --syncmode fast --cache 1024 --datadir D:\Ethereum\GethBlockchain

I had been running my node on a 2TB platter drive, but around late October geth simply couldn't keep up with new blocks: it was processing them more slowly than they were being produced. At first I assumed the bottleneck was my internet connection or something else.
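
(For anyone wanting to confirm this kind of stall: assuming the default IPC endpoint, you can attach a second console to the running node – eth.syncing reports currentBlock vs highestBlock during an active sync, and comparing eth.blockNumber against a public block explorer shows how far behind the node is:)

geth.exe attach
> eth.syncing
> eth.blockNumber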

However, Task Manager showed that the 2TB disk's random IO was pegged at a constant 100% usage. I swapped it out for a 500GB SSD (just SATA3, not NVMe/U.2/M.2), random IO dropped to about 20%, and geth was able to process blocks fast enough again. It seems that Ethereum (or at least geth) really needs an SSD's fast random IO to process new blocks (why? don't Merkle trees mean there's no need to query older blocks?).
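
My rough understanding of the "why" (happy to be corrected): processing a new block doesn't mean re-reading old blocks, but it does mean reading and updating the current state trie – account balances, nonces, contract storage – and geth stores each trie node in LevelDB keyed by the hash of its contents. Walking from the state root down to one account is therefore a series of point reads at effectively random keys. A toy Go sketch of that access pattern (not geth's real storage code; sha256 stands in for Keccak-256, and the toy "trie" is just a linked chain of hash-keyed nodes):

package main

// Toy sketch of why state lookups are random IO: each "trie node" lives in
// LevelDB under the hash of its contents, so resolving one value means a
// series of point reads at keys with no locality. (sha256 stands in for
// Keccak-256; real trie nodes branch rather than forming a simple chain.)

import (
    "crypto/sha256"
    "fmt"
    "log"

    "github.com/syndtr/goleveldb/leveldb"
)

func main() {
    db, err := leveldb.OpenFile("sketchdb", nil)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Build a 6-hop chain: each node's value is the hash-key of the next,
    // mimicking how a branch node references its children by hash.
    keys := make([][32]byte, 7)
    for i := range keys {
        keys[i] = sha256.Sum256([]byte{byte(i)})
    }
    for i := 0; i < 6; i++ {
        if err := db.Put(keys[i][:], keys[i+1][:], nil); err != nil {
            log.Fatal(err)
        }
    }

    // "Resolve" the leaf: six point reads at unrelated 32-byte keys. On a
    // platter drive every cache miss is a head seek; on an SSD it's a cheap
    // random read – hence the difference observed above.
    cur := keys[0][:]
    for i := 0; i < 6; i++ {
        next, err := db.Get(cur, nil)
        if err != nil {
            log.Fatal(err)
        }
        cur = next
    }
    fmt.Printf("walked 6 hash-keyed hops, leaf key = %x\n", cur)
}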

Given that the blockchain is growing fast enough that it will probably hit 1TB within a few months (and likely 2TB and beyond before mid-2018), this is a problem: it is approaching the capacity limits of SSDs affordable by normal people. Beyond that you start needing special (read: expensive) setups – either super-expensive high-capacity SSDs or a massive array of them (presumably concatenation/JBOD; RAID isn't strictly necessary) just to store the blockchain. I don't like the idea of spending $200 every few months on another 500GB SSD – eventually I'll run out of SATA ports.
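
As a back-of-envelope check on the projection above (simple linear extrapolation from the two figures I gave – it undershoots if growth keeps accelerating, which is the scenario I'm worried about):

package main

import "fmt"

// Linear projection from the figures in the question: ~12GB in August 2017,
// ~220GB in November 2017 (roughly 3 months apart). Constant-rate only;
// the "few months to 1TB" worry assumes the acceleration continues.
func main() {
    perMonth := (220.0 - 12.0) / 3.0           // ~69 GB/month at the recent pace
    monthsTo1TB := (1000.0 - 220.0) / perMonth // ~11 months at a constant rate
    fmt.Printf("~%.0f GB/month; ~%.0f months to 1TB at a constant rate\n",
        perMonth, monthsTo1TB)
}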

If this holds true, even without the growth continuing to accelerate, eventually the Ethereum blockchain will be impossible to store on a geth full node running on commodity hardware – undermining its democratization and putting control only in the hands of those who can afford the hardware needed to run a full node.

I read that geth with --syncmode fast will prune the blockchain of unneeded extra data (e.g. intermediate state for smart contracts) during its initial sync, but that once the sync is finished, newly processed blocks retain that extra data, and it is not currently possible to prune them.

This has been raised in the Geth/Mist repos on GitHub, and the advice seems to be to delete the entire blockchain and run another --syncmode fast – but I'm finding that this becomes less and less feasible as time goes on. The inability to prune/clean up the blockchain store is described as a temporary inconvenience, so I hope that means pruning support will be added soon.
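
For reference, the delete-and-resync cycle currently looks something like this (geth's built-in removedb subcommand wipes the chain data while leaving the keystore alone – though do double-check before pointing it at your data directory):

geth.exe --datadir D:\Ethereum\GethBlockchain removedb
geth.exe --syncmode fast --cache 1024 --datadir D:\Ethereum\GethBlockchain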

  1. Why does geth perform so much random IO?
  2. Is there any way to compress the chaindata directory contents at all to minimize disk usage?
  3. What will happen when commodity hardware cannot be used to run a full-node? Do the Ethereum Project organizers have a plan for this eventuality?

Best Answer

Regarding 3, I don't think this will happen for a long time. There are other options for running a full node: I'd suggest looking at Parity – running a full node with that client's "warp" feature only takes 20-30 GB.
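
If you want to try it, a minimal invocation is something like the line below (in recent Parity releases warp sync should be on by default, so no extra flag should be needed; --base-path is Parity's rough equivalent of geth's --datadir, pointed at a hypothetical directory here):

parity --base-path D:\Ethereum\Parity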

For more detail as to why, I suggest reading this blog post:

The Ethereum Blockchain Size Will Not Exceed 1 TB Anytime Soon
