How fast is Ethereum blockchain growing and how big is it likely to get in the future? Is it still about 1 GB per month? Are there any data pruning or compression algorithms in place or planned to be used?
[Ethereum] What are the Ethereum disk space needs
blockchain disk-space state-trie pruning statistics
Related Solutions
It is a similar concept to garbage collection in programming languages and in tree-based version control systems like git. When Ethereum contracts run, they modify their state. Since the state tree is an immutable, append-only structure, every state modification produces a new state root. Some elements that were reachable from the old roots may not be reachable from the new root (due to operations that delete or modify entries). Theoretically, they can be pruned (garbage collected). However, since the Proof-of-Work consensus, as it stands, does not define when state transitions are final, there is always a theoretical possibility that the state will be reverted to an older root and that the pruned nodes will be needed again. So pruning is currently a trade-off: we say, for example, that after 5000 blocks we assume the state won't be reverted, and we prune all unreachable nodes. Someone might still want to disable this feature to keep the entire history of the blockchain for special purposes.
Here is a very detailed description: https://blog.ethereum.org/2015/06/26/state-tree-pruning/
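To make the garbage-collection analogy concrete, here is a minimal sketch in Python. It is a toy, not how any real client stores its trie: the node store layout, the `reachable` and `prune` helpers, and the `keep_last` parameter are all assumptions made up for illustration. It keeps only the nodes reachable from the state roots of the most recent blocks and drops the rest.

```python
from typing import Dict, List, Set

# Toy node store: node hash -> list of child hashes.
# (Hypothetical layout for illustration; real clients use a Merkle Patricia trie.)
NodeStore = Dict[str, List[str]]


def reachable(store: NodeStore, roots: List[str]) -> Set[str]:
    """Collect every node reachable from the given state roots."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in seen or node not in store:
            continue
        seen.add(node)
        stack.extend(store[node])
    return seen


def prune(store: NodeStore, roots_by_block: List[str], keep_last: int) -> NodeStore:
    """Keep only nodes reachable from the roots of the last `keep_last` blocks."""
    if keep_last <= 0:
        raise ValueError("keep_last must be positive")
    live = reachable(store, roots_by_block[-keep_last:])
    return {h: children for h, children in store.items() if h in live}


# Block 0 has root r0 with leaves a and b.
# Block 1 modifies b, so it gets a new root r1 that still shares leaf a.
store = {"r0": ["a", "b"], "r1": ["a", "b2"], "a": [], "b": [], "b2": []}
pruned = prune(store, ["r0", "r1"], keep_last=1)
print(sorted(pruned))  # ['a', 'b2', 'r1']
```

With `keep_last=1` the old root `r0` and the overwritten leaf `b` are dropped, while the shared leaf `a` survives because the new root still points to it. A larger `keep_last` plays the role of the "5000 blocks" safety margin mentioned above.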
Let's take it one step at a time.
Blockchains generally work by having an origin (genesis) state in which a few accounts hold funds; every block you place on top of the chain then moves those funds around, also granting a bit extra to the miner. So whenever you import a new block into your existing chain, you take your current view (state) of the world and transform it according to the transactions contained in the block, arriving at a new view of what you believe the world looks like. You don't discard your past view of the world, because if there is a fork in the blockchain (e.g. a miner turns up with a better block, or maybe two better blocks), you need to transform your view from that past state to the better version. This leads to all the past states you transitioned through being accumulated for eternity. This is an unpruned state/blockchain, and it is currently at 7GB for Ethereum.
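As a rough illustration of why an unpruned node keeps growing, here is a toy sketch in Python. The account model, the `Block` type, the `REWARD` constant and the helper names are all assumptions invented for this example; real clients store state in a trie and share unchanged parts between versions instead of copying whole dictionaries.

```python
from typing import Dict, List, NamedTuple, Tuple

State = Dict[str, int]        # account -> balance
Tx = Tuple[str, str, int]     # (sender, recipient, amount)
REWARD = 5                    # hypothetical miner reward per block


class Block(NamedTuple):
    miner: str
    txs: List[Tx]


def apply_block(state: State, block: Block) -> State:
    """Return the new world view; the old one is left untouched."""
    new_state = dict(state)
    for sender, recipient, amount in block.txs:
        new_state[sender] = new_state.get(sender, 0) - amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    new_state[block.miner] = new_state.get(block.miner, 0) + REWARD
    return new_state


def import_chain(genesis: State, blocks: List[Block]) -> List[State]:
    """Unpruned import: every intermediate state is kept."""
    states = [genesis]
    for block in blocks:
        states.append(apply_block(states[-1], block))
    return states


genesis = {"alice": 10}
blocks = [
    Block("miner", [("alice", "bob", 3)]),
    Block("miner", [("bob", "carol", 1)]),
]
states = import_chain(genesis, blocks)
print(len(states))   # 3 (genesis plus one state per block)
print(states[-1])    # {'alice': 7, 'bob': 2, 'miner': 10, 'carol': 1}
```

Because `states[1]` is still around, the node could switch to a competing block at height 2 without replaying from genesis. The cost is that the list of states only ever grows.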
The important thing to notice is that most of the time you don't care how much an account held 3 years ago; you only care about what the state currently is (and maybe what it was a few days ago). So why keep all that extremely old intermediate state around? State pruning essentially takes all that intermediate state and flushes it down the toilet. The important thing to realize is that you only throw away the intermediate world views, never the blocks themselves or any other data whose loss would be unhealthy for the network (i.e. data a joining node needs in order to sync). Thus, by pruning your state trie, you lose the ability to query the past balances of accounts, but you gain a reduction of your stored data to about 1/5-1/6 of its original size.
Ok, so what about fast sync? Well, following the previous line of thought, if you don't care about the balance of a random account from 3 years ago, why would you want to replay the entire transaction history of the blockchain just to get to the current state? So what fast sync does is download the whole blockchain without executing the transactions, i.e. without generating the world view one block at a time. Instead, it only verifies the proofs of work, and once the entire chain is downloaded, it looks at the state root (the hash defining the current world view) and downloads the state database directly from the network, reconstructing the final state without needing any of the transient states. This means that, besides downloading the blocks, it needs to download additional data, the state trie itself, so it exchanges processing power for bandwidth (i.e. I download the state instead of generating it). The end result of fast sync is, for all intents and purposes, a pruned database, just obtained by different means. The current size of such a database is 1.2-1.3GB.
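The fast sync idea can be sketched roughly as follows. This is a toy in Python under loud assumptions: `Header`, `toy_hash`, `header_hash` and `fast_sync` are made-up names, SHA-256 over JSON stands in for Ethereum's actual hashing and trie encoding, and a simple parent-hash check stands in for proof-of-work verification. The point is only that the downloaded state is accepted because it matches the state root in the newest header, not because the transactions were replayed.

```python
import hashlib
import json
from typing import Dict, List, NamedTuple


class Header(NamedTuple):
    parent_hash: str
    state_root: str


def toy_hash(obj) -> str:
    """Stand-in hash (assumption): SHA-256 of canonical JSON."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def header_hash(header: Header) -> str:
    return toy_hash([header.parent_hash, header.state_root])


def fast_sync(headers: List[Header], downloaded_state: Dict[str, int]) -> Dict[str, int]:
    """Check the header chain, then accept the state only if it matches the head."""
    if not headers:
        raise ValueError("no headers to sync")
    # Stand-in for proof-of-work verification: each header must link to its parent.
    for parent, child in zip(headers, headers[1:]):
        if child.parent_hash != header_hash(parent):
            raise ValueError("broken header chain")
    # No transaction is executed; the state is checked against the head's state root.
    if toy_hash(downloaded_state) != headers[-1].state_root:
        raise ValueError("state does not match the head's state root")
    return downloaded_state


genesis_state = {"alice": 10}
head_state = {"alice": 7, "bob": 3}
genesis = Header("", toy_hash(genesis_state))
head = Header(header_hash(genesis), toy_hash(head_state))

state = fast_sync([genesis, head], head_state)
print(state)  # {'alice': 7, 'bob': 3}
```

Note that the syncing node never sees `genesis_state` at all: it ends up holding only the head state, which is why the result looks like a pruned database.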
Best Answer
Update on Dec 9th, 2018 / Block ~ 6_850_000 - It's quite an annoyance to keep this answer updated.
Geth (Go)
Last Update: May 14th, 2018 / Block ~ 5_600_000
[1] My disk was full; I didn't expect this to run out of space and wasn't able to repeat this sync mode.
[2] Unfortunately, I didn't manage to fully sync the archive node within six weeks.
Parity (Rust)
Last Update: May 14th, 2018 / Block ~ 5_600_000
Update: Nov 29th, 2017. Afri has written a blog post about this, especially about Parity's pruning modes: "The Ethereum-blockchain size will not exceed 1TB anytime soon."