[Ethereum] Index and query blockchain data and levelDB

leveldbmysqlquery

Would it be more performant to query the blockchain data directly through the underlying database levelDB instead of using the provided geth api? How would it be possible? Can I just index the transactions by sender / receiver in mysql or elasticsearch? I saw vulcanizeDB doing this

Best Answer

Geth does three operations on the database that you should consider when trying to find alternatives.

Geth loads the entire database in levelDB (what I believe you want to do).
Geth then deciphers all (or required) key-value pairs using RLP.
Geth then takes these key-value pairs (which are nodes on a merle patricia trie) and creates merkle patricia tries` to understand the state of the blockchain.

LevelDB was not a challenge to get working. You can just grab any repo implementing Google's levelDB in your preferred language.

If you were going to tackle this problem, you would have to accomplish all three of these tasks to access the data on the blockchain. I have tried creating this using my own programming knowledge, but fail to accurately decode complex RLP using java (I'd recommend using python) since RLP structures could be varying in complexity, so a fixed array of one type created in java is out of the question. There are a few packages out there on GitHub that tries to decode RLP to some degree.

Forming the merkle patricia tries is challenging as well. Preferably you would identify the root of the tree. Each node (unless a leaf node) will contain references to another location in the database. The roots of each tree are contained in the block header, and the block header's location start with a special sequence of characters (if you search all values in a small levelDB for ethereum, you will see these special cases). Then the traversal of the tree is each account's public address.

I don't know what efficiency boosts you would get from creating your own solution, this greatly depends on how true you are to the yellow paper and which language you use. If you have a special case you want to use this database for (IE searching for smart contracts, or inputs) and will implement only certain aspects of this case into your program, then I am sure you will see an increase in performance with your non-complete Geth implementation.

Warning: this is not exactly how Ethereum works, but it gives you an idea. Think of the data on the blockchain as strings of information. You need all the characters to produce any string of information on the blockchain. Ethereum would then store each character in a HUGE database. If you decode any character you will see that each one points to another, if you follow the path you can decode one string. The issue is you want to know where "dog" is stored in the blockchain, but that is actually several locations. If you know where it starts (a block header) then you can construct the information by using steps 2 and 3.

I have tried to understand these concepts myself, so if I am off, or need to clarify anything please ask! I wish you luck in pursuing this if I haven't deterred you with my lengthy response!

Related Solutions

Private Blockchain – Listing Transactions in a Private Blockchain

Take a look at web3.ETH.filter

Parameters

String|Object - The string "latest" or "pending" to watch for changes in the latest block or pending transactions respectively. Or a filter options object as follows:

String|Object - The string "latest" or "pending" to watch for changes in the latest block or pending transactions respectively. Or a filter options object as follows:
- fromBlock: Number|String - The number of the earliest block (latest may be given to mean the most recent and pending currently mining, block). By default latest.
- toBlock: Number|String - The number of the latest block (latest may be given to mean the most recent and pending currently mining, block). By default latest.
- address: String - An address or a list of addresses to only get logs from particular account(s).
- topics: Array of Strings - An array of values which must each appear in the log entries. The order is important, if you want to leave topics out use null, e.g. [null, '0x00...']. You can also pass another array for each topic with options for that topic e.g. [null, ['option1', 'option2']]

EG:

// watch for changes 

var filter = web3.ETH.filter({address: "0xYOURADDRESS"});

filter.watch(function(error, result){
  if (!error)
    console.log(result);
});

Link for the script (not filter) to get the transactions to/from an account.

Issue in the Go-ethereum github to implement this functionality (eth.listTransactions) by default.

Latest update from Ethereum Developers:

Locking this issue but leaving it open.

We're considering a suitable implementation for this feature.

No ETA.

[Ethereum] How to access Geth’s state trie

Doing this using web3 over RPC will take forever, as you've found. Reading the .ldb files is the way to go, and is something that's come up in similar questions before.

I don't know of any working, up-to-date, .ldb parser, though there must be some in existence, so I'll defer to other more knowledgeable answers you might get.

In the meantime...

See: LDB files reading, which attempts to point the OP in the direction of parts of Geth's state trie (Go) API, with a view to writing a parser from scratch. (There's also an old Go parser, here, though it's likely out of date.)

To get a better understanding of how the database is laid out, have a read of Exploring Ethereum's state trie With Node.js. The example code is quite old, but can be made to work with a few tweaks.

Also related, though with no clear answer: How to parse blocks with Python?

Also related, specifically to the RPL encoding within the .ldb files: Format of LevelDB files in nodes directory? Trouble pulling contents with python leveldb API

Best Answer

Related Solutions

Private Blockchain – Listing Transactions in a Private Blockchain

[Ethereum] How to access Geth’s state trie

Related Topic