Event Logs Search – Is Searching Data Stored in Event Logs Prohibitively Slow?

Tags: events, json-rpc, logs, storage

Augur uses event logs to store data that never needs to be accessed on-contract, since log storage is about 10x cheaper than contract storage. However, I've noticed that retrieving event logs (e.g., using eth_getLogs) takes significantly longer than retrieving on-contract data. (I know filters let you listen for new event logs, but I'm specifically concerned with looking up old event logs.)

I don't really understand how event logging works under the hood. Currently, we index one (or more) of each log's arguments, then use eth_getLogs to search on the indexed argument. For example, all Augur trades are logged and indexed by both market ID and account, and our UI looks up the current account's trades as part of its initial load. Right now this takes about 10 seconds to finish. The eth_getLogs request is made asynchronously and interleaved with the other initial market-loading tasks, so it's not overly annoying at the moment. However, I'm concerned about how this will scale as both Augur and the Ethereum blockchain as a whole grow.
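For reference, a lookup like this is expressed through the topics field of eth_getLogs: topic 0 is the keccak-256 hash of the event signature, and each indexed argument appears as a subsequent 32-byte topic. A query for one account's logs looks roughly like the sketch below (the contract address, signature hash, topic position, and endpoint are placeholders, not Augur's actual values):

# Hypothetical query: fetch logs whose second indexed argument (an account
# address, left-padded to 32 bytes) matches. null matches any value in that
# position. All hashes and addresses here are placeholders.
curl -X POST -H "Content-Type: application/json" \
  --data '{"id":1,"jsonrpc":"2.0","method":"eth_getLogs","params":[{
    "fromBlock": "0x0",
    "toBlock": "latest",
    "address": "0x<contract address>",
    "topics": [
      "0x<keccak256 of the event signature>",
      null,
      "0x000000000000000000000000<account address>"
    ]
  }]}' http://127.0.0.1:8545 >> /dev/null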

My question: will retrieving event logs become prohibitively slow as the blockchain becomes larger? More specifically, what is the time complexity of eth_getLogs? (Is blockchain size the relevant variable, or something else? Is eth_getLogs even the correct RPC request for what I'm trying to do? Basically any insight into this problem is appreciated!)

Best Answer

I will try to answer your question. I worked on bloom filters in cpp-ethereum and Parity.

will retrieving event logs become prohibitively slow as the blockchain becomes larger?

Not necessarily. Everything depends on the implementation, the log density (average number of logs per block), and the number of cache levels.

More specifically, what is the time complexity of eth_getLogs?

In the worst case, where every block contains a log matching your query, it is O(n), where n is the number of blocks scanned. But that's rarely the case. Bloom filters trade a small probability of false positives for speed: blocks whose bloom does not match your filter are skipped without reading their receipts, so the expected work is roughly the number of true matches plus the false-positive rate times the number of blocks scanned. The more specific your filter is (the more topics it has), the lower the false-positive rate and the faster you will get your results.
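To illustrate, an address-only query like the benchmarks below could be narrowed by adding a topic restriction; the extra topic contributes additional bloom bits that must all match before the client reads a block's receipts (the event signature hash below is a placeholder):

# Address query plus a topic restriction (placeholder hash). The extra
# topic adds bloom bits that must all match, so more non-matching blocks
# are skipped at the bloom-check stage, before any receipts are read.
curl -X POST -H "Content-Type: application/json" \
  --data '{"id":2,"jsonrpc":"2.0","method":"eth_getLogs","params":[{
    "fromBlock": "0x0",
    "toBlock": "0xf0be2",
    "address": "0x33990122638b9132ca29c723bdf037f1a891a70c",
    "topics": ["0x<keccak256 of the event signature>"]
  }]}' http://127.0.0.1:3030 >> /dev/null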

Is eth_getLogs even the correct RPC request for what I'm trying to do?

Yes.
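One general mitigation, whichever client you use: if a query only needs recent history, bound it with fromBlock/toBlock instead of scanning from genesis, so fewer blooms have to be checked. A sketch (the block numbers and address are placeholders):

# Hypothetical bounded query: scan only a recent window of blocks rather
# than the whole chain. Block numbers and address are placeholders.
curl -X POST -H "Content-Type: application/json" \
  --data '{"id":3,"jsonrpc":"2.0","method":"eth_getLogs","params":[{
    "fromBlock": "0xe0000",
    "toBlock": "latest",
    "address": "0x<contract address>"
  }]}' http://127.0.0.1:3030 >> /dev/null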

To summarise, I believe that the 10s response time is caused by a sub-optimal implementation of bloom filters in go-ethereum. Here are the results of benchmarks with Parity:

Find all logs from block 0 to 986082 with address 0x33990122638b9132ca29c723bdf037f1a891a70c (should return 1602 logs):

time curl -X POST -H "Content-Type: application/json" \
  --data '{"id":8,"jsonrpc":"2.0","method":"eth_getLogs","params":[{"fromBlock":"0x0","toBlock":"0xf0be2","address":"0x33990122638b9132ca29c723bdf037f1a891a70c"}]}' \
  http://127.0.0.1:3030 >> /dev/null

geth first request:

real    0m17.003s

geth second request (I assumed results would be cached after the first one):

real    0m18.023s

parity first request (~24x faster than geth):

real    0m0.770s

parity second request (~30x faster than geth):

real    0m0.668s

The gap between Parity and geth closes dramatically when there are no logs to be found:

Find all logs from block 0 to 986082 with address 0x33990122638b9132ca29c723bdf037f1a891a70d (address does not exist, 0 logs returned):

time curl -X POST -H "Content-Type: application/json" \
  --data '{"id":8,"jsonrpc":"2.0","method":"eth_getLogs","params":[{"fromBlock":"0x0","toBlock":"0xf0be2","address":"0x33990122638b9132ca29c723bdf037f1a891a70d"}]}' \
  http://127.0.0.1:3030 >> /dev/null

geth first request:

real    0m0.022s

geth second request:

real    0m0.021s

parity first request (4x slower than geth):

real    0m0.080s

parity second request (1.5x slower than geth):

real    0m0.030s