Event Logs Search – Is Searching Data Stored in Event Logs Prohibitively Slow?

Tags: events, json-rpc, logs, storage

Augur uses event logs to store data that never needs to be accessed on-contract, since log storage is about 10x cheaper than contract storage. However, I've noticed that retrieving event logs (e.g., using eth_getLogs) takes significantly longer than retrieving on-contract data. (I know filters let you listen for new event logs, but I'm specifically concerned with looking up old event logs.)

I don't really understand how event logging works under the hood. Currently, we index one (or more) of each log's arguments, then use eth_getLogs to search on the indexed argument. For example, all Augur trades are logged and indexed by both market ID and account, and our UI looks up the current account's trades as part of its initial load. Right now this takes about 10 seconds to finish. The eth_getLogs request is made asynchronously and interleaved with the other initial market-loading tasks, so it's not overly annoying at the moment. However, I'm concerned about how this will scale as both Augur and the Ethereum blockchain as a whole grow.
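For reference, a lookup like this is expressed through the topics field of eth_getLogs: topic 0 is the keccak-256 hash of the event signature, and each indexed argument appears as a subsequent 32-byte topic. A query for one account's logs looks roughly like the sketch below (the contract address, signature hash, topic position, and endpoint are placeholders, not Augur's actual values):

# Hypothetical query: fetch logs whose second indexed argument (an account
# address, left-padded to 32 bytes) matches. null matches any value in that
# position. All hashes and addresses here are placeholders.
curl -X POST -H "Content-Type: application/json" \
  --data '{"id":1,"jsonrpc":"2.0","method":"eth_getLogs","params":[{
    "fromBlock": "0x0",
    "toBlock": "latest",
    "address": "0x<contract address>",
    "topics": [
      "0x<keccak256 of the event signature>",
      null,
      "0x000000000000000000000000<account address>"
    ]
  }]}' http://127.0.0.1:8545 >> /dev/null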

My question: will retrieving event logs become prohibitively slow as the blockchain becomes larger? More specifically, what is the time complexity of eth_getLogs? (Is blockchain size the relevant variable, or something else? Is eth_getLogs even the correct RPC request for what I'm trying to do? Basically any insight into this problem is appreciated!)

Best Answer

I will try to answer your question. I worked on bloom filters in cpp-ethereum and Parity.

will retrieving event logs become prohibitively slow as the blockchain becomes larger?

Not necessarily. Everything depends on the implementation, the log density (average number of logs per block), and the number of cache levels.

More specifically, what is the time complexity of eth_getLogs?

In the worst case, where every block contains a log matching your query, it is O(n), where n is the number of blocks scanned. But that's rarely the case. Bloom filters trade a small probability of false positives for speed: blocks whose bloom does not match your filter are skipped without reading their receipts, so the expected work is roughly the number of true matches plus the false-positive rate times the number of blocks scanned. The more specific your filter is (the more topics it has), the lower the false-positive rate and the faster you will get your results.
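To illustrate, an address-only query like the benchmarks below could be narrowed by adding a topic restriction; the extra topic contributes additional bloom bits that must all match before the client reads a block's receipts (the event signature hash below is a placeholder):

# Address query plus a topic restriction (placeholder hash). The extra
# topic adds bloom bits that must all match, so more non-matching blocks
# are skipped at the bloom-check stage, before any receipts are read.
curl -X POST -H "Content-Type: application/json" \
  --data '{"id":2,"jsonrpc":"2.0","method":"eth_getLogs","params":[{
    "fromBlock": "0x0",
    "toBlock": "0xf0be2",
    "address": "0x33990122638b9132ca29c723bdf037f1a891a70c",
    "topics": ["0x<keccak256 of the event signature>"]
  }]}' http://127.0.0.1:3030 >> /dev/null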

Is eth_getLogs even the correct RPC request for what I'm trying to do?

Yes.
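One general mitigation, whichever client you use: if a query only needs recent history, bound it with fromBlock/toBlock instead of scanning from genesis, so fewer blooms have to be checked. A sketch (the block numbers and address are placeholders):

# Hypothetical bounded query: scan only a recent window of blocks rather
# than the whole chain. Block numbers and address are placeholders.
curl -X POST -H "Content-Type: application/json" \
  --data '{"id":3,"jsonrpc":"2.0","method":"eth_getLogs","params":[{
    "fromBlock": "0xe0000",
    "toBlock": "latest",
    "address": "0x<contract address>"
  }]}' http://127.0.0.1:3030 >> /dev/null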

To summarise, I believe that the 10s response time is caused by a sub-optimal implementation of bloom filters in go-ethereum. Here are the results of benchmarks with Parity:

Find all logs from block 0 to 986082 with address 0x33990122638b9132ca29c723bdf037f1a891a70c (should return 1602 logs):

time curl -X POST -H "Content-Type: application/json" \
  --data '{"id":8,"jsonrpc":"2.0","method":"eth_getLogs","params":[{"fromBlock":"0x0","toBlock":"0xf0be2","address":"0x33990122638b9132ca29c723bdf037f1a891a70c"}]}' \
  http://127.0.0.1:3030 >> /dev/null

geth first request:

real    0m17.003s

geth second request (I assumed results would be cached after the first one):

real    0m18.023s

parity first request (~24x faster than geth):

real    0m0.770s

parity second request (~30x faster than geth):

real    0m0.668s

The gap between Parity and geth closes dramatically when there are no logs to be found:

Find all logs from block 0 to 986082 with address 0x33990122638b9132ca29c723bdf037f1a891a70d (address does not exist, 0 logs returned):

time curl -X POST -H "Content-Type: application/json" \
  --data '{"id":8,"jsonrpc":"2.0","method":"eth_getLogs","params":[{"fromBlock":"0x0","toBlock":"0xf0be2","address":"0x33990122638b9132ca29c723bdf037f1a891a70d"}]}' \
  http://127.0.0.1:3030 >> /dev/null

geth first request:

real    0m0.022s

geth second request:

real    0m0.021s

parity first request (4x slower than geth):

real    0m0.080s

parity second request (1.5x slower than geth):

real    0m0.030s