[Ethereum] What actually is a DAG


I have tried googling it as well as finding it here.

Many people here talk about it, but what actually is a DAG?

Best Answer

DAG stands for Directed Acyclic Graph. In Ethereum, a DAG is created every epoch using a version of the Dagger-Hashimoto Algorithm combining Vitalik Buterin's Dagger algorithm and Thaddeus Dryja's Hashimoto algorithm.

There are quite a few places where the DAG is defined in the docs and literature. These are collated below:

From the yellow paper:

...d being the current DAG, a large data set needed to compute the mix-hash...

From Wikipedia:

Directed Acyclic Graph: image credit David Eppstein

enter image description here

In mathematics and computer science, a directed acyclic graph (DAG), is a finite directed graph with no directed cycles. That is, it consists of finitely many vertices and edges, with each edge directed from one vertex to another, such that there is no way to start at any vertex v and follow a consistently-directed sequence of edges that eventually loops back to v again. Equivalently, a DAG is a directed graph that has a topological ordering, a sequence of the vertices such that every edge is directed from earlier to later in the sequence.

From https://github.com/ethereum/wiki/wiki/Ethash-DAG:

...a great huge dataset known as the DAG...

The Ethash algorithm expects the DAG as a two-dimensional array of uint32s (4-byte unsigned ints), with dimension (n × 16) where n is a large number. (n starts at 16777186 and grows from there.) Following the magic number, the rows of the DAG should be written sequentially into the file, with no delimiter between rows and each unint32 encoded in little-endian format.

From Vitalik Buterin's (I think) Dagger Paper, Dec 2013:

Dagger, a memory-hard proof of work based on moderately connected directed acyclic graphs (DAGs, hence the name), which, while far from optimal, has much stronger memory-hardness properties than anything else in use today.

Essentially, the Dagger algorithm works by creating a directed acyclic graph (the technical term for a tree where each node is allowed to have multiple parents) with ten levels including the root and a total of 2^25 - 1 values.

From https://github.com/ethereum/wiki/wiki/Mining#so-what-is-mining-anyway:

...calculating the PoW (Proof of Work) requires subsets of a fixed resource dependent on the nonce and block header. This resource (a few gigabyte size data) is called a DAG. The DAG is totally different every 30000 blocks (a 100 hour window, called an epoch) and takes a while to generate.

From https://github.com/ethereum/wiki/wiki/Mining#ethash-dag

a DAG (directed acyclic graph) for the proof of work algorithm

From https://github.com/ethereum/wiki/wiki/Mining#the-algorithm

a large, transient, randomly generated dataset

From https://github.com/ethereum/wiki/wiki/Ethash

The DAG is the "dataset" in this description of the Ethash algorithm, emphasis mine:

  1. There exists a seed which can be computed for each block by scanning through the block headers up until that point.
  2. From the seed, one can compute a 16 MB pseudorandom cache. Light clients store the cache.
  3. From the cache, we can generate a 1 GB dataset, with the property that each item in the dataset depends on only a small number of items from the cache. Full clients and miners store the dataset. The dataset grows linearly with time.
  4. Mining involves grabbing random slices of the dataset and hashing them together. Verification can be done with low memory by using the cache to regenerate the specific pieces of the dataset that you need, so you only need to store the cache.