Go-Ethereum – How Many NFTs are Hosted on IPFS? Complete Analysis


Let me start by saying that I am an NFT noob, so I apologize ahead of time if any of this is obvious.

I'm looking to determine how many NFTs are "hosted" on IPFS as opposed to on centralized web servers, i.e., how many of the tokenURIs start with ipfs:// vs. https://. The post here does a great job of outlining the manual steps you can follow for a specific contract address to get the tokenURI. I'd like to be able to do this for the complete set of NFTs in an efficient way.
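A minimal sketch of that prefix check, assuming the tokenURIs have already been collected into an array of strings. The function names `classifyTokenURI` and `tallyTokenURIs` and the example URIs are made up for illustration. Note that this simple version counts an https:// link to an IPFS gateway as "http", which may or may not be the desired behavior.

```javascript
// Classify a tokenURI by its URI scheme. Anything that is not ipfs:// or
// http(s):// (for example data: URIs or empty strings) is counted as "other".
function classifyTokenURI(uri) {
  const value = String(uri || '').trim().toLowerCase();
  if (value.startsWith('ipfs://')) return 'ipfs';
  if (value.startsWith('https://') || value.startsWith('http://')) return 'http';
  return 'other';
}

// Tally how many URIs fall into each bucket.
function tallyTokenURIs(uris) {
  const counts = { ipfs: 0, http: 0, other: 0 };
  for (const uri of uris) {
    counts[classifyTokenURI(uri)] += 1;
  }
  return counts;
}

// Example with made-up URIs (not real tokens):
const sample = [
  'ipfs://QmExampleHash/1.json',
  'https://example.com/metadata/1',
  'data:application/json;base64,e30=',
];
console.log(tallyTokenURIs(sample)); // { ipfs: 1, http: 1, other: 1 }
```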

In my searching, I've come across a very cool project, ethereum-etl, which even has a publicly available BigQuery dataset of the Ethereum blockchain. Unfortunately, it doesn't look like the tokenURI is available in any of its tables.

Is my best bet to modify ethereum-etl, or to write a script that queries the BigQuery dataset and then uses web3.js? Specifically, it looks pretty feasible to get the tokenURI from the contract address and a token ID using something like:

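// Assumes `web3` is an initialized web3.js instance, `ERC721ABI` is the ERC-721 ABI array,
// and `id` is the token ID; `await` must run inside an async function.
const nftAddress = 'YOUR_ADDRESS';
const contract = new web3.eth.Contract(ERC721ABI, nftAddress);
const tokenURI = await contract.methods.tokenURI(id).call();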

(this is an adaptation of the code I found here)

Surely it would take a while to run a script like this over the complete set of NFTs, right? Is there a better, more out-of-the-box way?

Update: I created this sandbox which uses web3.js to get the tokenURI for a given NFT.

Best Answer

What you're doing is probably your best bet! There is no on-chain "registry" of smart contracts that lets you track their existence.

Here are a few paths you might be able to take.

1. Find a dataset that contains the information you want: this is pretty self-explanatory. ethereum-etl is cool, but apparently its Google BigQuery dataset doesn't have all the data you want. Maybe someone has a dataset out there that does? Probably not (it's niche, dynamic data), but it might be worth a shot. You might want to take a look at the Nansen or Dune platforms. Keep in mind that even if you do find a dataset, it probably won't be up to date, since the baseURI/tokenURIs can change.

2. Query the baseURI/tokenURI for every ERC-721/ERC-1155 contract you're interested in, going through a third-party Ethereum API: this probably most closely approximates what you're doing. You get a list of all the contracts you're interested in and then query that function on each one. This will get you where you need to go, and the information will be accurate, but it might be slow, since you need to hit an API that will undoubtedly rate-limit you unless you're paying. You can consider sampling, e.g. only 1% of contracts created in the last year.

3. Query the baseURI/tokenURI for every ERC-721/ERC-1155 contract you're interested in, querying a locally-hosted node: this is pretty extreme. Ethereum nodes are quite large nowadays, so this may not be practical, but it's worth putting out there. If you don't want to be rate-limited or bottlenecked by a third-party API, you can host the data yourself and point the script at your own node.
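Options 2 and 3 boil down to the same loop; only the endpoint differs. Here is a rough sketch of that loop, assuming you already have a list of contract address and token ID pairs. The names `collectTokenURIs`, `fetchTokenURI`, and `delayMs` are hypothetical, and the actual lookup is passed in as a function so the loop does not depend on a particular client library or provider. The fixed delay is a crude stand-in for whatever rate limit your API imposes.

```javascript
// Pause helper used to stay under an API's rate limit.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Query the token URI for each { address, tokenId } pair, one at a time.
// `fetchTokenURI` is injected by the caller (hypothetical signature:
// (address, tokenId) => Promise<string>).
async function collectTokenURIs(targets, fetchTokenURI, delayMs = 200) {
  const results = [];
  for (const { address, tokenId } of targets) {
    try {
      const uri = await fetchTokenURI(address, tokenId);
      results.push({ address, tokenId, uri, error: null });
    } catch (err) {
      // Not every contract implements tokenURI, and calls can revert,
      // so record the failure and keep going.
      results.push({ address, tokenId, uri: null, error: String((err && err.message) || err) });
    }
    await sleep(delayMs);
  }
  return results;
}

// Possible wiring with web3.js, reusing the call from the question
// (assumes `web3` and `ERC721ABI` are already set up):
// const fetchTokenURI = (address, tokenId) =>
//   new web3.eth.Contract(ERC721ABI, address).methods.tokenURI(tokenId).call();
// Note: ERC-1155 contracts expose uri(id) rather than tokenURI(id).
```

Failures are stored rather than thrown so that one reverting contract does not abort a long run.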

If I were you, I'd probably go with option 2 with sampling.
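For the sampling part, something like the following could work. It is only a sketch: `sampleFraction` is a made-up helper, the address list is placeholder data, and it assumes the full list of candidate contract addresses fits in memory. It draws a uniform random subset using a partial Fisher-Yates shuffle.

```javascript
// Draw a uniform random sample of roughly `fraction` of the items
// (at least one item, if the list is non-empty).
// `rng` defaults to Math.random; pass your own for reproducible runs.
function sampleFraction(items, fraction, rng = Math.random) {
  const pool = items.slice(); // copy so the caller's array is untouched
  const size = Math.min(pool.length, Math.max(1, Math.round(pool.length * fraction)));
  for (let i = 0; i < size; i++) {
    // Pick a random element from the not-yet-chosen tail and swap it into slot i.
    const j = i + Math.floor(rng() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, size);
}

// Example: sample 1% of 1000 placeholder addresses.
const addresses = Array.from({ length: 1000 }, (_, i) => 'contract-' + i);
const sampled = sampleFraction(addresses, 0.01);
console.log(sampled.length); // 10
```

The sampled addresses can then be fed into whatever per-contract tokenURI lookup you settle on.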