Import into Neo4j
Yes, if you need to modify the graph structure or append new data that is not included in our release. Note: This is highly resource-intensive and can take 2-3 weeks on a high-end desktop computer.
No, if you simply want to explore the graph or sample communities from the dataset. In this case, restore the database dump instead; it bypasses the weeks-long processing time required for the bulk import described on this page.
On this page, we walk through the steps required to import the Bitcoin graph from batched TSV files into a Neo4j database.
Checkpoint: using pre-generated graph data
If you chose to skip the sync a Bitcoin node and traverse steps, you do not have the TSV files yet.
Instead, you can download the data we have prepared,
which encompasses all blocks up to height 863000.
Note: If you did run the sync a Bitcoin node and traverse steps and generated your own TSV files, you can skip this step and proceed directly to the import step below.
This process involves downloading nearly 1.2 TB of data;
ensure you are using a stable connection without data caps and
have at least 1.2 TB of free disk space.
You may take the following steps to download the graph in TSV files.
- Configure an environment variable to specify the target directory.
    export GDIR="/mnt/download/path"
- Download the TSV files.
    aws s3 sync s3://bitcoin-graph/v1/data_to_import_neo4j/ "${GDIR}" --no-sign-request
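Because the download needs roughly 1.2 TB of free space, it can be worth checking the target filesystem before starting the sync. The following is a minimal sketch, not part of the project's tooling: `has_free_space` and `REQUIRED_KIB` are hypothetical names introduced here, and the threshold assumes 1.2 TB means 1.2 × 10^12 bytes.

```shell
#!/usr/bin/env bash
# Sketch: check that a directory's filesystem has enough free space.
# has_free_space is a hypothetical helper, not part of the project's tooling.

# Usage: has_free_space <dir> <required_kib>
# Returns 0 if at least <required_kib> KiB are available, non-zero otherwise.
has_free_space() {
    local dir="$1" required_kib="$2" available_kib
    # -P forces the POSIX one-line-per-filesystem format; -k reports 1024-byte blocks.
    available_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$available_kib" -ge "$required_kib" ]
}

# Assumption: 1.2 TB (decimal) expressed in KiB, i.e. 1.2e12 / 1024.
REQUIRED_KIB=1171875000

# Only run the check when GDIR has been exported as in the step above.
if [ -n "${GDIR:-}" ]; then
    if has_free_space "$GDIR" "$REQUIRED_KIB"; then
        echo "Enough free space in $GDIR"
    else
        echo "Less than 1.2 TB free in $GDIR" >&2
    fi
fi
```

The check only looks at free space when it runs; it does not reserve anything, so other writes to the same filesystem during the download can still exhaust the disk.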
Import into the database
neo4j-admin offers the highest-throughput method for populating a massive database.
Its primary constraint is that it requires an empty database,
meaning it does not support incremental updates to an existing graph.
- Install the Neo4j graph database.
- If you are using an existing database instance, ensure the target database is empty and shut down.
    sudo systemctl stop neo4j
- Set an environment variable pointing to the directory containing the Bitcoin graph TSV files.
    export GDIR="<set to the dir that contains graph data>"
  Verify that your data directory contains the correctly batched files. You can use the following command to inspect the file distribution by type (ignoring timestamp prefixes):
    ls -1 "$GDIR" | sed -E "s/^[0-9]+_/[Timestamp]_/" | sort | uniq -c
  You should see header files (e.g., BitcoinGraph_header.tsv.gz), batched edge files (e.g., 195 [Timestamp]_BitcoinS2S.tsv.gz), and the unique node files.
      1 BitcoinB2S_header.tsv.gz
      1 BitcoinB2T_header.tsv.gz
      1 BitcoinC2S_header.tsv.gz
      1 BitcoinC2T_header.tsv.gz
      1 BitcoinCoinbase.tsv.gz
      1 BitcoinGraph_header.tsv.gz
      1 BitcoinS2B_header.tsv.gz
      1 BitcoinS2S_header.tsv.gz
      1 BitcoinScriptNode_header.tsv.gz
      1 BitcoinT2B_header.tsv.gz
      1 BitcoinT2T_header.tsv.gz
      1 BitcoinTxNode_header.tsv.gz
    195 [Timestamp]_BitcoinC2S.tsv.gz
    195 [Timestamp]_BitcoinC2T.tsv.gz
      1 [Timestamp]_BitcoinGraph.tsv.gz
    195 [Timestamp]_BitcoinS2S.tsv.gz
    195 [Timestamp]_BitcoinT2T.tsv.gz
    195 [Timestamp]_byC2S_BitcoinB2S.tsv.gz
    195 [Timestamp]_byC2T_BitcoinB2T.tsv.gz
    195 [Timestamp]_byS2S_BitcoinB2S.tsv.gz
    195 [Timestamp]_byS2S_BitcoinS2B.tsv.gz
    195 [Timestamp]_byT2T_BitcoinB2T.tsv.gz
    195 [Timestamp]_byT2T_BitcoinT2B.tsv.gz
      1 unique_BitcoinScriptNode.tsv.gz
      1 unique_BitcoinTxNode.tsv.gz
- Determine the optimal heap size for the import process. For a graph of this magnitude, memory configuration is critical for performance. Refer to the Neo4j Memory Configuration Guide.
- Execute the import command. Note that we use regex patterns (e.g., .*BitcoinS2S.tsv.gz) to ingest the batched edge files automatically.
    sudo -u neo4j HEAP_SIZE=4G neo4j-admin database import full \
        --overwrite-destination neo4j \
        --nodes="$GDIR/BitcoinCoinbase.tsv.gz" \
        --nodes="$GDIR/BitcoinGraph_header.tsv.gz,$GDIR/0_BitcoinGraph.tsv.gz" \
        --nodes="$GDIR/BitcoinScriptNode_header.tsv.gz,$GDIR/unique_BitcoinScriptNode.tsv.gz" \
        --nodes="$GDIR/BitcoinTxNode_header.tsv.gz,$GDIR/unique_BitcoinTxNode.tsv.gz" \
        --relationships="$GDIR/BitcoinS2S_header.tsv.gz,$GDIR/.*BitcoinS2S.tsv.gz" \
        --relationships="$GDIR/BitcoinT2T_header.tsv.gz,$GDIR/.*BitcoinT2T.tsv.gz" \
        --relationships="$GDIR/BitcoinC2T_header.tsv.gz,$GDIR/.*BitcoinC2T.tsv.gz" \
        --relationships="$GDIR/BitcoinC2S_header.tsv.gz,$GDIR/.*BitcoinC2S.tsv.gz" \
        --relationships="$GDIR/BitcoinB2T_header.tsv.gz,$GDIR/.*BitcoinB2T.tsv.gz" \
        --relationships="$GDIR/BitcoinB2S_header.tsv.gz,$GDIR/.*BitcoinB2S.tsv.gz" \
        --relationships="$GDIR/BitcoinS2B_header.tsv.gz,$GDIR/.*BitcoinS2B.tsv.gz" \
        --relationships="$GDIR/BitcoinT2B_header.tsv.gz,$GDIR/.*BitcoinT2B.tsv.gz" \
        --delimiter="\t" \
        --array-delimiter=";" \
        --verbose \
        --skip-duplicate-nodes
- Once the import concludes, restart the Neo4j service. We also recommend installing the APOC library, as it is needed in EBA for sampling communities.
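Since the import can run for weeks, a quick pre-flight check before launching it may catch an incomplete download early. The sketch below is one way to do that, assuming data files are named `<prefix><type>.tsv.gz` as in the listing above; `check_headers` is a hypothetical helper introduced here, not part of the project's tooling. It confirms that every `*_header.tsv.gz` file in the data directory has at least one matching data file, using a glob that mirrors the `.*<type>.tsv.gz` patterns passed to the import command.

```shell
#!/usr/bin/env bash
# Sketch: verify every *_header.tsv.gz has at least one matching data file.
# check_headers is a hypothetical helper, not part of the project's tooling.

# Usage: check_headers <dir>
# Prints each header type that has no data file; returns 1 if any is missing.
check_headers() {
    local dir="$1" header type candidate found missing=0
    for header in "$dir"/*_header.tsv.gz; do
        # Skip the unexpanded glob when the directory holds no header files.
        [ -e "$header" ] || continue
        type=$(basename "$header" _header.tsv.gz)
        found=0
        # Mirrors the ".*<type>.tsv.gz" pattern used by the import command.
        for candidate in "$dir"/*"${type}.tsv.gz"; do
            if [ -e "$candidate" ]; then
                found=1
                break
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "no data files for $type"
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call would be `check_headers "$GDIR"`. The check covers file presence only; it does not compare batch counts across edge types or validate file contents.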