Restore database dump

On this page, we walk through the steps to populate an empty Neo4j database using our database dump. This approach allows you to skip the resource-intensive bulk import process and start querying the full graph significantly faster.

The process involves:

  • Downloading a multi-part archive of the database dump.
  • Extracting the archive to a local directory.
  • Loading the dump into your Neo4j instance.
Do I need to host a graph database?

Yes, if you want to sample application-specific communities or explore the graph interactively (e.g., querying n-hop neighborhoods).

No, if you want a quick start for developing models using our generic, pre-sampled communities. In this case, you can jump straight to the g101 Jupyter Notebook or these quick-start examples.

Resource Requirements

Bandwidth: This process involves downloading nearly 1 TB of data; ensure you are using a stable connection without data caps.

Storage: Ensure you have at least 4.3 TB of free disk space (compressed download: ~800 GB, extracted database dump: ~800 GB, and populated Neo4j database: ~2.7 TB).
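Before kicking off a multi-hour download, it is worth confirming the target disks actually have the space. The sketch below is a hypothetical helper (not part of the release tooling) that reads the available space from `df` and compares it against a threshold in GB:

```shell
# Hypothetical helper: verify that a path has at least `need_gb`
# gigabytes free before starting a long download or extraction.
check_free_space() {
  local path="$1" need_gb="$2"
  local avail_kb avail_gb
  # df -P prints available space in 1K blocks in column 4.
  avail_kb=$(df -P "$path" | awk 'NR==2 {print $4}')
  avail_gb=$((avail_kb / 1024 / 1024))
  if [ "$avail_gb" -lt "$need_gb" ]; then
    echo "Only ${avail_gb} GB free at ${path}; need ${need_gb} GB" >&2
    return 1
  fi
  echo "${avail_gb} GB free at ${path} (need ${need_gb} GB) -- OK"
}

# Per the requirements above: ~800 GB each for download and extraction,
# and ~2.7 TB (2700 GB) for the populated Neo4j database, e.g.:
#   check_free_space /mnt/download/path 800
#   check_free_space /var/lib/neo4j 2700
```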

Prerequisites & setup

  1. Install the Neo4j graph database.

  2. Install the AWS CLI, which is used to download the data.

  3. Install 7-Zip.

    sudo apt update && sudo apt install p7zip-full -y
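Before continuing, you can confirm that all three tools are on your `PATH`. The helper below is a small illustrative sketch (the function name is ours, not part of any official tooling):

```shell
# Check that each required CLI is installed; report any that are missing.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# The steps below invoke these three binaries:
require_tools neo4j-admin aws 7z \
  || echo "Install the missing tools listed above before continuing."
```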

Download & extract archive

The database dump is compressed and split into many chunks (1070 chunks, 700 MB each in data release v1) to ensure reliable downloading.

  1. Configure environment variables to specify the target directories for downloading and extracting the data.

    # Set the download path for the multi-part archive. 
    # Requires ~800 GB free space.
    export G_DOWNLOAD_PATH="/mnt/download/path"

    # Set the extraction path for the archive.
    # Requires ~800 GB free space.
    export G_EXTRACT_PATH="/mnt/extract/path"
  2. Download the database dump files.

    aws s3 sync s3://bitcoin-graph/v1/neo4j_db_dump/ "${G_DOWNLOAD_PATH}" --no-sign-request
  3. Extract the downloaded multi-part archive.

    7z x "${G_DOWNLOAD_PATH}/neo4j.dump.gz.001" -o"${G_EXTRACT_PATH}"

    By targeting the .001 file, 7-Zip will automatically detect and process the remaining parts in the sequence.

    Note that decompressing ~700 GB of data is I/O-intensive and will take several hours, depending on your disk speed.
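An interrupted `aws s3 sync` can be resumed by re-running the same command, so it is worth sanity-checking that all parts arrived before starting the long extraction. The snippet below is a sketch assuming the part files follow the `neo4j.dump.gz.NNN` naming shown above and that data release v1 ships 1070 parts:

```shell
# Count downloaded archive parts and compare against the expected total.
count_parts() {
  ls "$1"/neo4j.dump.gz.* 2>/dev/null | wc -l
}

expected=1070  # number of chunks in data release v1
actual=$(count_parts "${G_DOWNLOAD_PATH:-.}")
echo "found ${actual} of ${expected} archive parts"
[ "${actual}" -eq "${expected}" ] \
  || echo "Download incomplete; re-run the aws s3 sync command to resume." >&2
```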

Restore database dump

  1. Stop the database service.

    sudo systemctl stop neo4j
  2. Restore the database.

    sudo -u neo4j neo4j-admin database load neo4j \
    --from-path="${G_EXTRACT_PATH}" \
    --overwrite-destination \
    --verbose

    Refer to the official Neo4j documentation for details on the database load command and its options.

    Note: This step will take a significant amount of time (on the order of 24 hours) and requires at least 2.72 TB of free space in the Neo4j database path.

  3. Start the database service.

    sudo systemctl start neo4j
  4. Enable APOC.
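How APOC is enabled depends on your Neo4j version and install method. The sketch below assumes a Debian/Ubuntu package install where APOC Core ships in the `labs/` directory, and that the paths shown match your system; adjust them for your setup:

```shell
# Assumption: APOC Core is bundled under /var/lib/neo4j/labs on a
# Debian/Ubuntu package install. Copy it into the plugins directory.
sudo cp /var/lib/neo4j/labs/apoc-*-core.jar /var/lib/neo4j/plugins/

# Allow APOC procedures to run (appends to the server config;
# the config path may differ on your install).
echo "dbms.security.procedures.unrestricted=apoc.*" \
  | sudo tee -a /etc/neo4j/neo4j.conf

# Restart so the plugin and setting take effect.
sudo systemctl restart neo4j
```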