Sampling
The graph-based machine learning community has developed various strategies for training and using models on large graphs. One of the most common approaches is to train on subgraphs (also called communities) that are pseudorandomly sampled from the larger graph.
In general, the specifics of the sampling strategy depend on the graph and the intended application. This is even more pronounced for the Bitcoin Graph, given its scale, the heterogeneity of its nodes and edges, and its longitudinal nature, spanning more than a decade.
We provide customizable community sampling strategies that can be adjusted to fit the data requirements of a wide spectrum of application areas. The strategies we currently provide are:
Additionally, we provide the following ready-to-use sampled communities:
- 200k communities sampled using the Forest Fire method, containing only Script-to-Script edges.
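To give a rough intuition for Forest Fire sampling, the following is a minimal, generic sketch in Python. It is not the implementation used to produce the communities above; the function name `forest_fire_sample`, its parameters (`burn_prob`, `max_nodes`), and the toy adjacency dictionary are all hypothetical and chosen only for illustration. Starting from a seed node, the "fire" spreads to a random number of unvisited neighbors (here drawn from a geometric distribution), and the burned nodes form the sampled community.

```python
import random
from collections import deque


def forest_fire_sample(adjacency, seed_node, max_nodes, burn_prob=0.5, rng=None):
    """Return a set of nodes sampled by a simplified Forest Fire process.

    adjacency: dict mapping each node to an iterable of its neighbors
               (a hypothetical in-memory stand-in for a real graph store).
    """
    rng = rng or random.Random()
    burned = {seed_node}
    frontier = deque([seed_node])

    while frontier and len(burned) < max_nodes:
        node = frontier.popleft()

        # Neighbors that have not burned yet are candidates for spreading.
        candidates = [n for n in adjacency.get(node, ()) if n not in burned]
        rng.shuffle(candidates)

        # Geometric draw: keep counting while each trial succeeds with
        # probability burn_prob, capped at the number of candidates.
        n_burn = 0
        while n_burn < len(candidates) and rng.random() < burn_prob:
            n_burn += 1

        for neighbor in candidates[:n_burn]:
            if len(burned) >= max_nodes:
                break
            burned.add(neighbor)
            frontier.append(neighbor)

    return burned


# Toy undirected graph, made up for this example.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
community = forest_fire_sample(
    graph, "a", max_nodes=3, burn_prob=0.7, rng=random.Random(42)
)
print(sorted(community))
```

A higher `burn_prob` makes the fire spread more widely from each node, giving broader, shallower communities; a lower value tends to give smaller ones, and the fire may die out before `max_nodes` is reached.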
Sample Your Own Communities
- Set up a Neo4j database
  - Access a Neo4j solution, e.g., a local installation.
  - Download the Bitcoin Graph, which is available in two formats:
    - Neo4j Database Dump, which is the faster option as it bypasses the need for a bulk import (see load Neo4j database dump).
    - Neo4j format TSV files, which are more flexible but require a computationally demanding bulk import that can take 2-3 weeks to complete (see import TSV into Neo4j database).
  - Start the Neo4j database (e.g., via Neo4j Desktop).
- Run the sampling method. For documentation on the command's arguments, run:
.\eba.exe bitcoin sample --help