Sampling
The graph-based machine learning community has developed various strategies for training and using models on large graphs. One of the most common approaches is to train on subgraphs (also called communities) that are pseudorandomly sampled from the larger graph.
In general, the specifics of the sampling strategy depend on the graph and the intended application. This dependence is even more pronounced for the Bitcoin Graph, given its scale, the heterogeneity of its nodes and edges, and its longitudinal nature: it spans more than a decade.
We provide customizable community sampling strategies that can be adjusted to fit the data requirements of a wide spectrum of application areas. The strategies we currently provide are:
Additionally, we provide the following ready-to-use sampled communities:
- 200k communities sampled using the Forest Fire method, containing only Script-to-Script edges.
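For readers unfamiliar with the Forest Fire method mentioned above, the following is a minimal, generic sketch of the idea on an in-memory adjacency list: start at a root node, "burn" a random number of its unvisited neighbors, and repeat from each burned node until the fire dies out or a size limit is reached. It is not the implementation used by this tool; the function names, the `forward_prob` parameter, and the geometric spread rule are assumptions made for illustration only.

```python
import random
from collections import deque


def forest_fire_sample(adjacency, root, forward_prob=0.7, max_nodes=1000, seed=None):
    """Return a set of nodes sampled by a simple forest-fire traversal.

    adjacency: dict mapping a node to an iterable of its neighbors.
    forward_prob: hypothetical burning probability in [0, 1); the number of
        neighbors burned per node is geometric with mean p / (1 - p).
    """
    if not 0.0 <= forward_prob < 1.0:
        raise ValueError("forward_prob must be in [0, 1)")
    if root not in adjacency:
        raise KeyError(f"root node {root!r} is not in the graph")

    rng = random.Random(seed)
    burned = {root}
    frontier = deque([root])

    while frontier and len(burned) < max_nodes:
        node = frontier.popleft()
        # Unvisited neighbors, de-duplicated while keeping their order.
        candidates = [n for n in dict.fromkeys(adjacency.get(node, ())) if n not in burned]
        rng.shuffle(candidates)

        # Draw how many neighbors the fire spreads to from this node.
        spread = 0
        while rng.random() < forward_prob:
            spread += 1

        for neighbor in candidates[:spread]:
            if len(burned) >= max_nodes:
                break
            burned.add(neighbor)
            frontier.append(neighbor)

    return burned


def induced_edges(adjacency, nodes):
    """Return the directed edges of the graph whose endpoints are both sampled."""
    return {(u, v) for u in nodes for v in adjacency.get(u, ()) if v in nodes}


if __name__ == "__main__":
    # Toy graph: 20 nodes on a ring with an extra chord per node.
    toy = {i: [(i + 1) % 20, (i + 7) % 20] for i in range(20)}
    community = forest_fire_sample(toy, root=0, max_nodes=8, seed=1)
    print(sorted(community), len(induced_edges(toy, community)))
```

Passing a fixed `seed` makes the sample reproducible, which is useful when comparing communities across runs.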
Sample Your Own Communities
- Restore the graph database if needed.
- Ensure the database is running and you're connected to it.
- Run the sample method. For documentation on the command's arguments, run:
  .\eba.exe bitcoin sample --help
- You may refer to the following documentation on arguments specific to each sampling algorithm:
In addition to passing arguments via the command line, you can supply them using a JSON file. For instance:
{
  "Bitcoin": {
    "GraphSample": {
      "Count": 100,
      "MinNodeCount": 500,
      "MaxNodeCount": 1000,
      "MinEdgeCount": 499,
      "MaxEdgeCount": 10000,
      "RootNodeSelectProb": 0.3,
      "TraversalAlgorithm": 0,
      "ForestFireOptions": {
        "maxHops": 6,
        "queryLimit": 1000,
        "reductionFactor": 4,
        "nodeCountAtRoot": 100
      }
    }
  }
}
You can then run the tool with the JSON file as follows:
.\eba.exe bitcoin sample --status-filename .\my_options.json
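If you generate options files programmatically, for example to sweep over community sizes, a small script can build the JSON and catch obvious mistakes before the tool runs. The sketch below reuses the key names and values from the JSON example above; the helper names (`build_options`, `check_options`, `write_options`) and the sanity checks themselves are assumptions for illustration, not validation rules documented by the tool. The value of `TraversalAlgorithm` is copied as-is, since its meaning is not described here.

```python
import json
from pathlib import Path


def build_options(count=100, min_nodes=500, max_nodes=1000,
                  min_edges=499, max_edges=10000, root_prob=0.3):
    """Build an options dictionary mirroring the JSON example above."""
    return {
        "Bitcoin": {
            "GraphSample": {
                "Count": count,
                "MinNodeCount": min_nodes,
                "MaxNodeCount": max_nodes,
                "MinEdgeCount": min_edges,
                "MaxEdgeCount": max_edges,
                "RootNodeSelectProb": root_prob,
                "TraversalAlgorithm": 0,  # copied from the example
                "ForestFireOptions": {
                    "maxHops": 6,
                    "queryLimit": 1000,
                    "reductionFactor": 4,
                    "nodeCountAtRoot": 100,
                },
            }
        }
    }


def check_options(options):
    """Return a list of problems found by basic (assumed) sanity checks."""
    sample = options["Bitcoin"]["GraphSample"]
    problems = []
    if sample["Count"] < 1:
        problems.append("Count must be at least 1")
    if sample["MinNodeCount"] > sample["MaxNodeCount"]:
        problems.append("MinNodeCount exceeds MaxNodeCount")
    if sample["MinEdgeCount"] > sample["MaxEdgeCount"]:
        problems.append("MinEdgeCount exceeds MaxEdgeCount")
    if not 0.0 <= sample["RootNodeSelectProb"] <= 1.0:
        problems.append("RootNodeSelectProb must be between 0 and 1")
    return problems


def write_options(options, path):
    """Write the options to a JSON file, refusing to write invalid values."""
    problems = check_options(options)
    if problems:
        raise ValueError("; ".join(problems))
    Path(path).write_text(json.dumps(options, indent=2), encoding="utf-8")


if __name__ == "__main__":
    # Print the default options; call write_options(...) to save them instead.
    print(json.dumps(build_options(), indent=2))
```

Calling `write_options(build_options(), "my_options.json")` would produce a file equivalent to the example, which can then be passed to the tool with the command shown above.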