Forest-Fire
Motivation:
Consider a node with an out-degree of 100,
where each of its neighbors also has a similar out-degree.
If the goal is to sample 10 nodes from the neighborhood within 3 hops,
a breadth-first search (BFS) traversal will likely return 10 direct neighbors of the root node,
and a depth-first search (DFS) search will likely return the first neighbor found and 9 of that neighbor's neighbors.
Therefore, neither of these methods will effectively sample nodes up to 3 hops away,
nor will the sampled subgraph be representative of the broader 3-hop neighborhood structure.
Therefore, to sample representative neighborhoods in the Bitcoin graph, we developed a method that randomly selects a subset of neighbors for a root node. Then, for each of those selected nodes, it chooses a subset of their immediate neighbors, and continues this process until a termination criterion is met.
You may set the sample method to use the forest-fire algorithm
and adjust the algorithm's parameters as follows.
Example:
.\eba.exe bitcoin sample `
--count 1000 `
--method ffs `
--method-options '{\"maxHops\": 6, \"nodeCountAtRoot\": 10, \"reductionFactor\": 2, \"queryLimit\": 1000}'
This command sets the method to sample,
aiming for 100 communities using the ffs (forest-fire sampling) strategy
at a maximum depth of 6 hops from the root nodes.
In addition to passing arguments via the command line, you can supply them using a JSON file. See this page for an example.
You may run the following command to view documentation for common arguments of the sample command:
.\eba.exe bitcoin sample --help
FFS-specific arguments are documented below.
| Option | Type | Default | Description |
|---|---|---|---|
maxHops | int | 2 | Maximum traversal depth (hops) away from the root node. |
queryLimit | int | 1000 | Maximum number of neighbors to query from the database before sampling. |
nodeCountAtRoot | int | 100 | Maximum number of neighbors to sample from the root node. |
reductionFactor | double | 4.0 | Factor controlling the reduction of sampled neighbors as hop depth increases. |
maxHops
Sets the maximum number of hops away from the root node that the algorithm is allowed to traverse.
queryLimit
Sets the maximum number of neighbors to query from the database for a given node. These queried nodes become the candidate pool for the forest-fire algorithm (i.e., the algorithm will select a subset from these).
nodeCountAtRoot
Sets the maximum number of neighbors to sample
from the root node's queried set
(the size of which is limited by queryLimit).
reductionFactor
As the algorithm traverses deeper from the root node, it randomly selects fewer neighbors. This reduction is controlled by the reduction factor. The number of nodes sampled at a specific hop is calculated as:
floor(nodeCountAtRoot - (hop * reductionFactor))