Forest-Fire
Motivation:
Consider a node with an out-degree of 100,
where each of its neighbors also has a similar out-degree.
If the goal is to sample 10 nodes from the neighborhood within 3 hops,
a breadth-first search (BFS) traversal will likely return 10 direct neighbors of the root node,
and a depth-first search (DFS) search will likely return the first neighbor found and 9 of that neighbor's neighbors.
Therefore, neither of these methods will effectively sample nodes up to 3 hops away,
nor will the sampled subgraph be representative of the broader 3-hop neighborhood structure.
Therefore, to sample representative neighborhoods in the Bitcoin graph, we developed a method that randomly selects a subset of neighbors for a root node. Then, for each of those selected nodes, it chooses a subset of their immediate neighbors, and continues this process until a termination criterion is met.
You may set the sample method to use the forest-fire algorithm
and adjust the algorithm's parameters as follows.
Example:
.\eba.exe bitcoin sample `
--count 1000 `
--method ffs `
--method-options '{\"maxHops\": 6, \"nodeCountAtRoot\": 10, \"reductionFactor\": 2, \"queryLimit\": 1000}'
This command sets the method to sample,
aiming for 100 communities using the ffs (forest-fire sampling) strategy
at a maximum depth of 6 hops from the root nodes.
In addition to passing arguments via the command line, you can supply them using a JSON file. See this page for an example.
You may run the following command to view documentation for common arguments of the sample command:
.\eba.exe bitcoin sample --help
FFS-specific arguments are documented below.
| Option | Type | Default | Description |
|---|---|---|---|
maxHops | int | 2 | Maximum traversal depth (hops) away from the root node. |
queryLimit | int | 1000 | Maximum number of neighbors to query from the database before sampling. |
nodeCountAtRoot | int | 100 | Maximum number of neighbors to sample from the root node. |
reductionFactor | double | 4.0 | Factor controlling the reduction of sampled neighbors as hop depth increases. |
maxHops​
Sets the maximum number of hops away from the root node that the algorithm is allowed to traverse.
queryLimit​
Sets the maximum number of neighbors to query from the database for a given node. These queried nodes become the candidate pool for the forest-fire algorithm (i.e., the algorithm will select a subset from these).
nodeCountAtRoot​
Sets the maximum number of neighbors to sample
from the root node's queried set
(the size of which is limited by queryLimit).
reductionFactor​
As the algorithm traverses deeper from the root node, it randomly selects fewer neighbors. This reduction is controlled by the reduction factor. The number of nodes sampled at a specific hop is calculated as:
floor(nodeCountAtRoot - (hop * reductionFactor))