Quick Start
In this quick start,
we train and evaluate a model that generates an embedding for
each Bitcoin script node based on its 3-hop neighborhood.
We experiment with an unsupervised contrastive learning model and
then use the resulting embeddings to cluster the nodes.
We then compare these clusters with external annotations that
identify the wallet to which a script belongs and
classify wallets as belonging to exchanges,
mining pools, or gambling services.
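The comparison step can be illustrated with a small, self-contained sketch. It assumes the clustering has already produced one cluster ID per script node and that each node carries an optional annotation; the function name `cluster_purity`, the label strings, and the toy data are hypothetical and not taken from the project. Purity is only one of several possible agreement measures and is used here because it is easy to read.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, labels):
    """Return (purity, majority_label_per_cluster).

    Purity is the fraction of annotated nodes whose annotation equals
    the most common annotation in their cluster. Nodes whose label is
    None (no external annotation) are ignored.
    """
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")

    # Count how often each annotation occurs inside each cluster.
    per_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        if label is not None:
            per_cluster[cluster][label] += 1

    total = sum(sum(counts.values()) for counts in per_cluster.values())
    if total == 0:
        return 0.0, {}

    majority = {c: counts.most_common(1)[0][0] for c, counts in per_cluster.items()}
    matched = sum(counts[majority[c]] for c, counts in per_cluster.items())
    return matched / total, majority


# Toy data (hypothetical): 8 script nodes, 3 clusters, one node unannotated.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["exchange", "exchange", "gambling",
          "mining_pool", "mining_pool",
          "gambling", None, "gambling"]

purity, majority = cluster_purity(cluster_ids, labels)
print(f"purity = {purity:.3f}")
print(majority)
```

A purity close to 1 would suggest that the clusters line up with the annotated wallet categories. Purity rises with the number of clusters, so it is best compared across runs that use the same cluster count.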
For model development, we use this dataset,
which contains 200k
randomly selected script nodes with
their neighborhoods sampled using the Forest-Fire method.
More details on the model, along with step-by-step instructions
for this quick start, are available on this page.
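To give a feel for how Forest-Fire sampling selects a neighborhood, here is a minimal generic sketch of the technique. It is not the project's implementation: the function name `forest_fire_sample`, the adjacency-dictionary input, the `p_forward` value, and the `max_hops` cutoff (set to 3 to mirror the 3-hop neighborhoods mentioned above) are all assumptions made for illustration.

```python
import random
from collections import deque


def forest_fire_sample(adj, seed, p_forward=0.5, max_hops=3, rng=None):
    """Sample a neighborhood around `seed` with a forest-fire style walk.

    adj       : dict mapping node -> list of neighbor nodes (assumed input format)
    p_forward : forward burning probability, must satisfy 0 <= p_forward < 1
    max_hops  : nodes at this depth are kept but not expanded further
    """
    if not 0 <= p_forward < 1:
        raise ValueError("p_forward must be in [0, 1)")
    rng = rng or random.Random()

    visited = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_hops:
            continue

        # Candidates are neighbors the fire has not reached yet.
        candidates = [n for n in adj.get(node, ()) if n not in visited]
        rng.shuffle(candidates)

        # Geometric draw: count successes before the first failure,
        # which has mean p_forward / (1 - p_forward).
        n_burn = 0
        while rng.random() < p_forward:
            n_burn += 1

        for nbr in candidates[:n_burn]:
            visited.add(nbr)
            queue.append((nbr, depth + 1))
    return visited


# Toy graph (hypothetical): a short path with one branch.
toy_adj = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
}
sample = forest_fire_sample(toy_adj, "a", p_forward=0.7, rng=random.Random(42))
print(sorted(sample))
```

Higher values of `p_forward` burn more neighbors per node and so yield larger, denser samples, while the hop cutoff bounds how far the sample can extend from the seed.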
As the following diagram illustrates, this quick start bypasses the ETL pipeline to focus directly on the machine learning application. We skip the pipeline because running it requires weeks of processing and significant computational resources, which puts it beyond the scope of a quick start. The complete ETL pipeline is documented in this section.