Blockchain: AI-Powered Behavioural Address Clustering

Overview

A financial intelligence analyst investigating a suspected sanctions-evasion network discovers that the subject controls dozens of wallet addresses spread across Bitcoin and Ethereum. Rather than tracing each address in isolation, the analyst submits a seed address and receives back a ranked cluster of associated addresses, confidence scores for each association, and a timeline showing how the cluster evolved as new wallets were created. Within minutes, the investigator has a coherent picture of the entity's on-chain footprint without needing to perform exhaustive manual graph traversal.

This capability applies machine learning to the inherent structure of blockchain transaction graphs, grouping addresses that share behavioural fingerprints into entity clusters. On UTXO-based chains such as Bitcoin, common-input-ownership heuristics link addresses whose outputs are spent together as inputs to the same transaction. On account-based chains such as Ethereum, gas-payment patterns, contract interaction sequences, and timing correlations reveal linked wallets. Confidence scores are computed and continuously refreshed as new on-chain data arrives, allowing clusters to grow, merge, or split as the evidence base changes.
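The common-input-ownership heuristic described above can be sketched with a union-find (disjoint-set) structure. This is a minimal illustration rather than the platform's implementation: the transaction dictionaries, the `inputs` field, and the function name `cluster_by_common_inputs` are all assumptions made for the example.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, item):
        # Register unseen items as singleton sets.
        if item not in self.parent:
            self.parent[item] = item
            self.size[item] = 1
            return item
        # Walk up to the root, then point every visited node at it.
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_by_common_inputs(transactions):
    """Group addresses that appear together as inputs of one transaction.

    `transactions` is a hypothetical list of dicts, each with an
    "inputs" list of address strings.
    """
    uf = UnionFind()
    for tx in transactions:
        inputs = tx["inputs"]
        if not inputs:
            continue
        first = inputs[0]
        uf.find(first)  # make sure single-input addresses are registered
        for addr in inputs[1:]:
            uf.union(first, addr)

    groups = defaultdict(set)
    for addr in list(uf.parent):
        groups[uf.find(addr)].add(addr)
    # Return clusters as sorted lists, ordered by their first address.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


sample = [
    {"txid": "t1", "inputs": ["A", "B"]},
    {"txid": "t2", "inputs": ["B", "C"]},
    {"txid": "t3", "inputs": ["D"]},
]
clusters = cluster_by_common_inputs(sample)
```

In this sample, addresses A, B, and C end up in one cluster because B appears as an input alongside each of the others, while D stays on its own. Collaborative transactions such as CoinJoin deliberately combine inputs from unrelated parties, so a real system would need to exclude or down-weight them before applying this merge step.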

Key Features

UTXO common-input heuristic: Automatically groups Bitcoin and similar UTXO-chain addresses that appear together as inputs in the same transaction, a strong indicator of shared key custody.
EVM behavioural profiling: Analyses gas-payment sources, contract call sequences, token approval patterns, and inter-wallet timing on Ethereum and EVM-compatible chains to surface linked accounts.
Confidence scoring: Every address-to-cluster association carries a numerical confidence score derived from the number and quality of supporting heuristic signals, giving analysts a clear basis for prioritising leads.
Dynamic cluster evolution: Clusters update automatically as new blocks are ingested, merging previously separate clusters when new linking evidence emerges and splitting them when contradictory signals appear.
Multi-chain entity resolution: A single entity record can span Bitcoin, Ethereum, and additional supported chains, presenting a unified cross-chain view of the subject's on-chain activity.
Investigator feedback loop: Analysts can accept, reject, or annotate individual address associations, and these judgements are fed back into the model to improve future clustering precision for the organisation.
Exchange and mixer detection: Identifies known exchange deposit addresses and mixing service patterns, allowing analysts to distinguish controlled consolidation behaviour from deliberate obfuscation.
Provenance and audit trail: Every clustering decision records the heuristics applied, the model version used, and the analyst actions taken, supporting evidential continuity for legal proceedings.
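One way to derive a single confidence score from several supporting heuristic signals is a noisy-OR combination, in which each independent signal reduces the remaining doubt about an association. The source does not state how the platform combines signals, so the formula, the signal names, and the strength values below are illustrative assumptions only.

```python
def combine_confidence(signal_strengths):
    """Noisy-OR combination of per-signal strengths in [0, 1].

    Assumes signals are independent; this is an illustrative choice,
    not the documented scoring method.
    """
    remaining_doubt = 1.0
    for strength in signal_strengths:
        if not 0.0 <= strength <= 1.0:
            raise ValueError(f"signal strength out of range: {strength}")
        remaining_doubt *= 1.0 - strength
    return 1.0 - remaining_doubt


def rank_associations(evidence):
    """Rank candidate addresses by combined confidence, highest first.

    `evidence` maps an address to a dict of {signal_name: strength};
    both the shape and the signal names are hypothetical.
    """
    scored = [
        (address, combine_confidence(signals.values()))
        for address, signals in evidence.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


evidence = {
    "addr_a": {"common_input": 0.9, "shared_gas_funder": 0.5},
    "addr_b": {"timing_correlation": 0.3},
}
ranked = rank_associations(evidence)
```

Here `addr_a` scores roughly 0.95 because two signals support it, while `addr_b` keeps the 0.3 of its single signal. A noisy-OR never lowers a score when evidence is added, so contradictory signals that should split a cluster would need separate handling.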

Use Cases

Financial crime investigation: Investigators following cryptocurrency flows linked to fraud, ransomware, or sanctions evasion can rapidly resolve pseudonymous addresses to a manageable set of suspected entities rather than tracking each address independently.
Exchange compliance screening: Virtual asset service providers can screen incoming and outgoing transactions against clusters associated with high-risk entities, improving the coverage and accuracy of their anti-money-laundering controls.
Sybil and wash-trading detection: Platforms and regulators can identify accounts that appear independent but share an on-chain controller, revealing coordinated manipulation of decentralised markets or governance mechanisms.
Dark-web marketplace attribution: Law enforcement analysts tracing payments to or from dark-web infrastructure can use behavioural clustering to link multiple vendor wallets and payment addresses back to a smaller number of real operators.
Sanctions enforcement support: Compliance teams and government agencies can maintain and continuously refresh clusters tied to sanctioned entities, alerting in near-real time when new addresses are linked to a known cluster.
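The screening and alerting use cases above share a simple core: look up each counterparty address in a cluster index and flag those tied to a high-risk cluster with sufficient confidence. The sketch below assumes an in-memory index and a fixed threshold; the function name, data shapes, cluster identifiers, and the 0.8 default are hypothetical and not taken from the platform.

```python
def screen_transaction(counterparties, cluster_index, high_risk_clusters, threshold=0.8):
    """Return (address, cluster_id, confidence) for each flagged counterparty.

    cluster_index: hypothetical dict of address -> (cluster_id, confidence).
    high_risk_clusters: hypothetical set of cluster ids under watch.
    """
    hits = []
    for address in counterparties:
        membership = cluster_index.get(address)
        if membership is None:
            continue  # address not yet attributed to any cluster
        cluster_id, confidence = membership
        if cluster_id in high_risk_clusters and confidence >= threshold:
            hits.append((address, cluster_id, confidence))
    return hits


cluster_index = {
    "addr_1": ("cluster_17", 0.93),
    "addr_2": ("cluster_17", 0.41),
    "addr_3": ("cluster_02", 0.99),
}
hits = screen_transaction(
    ["addr_1", "addr_2", "addr_3", "addr_4"],
    cluster_index,
    high_risk_clusters={"cluster_17"},
)
```

Only `addr_1` is flagged in this sample: `addr_2` belongs to the watched cluster but falls below the threshold, `addr_3` belongs to a cluster that is not watched, and `addr_4` is unattributed. Because cluster membership changes as new blocks arrive, a lookup like this would need to run against a refreshed index rather than a cached snapshot.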

Integration

The clustering capability receives raw transaction data through an Apache Spark-based ingestion pipeline, which normalises UTXO and account-model data into a common graph representation. That representation is stored in a time-series partition of the platform record store and replicated into the graph analysis layer for traversal. The machine learning models are trained and served through a standard ML pipeline framework, and their outputs are written back to the same graph store. Downstream case-management, alerting, and reporting modules can then query cluster membership and confidence scores through the platform's typed integration layer without direct awareness of the underlying chain data model.
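To make the idea of a common graph representation concrete, the sketch below maps both transaction models onto the same edge type by treating each transaction as an intermediate node between senders and receivers. It is written in plain Python rather than Spark to stay self-contained, and the `Edge` schema, the raw record field names, and the `tx:` node prefix are assumptions for illustration, not the platform's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    chain: str
    source: str      # address or transaction node
    target: str      # address or transaction node
    amount: int      # smallest unit of the chain (e.g. satoshi, wei)
    timestamp: int   # block time, seconds since epoch


def normalise_utxo_tx(chain, tx):
    """Many-to-many UTXO transaction -> address/tx/address edges."""
    tx_node = f"tx:{tx['txid']}"
    edges = [
        Edge(chain, i["address"], tx_node, i["value"], tx["timestamp"])
        for i in tx["inputs"]
    ]
    edges += [
        Edge(chain, tx_node, o["address"], o["value"], tx["timestamp"])
        for o in tx["outputs"]
    ]
    return edges


def normalise_account_tx(chain, tx):
    """One-to-one account-model transfer -> the same edge shape."""
    tx_node = f"tx:{tx['hash']}"
    return [
        Edge(chain, tx["from"], tx_node, tx["value"], tx["timestamp"]),
        Edge(chain, tx_node, tx["to"], tx["value"], tx["timestamp"]),
    ]


btc_tx = {
    "txid": "abc",
    "timestamp": 1700000000,
    "inputs": [{"address": "A", "value": 60}, {"address": "B", "value": 50}],
    "outputs": [{"address": "C", "value": 100}],
}
eth_tx = {"hash": "0x01", "timestamp": 1700000012, "from": "0xaa", "to": "0xbb", "value": 5}

graph_edges = normalise_utxo_tx("bitcoin", btc_tx) + normalise_account_tx("ethereum", eth_tx)
```

Routing every transfer through a transaction node avoids having to guess which UTXO input funded which output, and it lets downstream consumers traverse both chains with the same queries.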

Open Standards

FATF Recommendation 16 (Travel Rule): Clustering results are surfaced alongside Travel Rule originator and beneficiary data, enabling virtual asset service providers to cross-reference counterparty wallet attribution with regulatory transfer records.
ISO/IEC 27001: All cluster data, including investigator annotations and provenance records, is stored and processed under an information security management framework aligned with ISO/IEC 27001 controls for confidentiality, integrity, and availability.
W3C Decentralized Identifiers (DIDs) v1.0: Entity records resolved through clustering can be exported as W3C DID documents, enabling interoperability with identity verification and credential frameworks that consume the DID specification.
GDPR (Regulation 2016/679): Personal data inferred through entity resolution is handled under GDPR principles of purpose limitation and data minimisation, with access controls and audit logging that support subject-access and erasure workflows.
NIST SP 800-53 (Audit and Accountability): The full provenance trail for every clustering decision, model inference, and analyst action is retained in append-only audit logs consistent with the audit and accountability control family.
OpenAPI 3.1: Cluster query, submission, and feedback endpoints are described in an OpenAPI 3.1 specification, enabling partner systems and third-party tooling to integrate without bespoke connector development.
OAuth 2.0 and JWT Bearer Token: Token-based authentication protects typed, auditable read and write workflows across the platform.

Last Reviewed: 2026-05-26