Hyperbolic Network Embedding

Overview

When you map a criminal network in two-dimensional space, something important gets lost. The top of a drug supply chain has one or two leaders who control dozens of couriers. In a flat Euclidean map, the room available around a point grows only polynomially with distance, so fitting those dozens of couriers around the two leaders means either crowding them together or pushing them out of proportion. Either way, the picture distorts how far removed a street-level courier really is from the leadership. The hierarchy flattens out, and the structural distances between nodes become misleading.

Hyperbolic space addresses this. In the Poincaré ball model of hyperbolic geometry, the amount of available space grows exponentially with hyperbolic distance from the centre. A tree with many branches and leaves fits naturally, because the geometry itself mirrors the branching structure of a hierarchy. In a trained embedding, leaders tend to sit near the origin and peripheral members cluster near the boundary. The Poincaré distance between two entities then reflects hierarchical separation, not just co-occurrence counts.
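As an illustration, the standard closed-form distance on the unit Poincaré ball can be written in a few lines of plain Python. This is a minimal sketch rather than the module's own code: the function name and the sample coordinates are hypothetical, and it assumes the unit ball (curvature -1).

```python
import math


def poincare_distance(u, v):
    """Distance between two points strictly inside the unit Poincaré ball.

    Uses d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))).
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    # Guard against rounding pushing the argument just below 1.
    return math.acosh(max(1.0, arg))


# Hypothetical coordinates: one leader near the origin, two couriers near the boundary.
leader = (0.05, 0.0)
courier_a = (0.90, 0.0)
courier_b = (0.0, 0.90)

print("leader to courier_a:", poincare_distance(leader, courier_a))
print("courier_a to courier_b:", poincare_distance(courier_a, courier_b))
```

With these sample points the two couriers are further from each other than either is from the leader, which is the tree-like behaviour described above: the shortest route between two leaves effectively runs back through the centre.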

The Hyperbolic Network Embedding module trains Poincaré ball embeddings over an investigation's entity graph using geoopt's RiemannianAdam optimiser. Once trained, analysts can query which entities are closest in Poincaré space to any given node, retrieve the hierarchy depth score for any entity, and compute pairwise Poincaré distances across a set of entities of interest.
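A training loop of this kind might look roughly like the sketch below. It is not the module's implementation: the function name, the hyperparameters, the integer entity ids, and the softmax-over-negatives loss (in the style of Nickel and Kiela's Poincaré embeddings) are all assumptions made for illustration. Only geoopt's `PoincareBall`, `ManifoldParameter`, and `RiemannianAdam` are taken from the text above. The guarded import makes the function return `None` when torch or geoopt is missing.

```python
try:
    import torch
    import geoopt
    HAS_DEPS = True
except ImportError:  # torch or geoopt is not installed
    HAS_DEPS = False


def train_poincare(edges, num_entities, dim=2, epochs=100, lr=0.05, num_neg=5, seed=0):
    """Return a (num_entities, dim) tensor of ball coordinates, or None without deps.

    `edges` is a list of (source_id, target_id) integer pairs (hypothetical format).
    """
    if not HAS_DEPS:
        return None
    torch.manual_seed(seed)
    ball = geoopt.PoincareBall(c=1.0)
    # Start close to the origin, well inside the ball.
    init = torch.randn(num_entities, dim) * 1e-2
    emb = geoopt.ManifoldParameter(init, manifold=ball)
    optimiser = geoopt.optim.RiemannianAdam([emb], lr=lr)

    src = torch.tensor([s for s, _ in edges], dtype=torch.long)
    dst = torch.tensor([d for _, d in edges], dtype=torch.long)
    target = torch.zeros(len(edges), dtype=torch.long)  # index 0 is the true edge

    for _ in range(epochs):
        optimiser.zero_grad()
        # Random negatives; a non-zero offset guarantees a negative never equals its source.
        offsets = torch.randint(1, num_entities, (len(edges), num_neg))
        neg = (src.unsqueeze(1) + offsets) % num_entities
        pos_dist = ball.dist(emb[src], emb[dst])               # shape (E,)
        neg_dist = ball.dist(emb[src].unsqueeze(1), emb[neg])  # shape (E, num_neg)
        # Smaller distance means higher score; the true edge should win the softmax.
        logits = torch.cat([-pos_dist.unsqueeze(1), -neg_dist], dim=1)
        loss = torch.nn.functional.cross_entropy(logits, target)
        loss.backward()
        optimiser.step()
    return emb.detach().clone()


# Hypothetical toy hierarchy: 0 is the organiser, 1-2 lieutenants, 3-6 couriers.
edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
embeddings = train_poincare(edges, num_entities=7)
if embeddings is None:
    print("torch/geoopt not installed; skipping training")
else:
    print(embeddings.norm(dim=-1))
```

On a graph this small the learned norms are not a reliable indicator of role; the sketch is meant only to show how the manifold, the parameter, and the optimiser fit together.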

Last Reviewed: 2026-04-14 Last Updated: 2026-04-14

Key Features

Poincaré Ball Geometry: Embeddings are trained on the Poincaré ball manifold with curvature parameter c = 1.0, which in geoopt's convention corresponds to a constant curvature of -1 (the unit ball). The manifold structure is enforced throughout training by geoopt's RiemannianAdam optimiser, which applies updates that respect the hyperbolic geometry rather than naively updating coordinates in Euclidean space.
Hierarchy Depth Scoring: Each entity's position in the Poincaré ball naturally encodes its role in the hierarchy. The Euclidean norm of the embedding vector serves as a depth proxy: entities with a norm close to zero are central hub nodes; entities with a norm close to one are peripheral leaf nodes. No explicit hierarchy labels are needed.
Nearest Neighbours in Poincaré Space: Given any entity in an investigation, the module returns the top-k nearest entities ranked by Poincaré distance. Because hyperbolic distance reflects hierarchical separation rather than flat co-occurrence, the neighbours returned are hierarchically meaningful, not just structurally adjacent in the original graph.
Pairwise Distance Matrix: For a specified list of entities, the module computes the full pairwise Poincaré distance matrix. This supports downstream clustering, role segmentation, and command structure analysis across a selected subset of network members.
Model Persistence via R2: Trained embedding artefacts (torch state dicts) are stored in object storage through a three-tier model cache (memory, local disk, and object storage). Subsequent inference calls load the model lazily without retraining.
Graceful Degradation: If the torch or geoopt libraries are not installed in the deployment environment, all methods return null or empty results with a warning logged. No errors are raised. The module can therefore be deployed to environments that lack GPUs or large ML dependencies without breaking the rest of the platform.
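The depth score, nearest-neighbour, and pairwise-distance features above can be sketched in plain Python once embeddings exist. The function names, entity names, and coordinates here are hypothetical and do not reflect the module's actual API; the sketch assumes the unit ball.

```python
import math


def poincare_distance(u, v):
    """Closed-form distance on the unit Poincaré ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(max(1.0, 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))))


def depth_score(vec):
    """Euclidean norm of the embedding: near 0 is central, near 1 is peripheral."""
    return math.sqrt(sum(x * x for x in vec))


def nearest_neighbours(embeddings, query, k=3):
    """Top-k other entities ranked by ascending Poincaré distance to `query`."""
    q = embeddings[query]
    ranked = sorted(
        ((name, poincare_distance(q, vec)) for name, vec in embeddings.items() if name != query),
        key=lambda item: item[1],
    )
    return ranked[:k]


def pairwise_matrix(embeddings, names):
    """Full distance matrix for the listed entities, in the order given."""
    return [[poincare_distance(embeddings[a], embeddings[b]) for b in names] for a in names]


# Hypothetical trained embeddings for a small network.
embeddings = {
    "organiser": (0.02, 0.01),
    "lieutenant_1": (0.45, 0.10),
    "lieutenant_2": (-0.40, 0.20),
    "courier_1": (0.88, 0.25),
    "courier_2": (0.80, -0.40),
    "courier_3": (-0.85, 0.35),
}

by_depth = sorted(embeddings, key=lambda name: depth_score(embeddings[name]))
print("most central:", by_depth[0])
print("neighbours of courier_1:", nearest_neighbours(embeddings, "courier_1", k=2))
matrix = pairwise_matrix(embeddings, ["organiser", "courier_1", "courier_3"])
```

In this toy data, courier_1's nearest neighbour is the lieutenant above it rather than another courier, even though both couriers sit near the boundary.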

Use Cases

Identifying the Hierarchy Leader: Run Poincaré ball training on a criminal network investigation, then sort all entities by hierarchy depth score in ascending order. The entity with the lowest norm is the most central node in the network topology, typically the organiser or financier rather than an operational member.
Finding Peripheral Members: Entities near the boundary of the Poincaré ball (norm close to 1) are the peripheral, lowest-level participants. In a drug network these are couriers or end users. In a financial crime pyramid these are the unwitting account holders whose accounts were used for layering. Identifying them quickly can inform witness strategy or asset recovery prioritisation.
Understanding Command Structure Without Labels: Standard graph centrality measures can identify well-connected nodes, but they do not distinguish between a well-connected hub that organises others and a well-connected broker who simply appears in many transactions. Hyperbolic embeddings learn the hierarchical structure from the graph topology, so the depth score tends to reflect command-and-control position rather than just degree count.
Cross-Investigation Comparison: By embedding two investigations with the same dimensionality and comparing the Poincaré distance distributions, analysts can assess whether two networks share structural similarities, which may indicate common organisational patterns or leadership overlap.
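For the cross-investigation comparison, one possible way to compare two Poincaré distance distributions is a two-sample Kolmogorov-Smirnov statistic, sketched below in plain Python. The choice of statistic, the function names, and the sample coordinates are assumptions for illustration; the module may compare distributions differently. Because the two embeddings are trained separately, only the distance distributions are compared, never the raw coordinates.

```python
import math
from itertools import combinations


def poincare_distance(u, v):
    """Closed-form distance on the unit Poincaré ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(max(1.0, 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))))


def pairwise_distances(points):
    """All unordered pairwise Poincaré distances within one investigation."""
    return [poincare_distance(a, b) for a, b in combinations(points, 2)]


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both CDFs past every observation equal to x before measuring the gap.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


# Hypothetical embeddings from two separately trained investigations.
network_a = [(0.02, 0.0), (0.5, 0.1), (-0.5, 0.1), (0.9, 0.2), (-0.9, 0.2)]
network_b = [(0.0, 0.03), (0.1, 0.5), (0.1, -0.5), (0.2, 0.9), (0.2, -0.9)]

score = ks_statistic(pairwise_distances(network_a), pairwise_distances(network_b))
print("KS statistic between distance distributions:", score)
```

A value near 0 suggests the two networks have similarly shaped distance distributions; it does not by itself establish shared leadership.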

Why Not Euclidean Embeddings

Standard Euclidean embeddings trained on the same graph data distribute entities in flat space. In a tree-like structure the number of nodes grows exponentially with depth, while the volume of Euclidean space grows only polynomially with radius, so representing the hierarchy with low distortion requires many more dimensions. A 64-dimensional Poincaré ball embedding can capture the same structural information as a much higher-dimensional Euclidean embedding because the geometry naturally accommodates branching. This is not merely a performance optimisation: it is a structural alignment between the data and the representation space.
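The growth argument can be made concrete with a small calculation. The sketch below compares the circumference of a circle of radius r in the Euclidean plane (2πr) and in the hyperbolic plane of curvature -1 (2π sinh r) with the number of nodes at depth r in a binary tree. The function names and the choice of a binary tree with one unit of radius per level are illustrative assumptions.

```python
import math


def euclidean_circumference(r):
    """Circumference of a circle of radius r in the flat plane."""
    return 2.0 * math.pi * r


def hyperbolic_circumference(r):
    """Circumference of a circle of radius r in the hyperbolic plane (curvature -1)."""
    return 2.0 * math.pi * math.sinh(r)


def nodes_at_depth(branching, depth):
    """Number of nodes at a given depth of a full tree with the given branching factor."""
    return branching ** depth


for depth in range(1, 9):
    nodes = nodes_at_depth(2, depth)
    print(
        depth,
        nodes,
        round(euclidean_circumference(depth), 1),
        round(hyperbolic_circumference(depth), 1),
    )
```

If each node needs roughly one unit of circumference, the Euclidean circle runs out of room within a few levels, whereas the hyperbolic circle stays ahead of the node count at every depth.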

Integration

Graph Relationships: Reads entity pairs from the graph relationships store scoped to an investigation and organisation.
Model Registry: Persists and retrieves trained model artefacts via the three-tier model cache backed by object storage.
Temporal Knowledge Graph Embeddings: Complements the TNTComplEx temporal KGE module; Poincaré embeddings capture hierarchy structure while temporal KGE captures relationship validity over time.
Structural Scoring: Works alongside the graph structural score module to provide both topology-based scoring and hierarchy-aware positioning.
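The three-tier lookup behind the Model Registry integration (memory, then local disk, then object storage) can be sketched as follows. This is an illustration only: the class name, the `fetch_remote` callable, the key format, and the use of JSON as a stand-in for a torch state dict are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class ThreeTierCache:
    """Look up an artefact in memory, then on local disk, then in remote storage."""

    def __init__(self, disk_dir, fetch_remote):
        self._memory = {}
        self._disk_dir = Path(disk_dir)
        self._fetch_remote = fetch_remote  # callable: key -> bytes, or None if absent

    def get(self, key):
        if key in self._memory:                      # tier 1: process memory
            return self._memory[key]
        path = self._disk_dir / f"{key}.json"
        if path.exists():                            # tier 2: local disk
            blob = path.read_bytes()
        else:                                        # tier 3: object storage
            blob = self._fetch_remote(key)
            if blob is None:
                return None
            path.write_bytes(blob)                   # populate the disk tier
        artefact = json.loads(blob)
        self._memory[key] = artefact                 # populate the memory tier
        return artefact


# Stand-in for object storage, with a log of remote fetches.
remote_store = {"inv-123": json.dumps({"dim": 2, "weights": [[0.02, 0.01]]}).encode()}
remote_calls = []


def fake_fetch(key):
    remote_calls.append(key)
    return remote_store.get(key)


with tempfile.TemporaryDirectory() as tmp:
    cache = ThreeTierCache(tmp, fake_fetch)
    first = cache.get("inv-123")    # fetched from the remote tier
    second = cache.get("inv-123")   # served from memory
    fresh = ThreeTierCache(tmp, fake_fetch)
    third = fresh.get("inv-123")    # new process memory, served from disk
    missing = cache.get("unknown")  # absent everywhere

print("remote fetches:", remote_calls)
```

Loading lazily on the first `get` call is what lets inference proceed without retraining: the remote tier is touched only when neither faster tier already holds the artefact.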

Open Standards

Resource Description Framework (RDF 1.1): W3C standard for representing graph data as subject-predicate-object triples, providing the foundational data model for entity relationship graphs used as embedding inputs.
SPARQL 1.1: W3C query language for RDF graphs, used to extract entity pairs and relationship data from knowledge graph stores prior to embedding training.
STIX 2.1 (Structured Threat Information Expression): OASIS standard for representing threat actors, relationships, and network structures in criminal and threat intelligence contexts, aligning with the entity graph inputs this module consumes.
Property Graph Model (openCypher): Open specification for labelled property graphs defining nodes, edges, and property semantics; the graph relationship structure this module trains over conforms to this model.
ISO 8601: International standard for date and time representation, used when recording embedding artefact creation and expiry metadata.