Temporal Knowledge Graph Embedding

Overview

A graph that records relationships between entities gives analysts a useful map of who is connected to whom. What it does not capture is when those connections were active. A contact relationship that existed three years ago carries very different intelligence weight from one established last week. Temporal knowledge graph embeddings address this by learning a numerical representation of each entity and relationship that is conditioned on time, so the model encodes not just the structure of the network but the rhythm at which it changes.

The Temporal Knowledge Graph Embedding module trains a TNTComplEx model using PyKEEN on the relationship graph store. Each relationship record is treated as a timestamped triple: a source entity, a relationship type, and a target entity, observed at a specific point in time. TNTComplEx decomposes the resulting temporal tensor into learned entity, relation, and timestamp embeddings, enabling the model to answer questions such as which entities are most likely to become connected via a specific relation type at a given future time. This supports anticipatory targeting, missing-link inference, and temporal anomaly detection in the investigation graph.
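The scoring idea described above can be sketched in a few lines. The snippet below is a minimal, dependency-free illustration of the published TNTComplEx scoring function (the real part of a trilinear product in which the relation is the sum of a time-modulated component and a static component), followed by a brute-force top-k ranking of candidate tails. It does not use PyKEEN and is not the module's actual code; the names `TemporalQuery`, `tntcomplex_score`, and `top_k_tails`, and the toy embeddings, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Embedding = List[complex]  # one complex number per latent dimension


@dataclass(frozen=True)
class TemporalQuery:
    """A (head, relation, ?, time) query; all fields are integer indices."""
    head: int
    relation: int
    time_bin: int


def tntcomplex_score(head: Embedding, rel_temporal: Embedding,
                     rel_static: Embedding, time: Embedding,
                     tail: Embedding) -> float:
    """Re(<head, rel_temporal * time + rel_static, conj(tail)>)."""
    return sum(
        (h * (rt * tau + rs) * t.conjugate()).real
        for h, rt, rs, tau, t in zip(head, rel_temporal, rel_static, time, tail)
    )


def top_k_tails(query: TemporalQuery,
                entities: Dict[int, Embedding],
                rel_temporal: Dict[int, Embedding],
                rel_static: Dict[int, Embedding],
                times: Dict[int, Embedding],
                k: int) -> List[Tuple[int, float]]:
    """Score every candidate tail (except the head itself), best first."""
    scored = [
        (tail_id, tntcomplex_score(entities[query.head],
                                   rel_temporal[query.relation],
                                   rel_static[query.relation],
                                   times[query.time_bin],
                                   tail_vec))
        for tail_id, tail_vec in entities.items()
        if tail_id != query.head
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy, hand-picked embeddings (rank 2); a trained model would learn these.
entities = {0: [1 + 0j, 0j], 1: [1 + 0j, 0j], 2: [0j, 1 + 0j], 3: [-1 + 0j, 0j]}
rel_temporal = {0: [1 + 0j, 1 + 0j]}
rel_static = {0: [0.5 + 0j, 0.5 + 0j]}
times = {0: [1 + 0j, 1 + 0j], 1: [-1 + 0j, -1 + 0j]}

early = top_k_tails(TemporalQuery(0, 0, 0), entities, rel_temporal, rel_static, times, k=2)
late = top_k_tails(TemporalQuery(0, 0, 1), entities, rel_temporal, rel_static, times, k=1)
```

In this toy setup the same head and relation rank different tails first at the two time bins, which is the behaviour the time-conditioned relation term is meant to provide. The raw scores are unbounded; any mapping to a 0 to 1 range is a separate step.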

Last Reviewed: 2026-04-14 | Last Updated: 2026-04-14

Key Features

TNTComplEx Temporal Tensor Decomposition: The model factorises the temporal knowledge graph as a fourth-order tensor over head entities, relations, tail entities, and timestamps. It learns separate real and imaginary embedding components for each dimension, capturing the complex rotation structure of relational patterns as they evolve over time.
PyKEEN Pipeline Training: Training is managed by the PyKEEN pipeline, which handles batching, negative sampling, and evaluation automatically. The model trains for 100 epochs on the organisation's relationship graph data. Evaluation on a held-out 20% split produces mean reciprocal rank (MRR) and Hits@10 scores, which are stored with the model record.
Three-Tier Model Registry: Trained models are serialised and stored in secure object storage (R2) under a per-organisation key. The model cache serves subsequent inference requests from a memory cache first, then a local disk cache, and finally R2. This keeps inference latency under 50 ms after the first load without bundling large model files into the service deployment.
Link Prediction: Given a head entity, a relation type, and a timestamp, the model scores all candidate tail entities and returns the top-k most plausible. This supports missing-link inference in investigations, suggesting connections that the current collected data may not yet confirm.
Triple Scoring: Analysts can score any specific temporal triple to assess how consistent it is with the learned patterns of the organisation's graph. A score near 1.0 indicates the triple fits the temporal structure well; a score near 0.0 suggests it is anomalous or implausible given the known graph history.
Organisation-Scoped Models: Each organisation trains and stores its own model. Training data, model artefacts, and inference results are strictly isolated to the requesting organisation. Cross-tenant inference is not permitted.
Graceful Degradation: If PyKEEN or PyTorch is not installed in the deployment environment, all service methods return empty results and log a warning rather than raising errors. This allows the service to be deployed in constrained environments without breaking dependent workflows.
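The three-tier lookup described under Three-Tier Model Registry can be illustrated with a short sketch. This is a hedged illustration of the memory, disk, then remote order only, not the service's implementation: the class name `ThreeTierModelCache`, the `fetch_remote` callable (a stand-in for whatever object storage client is used), and the `<org_id>.model` file naming are all assumptions. It also assumes `org_id` is an already validated identifier that is safe to use in a file name.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple


class ThreeTierModelCache:
    """Serve model bytes from memory, then local disk, then remote storage."""

    def __init__(self, disk_dir: Path, fetch_remote: Callable[[str], bytes]):
        self._memory: Dict[str, bytes] = {}
        self._disk_dir = disk_dir
        self._fetch_remote = fetch_remote  # hypothetical object-store download

    def _disk_path(self, org_id: str) -> Path:
        # Assumed naming scheme: one file per organisation.
        return self._disk_dir / f"{org_id}.model"

    def load(self, org_id: str) -> Tuple[bytes, str]:
        """Return (model_bytes, tier), where tier names the level that answered."""
        # Tier 1: in-process memory.
        if org_id in self._memory:
            return self._memory[org_id], "memory"

        # Tier 2: local disk; promote to memory on a hit.
        path = self._disk_path(org_id)
        if path.exists():
            blob = path.read_bytes()
            self._memory[org_id] = blob
            return blob, "disk"

        # Tier 3: remote object storage; populate both faster tiers.
        blob = self._fetch_remote(org_id)
        self._disk_dir.mkdir(parents=True, exist_ok=True)
        path.write_bytes(blob)
        self._memory[org_id] = blob
        return blob, "remote"
```

Keying every tier by organisation ID keeps one tenant's model from being served to another, in line with the organisation-scoped isolation described above. A production cache would also need eviction and invalidation when a model is retrained, which this sketch omits.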

Use Cases

Anticipatory Link Inference: Predict which entities are likely to establish a given type of connection in the near future based on the learned temporal structure of the graph.
Missing Data Detection: Score triples that domain knowledge suggests should exist but that are absent from the graph, so that intelligence teams can prioritise collection gaps.
Temporal Anomaly Detection: Score claimed relationships against the model to identify temporal inconsistencies, such as a relationship asserted to have occurred at a time inconsistent with the entity's known activity pattern.
Emerging Threat Actor Mapping: Identify entities whose predicted connection patterns suggest they are moving into new operational roles or networks before that movement is confirmed by collected data.
Baseline Pattern Learning: Establish an organisation-specific baseline of normal temporal relationship patterns so that deviations from that baseline can be surfaced for analyst attention.
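As a rough illustration of the anomaly and baseline use cases above, the sketch below maps raw model scores to a 0 to 1 plausibility value and flags triples that fall below a cut-off. The source does not say how scores are normalised, so the logistic sigmoid used here is an assumption, as are the function names, the example triples, and the 0.2 threshold.

```python
import math
from typing import Dict, List, Tuple

# (head, relation, tail, ISO 8601 timestamp); identifiers are illustrative.
Triple = Tuple[str, str, str, str]


def plausibility(raw_score: float) -> float:
    """Numerically stable logistic sigmoid (assumed normalisation)."""
    if raw_score >= 0:
        return 1.0 / (1.0 + math.exp(-raw_score))
    exp_x = math.exp(raw_score)
    return exp_x / (1.0 + exp_x)


def flag_anomalies(raw_scores: Dict[Triple, float],
                   threshold: float = 0.2) -> List[Tuple[Triple, float]]:
    """Return triples whose plausibility is below threshold, least plausible first."""
    flagged = [
        (triple, plausibility(score))
        for triple, score in raw_scores.items()
        if plausibility(score) < threshold
    ]
    flagged.sort(key=lambda item: item[1])
    return flagged


# Hypothetical raw scores, as if returned by a trained model.
raw = {
    ("entity-a", "contacted", "entity-b", "2026-03-01"): 3.0,
    ("entity-a", "contacted", "entity-c", "2026-03-02"): -2.0,
    ("entity-d", "funded", "entity-a", "2019-01-15"): -5.0,
}
suspicious = flag_anomalies(raw)
```

A fixed threshold is the simplest choice. A baseline derived from the distribution of scores over the organisation's own known triples may be a better fit for the baseline-learning use case.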

Integration

Graph Intelligence: Temporal KGE complements static graph traversal and centrality analysis with time-aware predictions.
Investigation Workflow: Predicted links and triple scores can be surfaced in the investigation detail view to guide analyst collection and hypothesis testing.
HITL Approval: High-confidence predicted links can trigger human-in-the-loop (HITL) review requests for analyst validation before being elevated to confirmed intelligence.
Hypergraph Analysis: Entity embeddings from TNTComplEx can be combined with s-centrality scores from hypergraph analysis for multi-dimensional entity importance ranking.
Audit Trail: All training runs and inference queries are logged with organisation ID, timestamp, and query parameters to satisfy EDF/PESCO compliance requirements.

Open Standards

Resource Description Framework (RDF): W3C standard data model for representing knowledge graphs as subject-predicate-object triples, underpinning the entity-relation-entity structure on which temporal embeddings are trained.
RDF Schema (RDFS): W3C vocabulary description language used to define classes and properties in the knowledge graph, enabling consistent entity typing and relation semantics across organisational graphs.
Web Ontology Language (OWL 2): W3C standard for formal ontology representation, providing the logical foundation for relationship types and entity hierarchies modelled in the investigation graph.
SPARQL 1.1: W3C query language for RDF graphs, used to extract timestamped triple sets from the knowledge graph store prior to training and to retrieve predicted links for analyst review.
ISO 8601: International standard for representing dates and times, used to encode the temporal dimension of each triple so that time values are unambiguous and interoperable across systems.
JSON-LD 1.1: W3C serialisation format for linked data in JSON, used to export trained model metadata and inference results in a machine-readable, standards-compliant envelope.
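To make the roles of ISO 8601 and JSON-LD concrete, the sketch below parses an ISO 8601 timestamp into a discrete day index (embedding models of this kind work over a finite set of time steps) and wraps a predicted link in a small JSON-LD document. The day-level granularity, the function names, the `https://example.org/tkge#` vocabulary, and the property names are all hypothetical; the source does not specify the actual binning or export schema.

```python
import json
from datetime import date, datetime, timezone


def time_bin(timestamp: str, origin: date) -> int:
    """Map an ISO 8601 timestamp to the number of UTC days since origin."""
    moment = datetime.fromisoformat(timestamp)
    if moment.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return (moment.astimezone(timezone.utc).date() - origin).days


def predicted_link_jsonld(head: str, relation: str, tail: str,
                          timestamp: str, score: float) -> str:
    """Serialise one predicted link as a JSON-LD document (illustrative schema)."""
    document = {
        "@context": {
            "@vocab": "https://example.org/tkge#",  # placeholder vocabulary
            "xsd": "http://www.w3.org/2001/XMLSchema#",
        },
        "@type": "PredictedLink",
        "head": head,
        "relation": relation,
        "tail": tail,
        "validAt": {"@value": timestamp, "@type": "xsd:dateTime"},
        "score": score,
    }
    return json.dumps(document, indent=2)


origin = date(2026, 4, 1)
bin_index = time_bin("2026-04-14T09:30:00+00:00", origin)
exported = predicted_link_jsonld(
    "entity-a", "contacted", "entity-b", "2026-04-14T09:30:00+00:00", 0.87
)
```

Normalising to UTC before binning avoids the same instant landing in different bins depending on the offset it was recorded with.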