Provenance Tracking

Overview

When a profile emerges from the fusion of ten source records, an analyst applying it to an operation has an immediate question: where did this come from, and can I trust it? That question is not rhetorical: in a multi-national intelligence context it has legal, operational, and liability dimensions. Without first-class provenance, the answer is locked in database timestamps and log files that no analyst can navigate during a time-critical mission.

The Provenance Tracking module models data origin and lineage as native graph nodes using the W3C PROV-DM standard (W3C Recommendation, 30 April 2013). Every entity in the platform (profiles, indicators, merged records, ingested documents) carries an explicit provenance chain: who created it, through which pipeline activity, from which source entities, and at what time. Analysts can traverse that chain to the original source in a single query. The chain serialises to W3C PROV-JSON for export, external audit, and interoperability with allied systems.
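To make the shape of a provenance chain concrete, the following is a minimal Python sketch of the four questions listed above (who, which activity, which sources, when). The class names, field names, and the `describe` helper are illustrative assumptions, not the module's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvAgent:
    agent_id: str  # the user or service that acted


@dataclass(frozen=True)
class ProvActivity:
    activity_id: str
    activity_type: str          # e.g. "ingestion" or "entity_merge"
    associated_with: ProvAgent  # wasAssociatedWith
    ended_at: datetime


@dataclass(frozen=True)
class ProvEntity:
    entity_id: str
    generated_by: ProvActivity           # wasGeneratedBy
    attributed_to: ProvAgent             # wasAttributedTo
    derived_from: tuple[str, ...] = ()   # wasDerivedFrom (source entity IDs)


def describe(entity: ProvEntity) -> dict:
    """Answer who / which activity / which sources / when for one entity."""
    return {
        "who": entity.attributed_to.agent_id,
        "activity": entity.generated_by.activity_type,
        "sources": list(entity.derived_from),
        "when": entity.generated_by.ended_at.isoformat(),
    }


# Hypothetical example: a profile produced by merging two records.
analyst = ProvAgent("user:analyst-1")
merge = ProvActivity("act:1", "entity_merge", analyst,
                     datetime(2026, 4, 14, 9, 30, tzinfo=timezone.utc))
profile = ProvEntity("profile:42", merge, analyst, ("record:a", "record:b"))
summary = describe(profile)
```

Each source ID in `derived_from` would itself resolve to a `ProvEntity` with its own record, which is what makes the chain traversable back to the origin.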

Last Reviewed: 2026-04-14 Last Updated: 2026-04-14

Key Features

W3C PROV-DM Graph Nodes: The three core W3C PROV-DM concepts (prov:Entity, prov:Activity, and prov:Agent) are modelled as first-class nodes in the graph analysis layer, connected by typed relationship labels (PROV_WAS_GENERATED_BY, PROV_WAS_ATTRIBUTED_TO, PROV_WAS_ASSOCIATED_WITH, PROV_WAS_DERIVED_FROM). Provenance is queryable as a graph traversal, not reconstructed from flat log tables.
Platform record store Source of Truth: Every provenance event is written to the platform record store first, as a dedicated provenance record. The graph analysis layer holds the graph replica for traversal performance. If the graph analysis layer is unavailable, the platform falls back to reconstructing the chain from the platform record store, preserving provenance continuity with no data loss.
Ingestion Pipeline Instrumentation: Every normalisation job submitted through the ingestion pipeline automatically records a provenance event with activity type ingestion. The agent is the submitting user or service, and the event is scoped to the submitting organisation.
Entity Merge Provenance: When the entity resolution service merges two or more candidate entities, a provenance record with activity type entity_merge is created on the resulting canonical entity. The merged source entity IDs are stored as wasDerivedFrom links, giving a complete audit trail of how a profile was constructed from multiple originating records.
Provenance Chain Traversal: The provenance chain view, a governed read workflow, traverses PROV_WAS_DERIVED_FROM and PROV_WAS_GENERATED_BY relationships back to the origin at configurable depth (default 5 hops). The result includes all nodes and directed edges in the provenance subgraph.
W3C PROV-JSON Export: The provenance export workflow, a governed read workflow, returns a W3C PROV-JSON serialisation conforming to https://www.w3.org/TR/prov-json/. The document includes prefixes, entity, activity, agent, wasGeneratedBy, wasAttributedTo, wasAssociatedWith, and wasDerivedFrom collections. This format is readable by any standards-compliant provenance tool and suitable for submission to external audit or allied national systems.
Organisation-Scoped Isolation: All provenance queries, against both the platform record store and the graph analysis layer, include organisation scope in every WHERE clause. Provenance chains from one tenant are never visible to another. This is enforced at the database and graph layer, not only at the API layer.
Provenance Chain Viewer: The provenance chain viewer renders the provenance chain as an inline expandable panel within entity detail views. Entities are shown as blue circles, activities as green rectangles, and agents as orange person icons. The panel includes a one-click PROV-JSON download for offline audit.
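The chain traversal and organisation scoping described above can be sketched as a depth-limited breadth-first walk. This is an in-memory illustration only: the edge tuple shape, the `provenance_chain` function, and the sample IDs are assumptions, and the real module runs its traversal as queries in the graph analysis layer rather than in Python.

```python
from collections import deque

# Relationship labels followed when walking back towards the origin.
BACKWARD_LABELS = {"PROV_WAS_DERIVED_FROM", "PROV_WAS_GENERATED_BY"}


def provenance_chain(edges, start_id, org_id, max_depth=5):
    """Breadth-first walk from start_id, limited to max_depth hops and one organisation.

    edges: iterable of (source_id, label, target_id, org_id) tuples (hypothetical shape).
    Returns (nodes, directed_edges) for the reachable provenance subgraph.
    """
    # Index outgoing edges once, dropping other tenants' edges and unfollowed labels.
    outgoing = {}
    for src, label, dst, edge_org in edges:
        if edge_org == org_id and label in BACKWARD_LABELS:
            outgoing.setdefault(src, []).append((label, dst))

    nodes = {start_id}
    found_edges = []
    queue = deque([(start_id, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # hop limit reached; do not expand this node
        for label, dst in outgoing.get(node, []):
            found_edges.append((node, label, dst))
            if dst not in nodes:
                nodes.add(dst)
                queue.append((dst, depth + 1))
    return nodes, found_edges


# Hypothetical data: one merged profile in org-a, plus edges owned by org-b.
edges = [
    ("profile:42", "PROV_WAS_GENERATED_BY", "act:merge-1", "org-a"),
    ("profile:42", "PROV_WAS_DERIVED_FROM", "record:a", "org-a"),
    ("profile:42", "PROV_WAS_DERIVED_FROM", "record:b", "org-a"),
    ("record:a", "PROV_WAS_GENERATED_BY", "act:ingest-1", "org-a"),
    ("profile:42", "PROV_WAS_DERIVED_FROM", "record:x", "org-b"),
]
nodes, chain_edges = provenance_chain(edges, "profile:42", "org-a")
```

Filtering by organisation while the edge index is built, rather than on the result, mirrors the point that isolation is applied in the query itself: edges belonging to another tenant are never walked at all.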

Use Cases

Source Attribution: Analysts confirm which data source contributed to a profile before applying it to an active operation, reducing the risk of acting on stale or low-confidence data.
Legal Discovery: When an investigation faces legal challenge, the provenance chain provides a machine-readable, court-admissible record of exactly how each entity was constructed.
Merge Audit Trail: Supervisors review which records were merged to form a canonical profile, and by whom, supporting GDPR right-to-rectification workflows.
Intelligence Sharing: Exporting PROV-JSON alongside shared intelligence artefacts lets allied organisations verify the origin of received data without requiring access to the Argus system.
Compliance Reporting: EDF/PESCO compliance auditors access the provenance chain to confirm that all data modifications are fully attributed to authenticated users or named services.

Integration

Entity Resolution: The merge provenance hook instruments every entity-merge operation without breaking existing entity resolution functionality.
Ingestion Pipeline: The ingestion provenance hook instruments every ingestion submission call. The provenance service is injected as an optional dependency, so existing callers that do not supply it are unaffected.
graph analysis layer Graph: Provenance nodes (ProvEntity, ProvActivity, ProvAgent) live alongside the existing intelligence graph. Graph traversal queries can join provenance context directly into investigative queries.
Audit Trail: Works alongside the existing hash-chained audit logging service. The audit trail records what happened; the provenance graph records why and from what.
Intelligence Profiles: The provenance chain viewer integrates as a collapsible panel in entity detail views, presenting provenance context in the same workflow where analysts use the entity.
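The optional-dependency pattern mentioned for the ingestion hook can be sketched as follows. Everything here is hypothetical: `submit_ingestion`, `ProvenanceRecorder`, and `InMemoryRecorder` are stand-in names chosen for illustration, not the platform's API.

```python
from typing import Optional, Protocol


class ProvenanceRecorder(Protocol):
    """Hypothetical interface for the injected provenance service."""

    def record(self, activity_type: str, agent_id: str,
               org_id: str, entity_id: str) -> None: ...


class InMemoryRecorder:
    """Toy recorder used only for this illustration."""

    def __init__(self) -> None:
        self.events: list[dict] = []

    def record(self, activity_type: str, agent_id: str,
               org_id: str, entity_id: str) -> None:
        self.events.append({
            "activity_type": activity_type,
            "agent_id": agent_id,
            "org_id": org_id,
            "entity_id": entity_id,
        })


def submit_ingestion(document_id: str, agent_id: str, org_id: str,
                     provenance: Optional[ProvenanceRecorder] = None) -> str:
    """Hypothetical ingestion entry point with an optional provenance hook."""
    entity_id = f"entity:{document_id}"  # stand-in for the real normalisation work
    if provenance is not None:
        # Only instrumented callers emit a provenance event.
        provenance.record("ingestion", agent_id, org_id, entity_id)
    return entity_id


# An existing caller that does not pass the service behaves as before.
legacy_result = submit_ingestion("doc-1", "svc:loader", "org-a")

# An instrumented caller gets the same result plus one provenance event.
recorder = InMemoryRecorder()
new_result = submit_ingestion("doc-1", "svc:loader", "org-a", provenance=recorder)
```

Defaulting the parameter to `None` is what keeps existing call sites source-compatible: the return value is identical in both paths, and the provenance event is a side effect only.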

Open Standards

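W3C PROV-DM (W3C Recommendation, 30 April 2013): The data model implemented by this module; defines the core concepts of Entity, Activity, and Agent together with seven core relations between them (generation, usage, communication, derivation, attribution, association, and delegation). This module models four of these as graph relationships: generation, attribution, association, and derivation.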
W3C PROV-JSON (W3C Member Submission, 24 April 2013): The serialisation format used for provenance export and interoperability with allied systems and audit tools.
W3C PROV-O (W3C Recommendation, 30 April 2013): The OWL ontology underpinning the provenance data model, enabling semantic web interoperability where required.
W3C JSON-LD 1.1 (W3C Recommendation, 16 July 2020): The linked-data framing used when serialising provenance graphs for export, allowing consuming systems to resolve term definitions unambiguously.
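As a rough illustration of the PROV-JSON export format, the sketch below builds a small document for one merged entity. The top-level collection names and the `prov:` attribute keys follow the PROV-JSON submission as understood here and should be checked against the specification before relying on them. The `to_prov_json` function, the `ex` prefix, the example URI, and the blank-node identifiers are assumptions made for illustration.

```python
import json


def to_prov_json(entity_id, activity_id, agent_id, source_ids, prefix_uri):
    """Build a PROV-JSON-shaped dict for one entity and its direct provenance."""
    doc = {
        "prefix": {"ex": prefix_uri},
        "entity": {entity_id: {}},
        "activity": {activity_id: {}},
        "agent": {agent_id: {}},
        "wasGeneratedBy": {
            "_:gen1": {"prov:entity": entity_id, "prov:activity": activity_id}
        },
        "wasAttributedTo": {
            "_:attr1": {"prov:entity": entity_id, "prov:agent": agent_id}
        },
        "wasAssociatedWith": {
            "_:assoc1": {"prov:activity": activity_id, "prov:agent": agent_id}
        },
        "wasDerivedFrom": {},
    }
    # One derivation record per source entity that was merged into the result.
    for i, source_id in enumerate(source_ids, start=1):
        doc["entity"][source_id] = {}
        doc["wasDerivedFrom"][f"_:der{i}"] = {
            "prov:generatedEntity": entity_id,
            "prov:usedEntity": source_id,
        }
    return doc


# Hypothetical identifiers for a profile merged from two records.
doc = to_prov_json("ex:profile-42", "ex:merge-1", "ex:analyst-1",
                   ["ex:record-a", "ex:record-b"], "https://example.org/prov/")
exported = json.dumps(doc, indent=2, sort_keys=True)
```

Because the output is plain JSON keyed by qualified names, a receiving organisation can parse it with standard tooling without access to the originating system.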