Ingestion Domain

Overview

A financial crime unit needs to import three years of blockchain transaction data from an on-chain analytics provider, plus a batch of 50,000 records from a national vehicle registration database. Both imports run as background jobs: they can take minutes or hours, may fail partway through, and need to be retried automatically without losing progress. The Ingestion domain manages these jobs from submission through completion, tracking their status, capturing errors when they occur, and retrying failed jobs with exponential backoff to avoid overwhelming source systems.

Data ingestion is rarely a one-time event. New records arrive from connected systems on schedule, analysts upload evidence files for processing, and streaming sources push data continuously. The Ingestion domain provides a consistent job management layer across all of these patterns.

Key Features

Ingestion job creation with source identification and payload configuration
Job status tracking through lifecycle states: pending, running, completed, failed, and cancelled
Automatic retry logic with configurable maximum attempts and exponential backoff
Error message capture for failed jobs to support diagnosis and manual intervention
Multi-tenant job isolation per organisation
Support for multiple source types: blockchain ETL, CSV import, API sync, streaming, and file upload
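The lifecycle states and retry behaviour listed above can be sketched as a small state machine. This is a minimal illustration only, not the platform's implementation: the transition table, the `IngestionJob` fields, the `backoff_delay` helper, and the default base and cap values are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumed transition table; the platform's actual rules may differ.
ALLOWED = {
    JobStatus.PENDING: {JobStatus.RUNNING, JobStatus.CANCELLED},
    JobStatus.RUNNING: {JobStatus.COMPLETED, JobStatus.FAILED, JobStatus.CANCELLED},
    JobStatus.FAILED: {JobStatus.PENDING},  # a retry re-queues the job
    JobStatus.COMPLETED: set(),
    JobStatus.CANCELLED: set(),
}


def backoff_delay(attempt: int, base_seconds: float = 2.0, cap_seconds: float = 300.0) -> float:
    """Delay before retry number `attempt` (1-based): base * 2^(attempt - 1), capped."""
    return min(cap_seconds, base_seconds * 2 ** (attempt - 1))


@dataclass
class IngestionJob:
    organisation_id: str          # tenant that owns the job
    source_type: str              # e.g. "csv_import" (hypothetical identifier)
    max_attempts: int = 5         # configurable retry ceiling
    attempts: int = 0
    status: JobStatus = JobStatus.PENDING
    error_message: Optional[str] = None

    def transition(self, new_status: JobStatus) -> None:
        """Move to `new_status`, rejecting transitions the table does not allow."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"illegal transition {self.status.value} -> {new_status.value}")
        self.status = new_status

    def record_failure(self, message: str) -> Optional[float]:
        """Capture the error; return the retry delay, or None once attempts are exhausted."""
        self.transition(JobStatus.FAILED)
        self.attempts += 1
        self.error_message = message
        if self.attempts >= self.max_attempts:
            return None  # job stays failed for manual intervention
        self.transition(JobStatus.PENDING)
        return backoff_delay(self.attempts)
```

With these assumed defaults, successive retries wait 2, 4, 8, and 16 seconds, and the cap keeps later delays at no more than 300 seconds. A production scheduler would typically also add random jitter so that many failed jobs do not retry at the same instant.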

Use Cases

Importing blockchain transaction data for financial investigations involving cryptocurrency flows
Bulk importing CSV data files from external registries or partner organisations into the platform
Synchronising data from third-party APIs on a schedule to keep investigation-relevant datasets current
Processing user-uploaded evidence files through ingestion pipelines for entity extraction and classification

Industry Context

Financial intelligence units ingest transaction data from banking system exports, blockchain analytics services, and cross-border payment networks. National police services bulk import records from vehicle licensing authorities, company registries, and immigration systems. Defence intelligence teams ingest structured threat data from partner nation feeds on scheduled cycles. Utilities ingest telemetry and alarm data from SCADA systems for operational analysis. Courts and prosecution services import case management exports from legacy systems during platform migrations.

Integration

The Ingestion domain integrates with Ingestion Pipeline for pipeline definitions, Data Source for configuration, Connector for data connectors, Transform for data transformation, and Monitor for job monitoring. Job records are stored in the platform record store with organisation-scoped isolation.

Open Standards

OAuth 2.0 and JWT Bearer Token: Token-based authentication protects typed, auditable read and write workflows across the platform.
ISO 8601: All job timestamps (created_at, updated_at, error timestamps) are stored and exchanged as ISO 8601 date-time strings, and the schema mapper coerces incoming timestamp fields from multiple formats into ISO 8601 on normalisation.
RFC 5322 (Internet Message Format): The entity extractor applies an RFC 5322-compliant regular expression to detect and extract email addresses from ingested payloads for entity correlation.
ITU-T E.164: Phone numbers extracted from ingested payloads are normalised to the international E.164 format, enabling consistent cross-source correlation.
JSON Web Token / JWKS (RFC 7519 / RFC 7517): Internal REST queue-consumer endpoints require a scoped RS256 service JWT; the middleware verifies tokens against a JWKS endpoint, ensuring that only authorised workers can process jobs or requeue them from the dead-letter queue (DLQ).
W3C PROV-DM (Provenance Data Model): After each ingestion job is persisted, the pipeline records an entity-creation provenance event via the PROV-DM service to support auditability and data lineage.
OpenLineage specification: The pipeline emits START, COMPLETE, and FAIL lineage events through an OpenLineageService at each stage of job execution, enabling interoperable data-lineage tracking across the ingestion pipeline.
OASIS STIX 2.1 / CAP v1.2 / NIEM: Connectors feeding the ingestion pipeline are required to deliver validated open-standard DTOs (STIX 2.1 for threat intelligence, CAP v1.2 for emergency alerts, NIEM for law-enforcement data exchanges) as the primary output contract before structural type mapping is applied.
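As an illustration of the ISO 8601 coercion described above, the sketch below normalises a few common timestamp shapes to ISO 8601 strings in UTC. It is not the schema mapper's actual code: the `to_iso8601` name, the list of accepted formats, and the choice to treat timezone-less values as UTC are assumptions made for the example.

```python
from datetime import datetime, timezone

# Hypothetical set of input formats; a real mapper would likely accept more.
CANDIDATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",   # ISO 8601 with a UTC offset
    "%Y-%m-%d %H:%M:%S",     # common database export format
    "%d/%m/%Y %H:%M",        # day-first registry export
    "%Y-%m-%d",              # date only
)


def to_iso8601(value) -> str:
    """Coerce a Unix epoch number or a timestamp string to ISO 8601 in UTC."""
    if isinstance(value, (int, float)):
        # Numeric input is interpreted as seconds since the Unix epoch.
        return datetime.fromtimestamp(value, tz=timezone.utc).isoformat()

    text = str(value).strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next candidate format
        if parsed.tzinfo is None:
            # Assumption: values without an offset are already in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc).isoformat()

    raise ValueError(f"unrecognised timestamp format: {text!r}")
```

For example, `to_iso8601("05/02/2026 10:30")` yields `2026-02-05T10:30:00+00:00` under these assumptions. Ambiguous day-first versus month-first inputs cannot be resolved by format matching alone, so a real mapper would need a per-source format hint.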

Last Reviewed: 2026-02-05
Last Updated: 2026-04-14