Overview
A government data registry preparing for an audit needs to export a filtered subset of records in a specific format, with PII fields masked according to disclosure policy, within a day of the request arriving. An intelligence agency onboarding a new partner organisation needs to import that partner's historical incident records, map their field names to the Argus schema, validate required fields, and handle the inevitable cases where dates are formatted differently from what was expected. In both scenarios, the people doing the work should not need engineering support to run the operation.
The Data Import/Export module handles bulk data operations across multiple file formats with intelligent field mapping, streaming processing for large files, and configurable validation rules. Whether importing records from spreadsheets, exporting data for analysis, or running scheduled data exchanges with external systems, it provides high throughput with strong data integrity guarantees and a complete audit trail of every operation. Real-time progress updates are delivered via GraphQL subscriptions so operators know where a large job stands without refreshing a status page.
Key Features
- Multi-Format Support: Import and export data in CSV, JSON, JSONL, XML, Excel (XLSX/XLS), and Apache Parquet formats with automatic format detection. Parquet is particularly suited for large analytical exports.
- Intelligent Field Mapping: Map source fields to destination schemas using auto-detection, fuzzy name matching, and custom transformation rules. Non-standard column names from partner systems are handled without manual schema editing.
- Advanced Validation: Enforce schema rules, business logic, and data quality checks during import with configurable error thresholds and handling modes. Problems are reported at the record level with specific field-level detail.
- Streaming Processing: Process large files using streaming and chunked processing rather than loading entire datasets into memory. Files with millions of records complete without memory pressure.
- Upsert Mode: Update existing records or insert new ones in a single operation, avoiding the overhead of separate delete-and-insert workflows and the risk of duplicate records.
- Scheduled Operations: Automate recurring import and export jobs with monitoring, alerting, and retry logic for unattended execution. Partner data exchanges run on schedule without manual coordination.
- Template System: Save and reuse import/export configurations as templates to standardise recurring data exchange processes. A template built for one partner's format can be adapted for similar sources.
- Export Filtering and Field Selection: Export only the data needed by applying filters and selecting specific fields, with compression options to reduce file sizes. PII masking and access controls are applied automatically based on the requesting user's clearance and role.
- Audit Trail: Track all import and export operations with complete lineage, version control, and compliance reporting. Every export is attributed to a user, timestamp, and applied filter set.
- Real-Time Progress Monitoring: Subscribe to live progress updates via GraphQL subscriptions during long-running operations with detailed success, error, and throughput metrics.
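
To make the fuzzy name matching behind Intelligent Field Mapping concrete, here is a minimal sketch using Python's standard `difflib`. The destination field names, the `normalise` helper, and the 0.6 similarity cutoff are all assumptions for illustration; the module's actual matching rules are not described here.

```python
from difflib import get_close_matches

# Hypothetical destination schema; the real field names are not given in this document.
DESTINATION_FIELDS = ["incident_id", "occurred_at", "reporter_name", "location"]


def normalise(name: str) -> str:
    """Lower-case a column name and turn spaces, hyphens, and dots into underscores."""
    cleaned = name.strip().lower()
    for ch in (" ", "-", "."):
        cleaned = cleaned.replace(ch, "_")
    return cleaned


def suggest_mapping(source_columns, destination_fields=DESTINATION_FIELDS, cutoff=0.6):
    """Return {source_column: best destination field, or None when nothing is close enough}."""
    mapping = {}
    for column in source_columns:
        matches = get_close_matches(normalise(column), destination_fields, n=1, cutoff=cutoff)
        mapping[column] = matches[0] if matches else None
    return mapping


suggested = suggest_mapping(["Incident ID", "Occurred-At", "Reporter", "Notes"])
for source, destination in suggested.items():
    print(f"{source!r} -> {destination!r}")
```

Columns that map to `None` are the ones an operator would resolve by hand or with a custom transformation rule, which is why a sketch like this returns suggestions rather than applying them silently.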
Use Cases
- Bulk Data Loading: Import thousands or millions of records from CSV or Excel files with field mapping, validation, and deduplication to quickly populate the platform with existing data from legacy systems or partner organisations.
- Compliance Data Export: Export filtered datasets in the required format for regulatory reporting, audit responses, or data portability requests, with PII masking and access controls applied automatically based on disclosure policy.
- Scheduled Data Exchanges: Set up automated recurring imports from partner systems or exports to downstream analytics platforms, with monitoring and alerting for any failures. The exchange runs even when no one is watching.
- Data Migration: Move data between systems by exporting from the source and importing into the destination with transformation rules, validation, and error recovery to ensure records are not lost in transit.
- Ad-Hoc Analysis: Export specific subsets of data in Parquet or CSV format for analysis in external tools, with compression and field selection to keep file sizes manageable.
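
The compliance and ad-hoc export cases above combine three steps: filter the records, keep only the selected fields, and mask PII before the file is compressed. The sketch below shows one way that could look. The `PII_FIELDS` set, the masking rule (keep the last two characters), and the `can_view_pii` flag are hypothetical stand-ins for the disclosure policy and clearance checks, which this document does not specify.

```python
import csv
import gzip
import io

# Hypothetical policy: which fields count as PII.
PII_FIELDS = {"reporter_name", "phone"}


def mask(value: str) -> str:
    """Replace all but the last two characters with asterisks."""
    if len(value) <= 2:
        return "*" * len(value)
    return "*" * (len(value) - 2) + value[-2:]


def export_csv_gz(records, fields, predicate, can_view_pii=False) -> bytes:
    """Filter records, keep the selected fields, mask PII, and return gzip-compressed CSV."""
    text = io.StringIO()
    writer = csv.DictWriter(text, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in records:
        if not predicate(record):
            continue
        row = {}
        for field in fields:
            value = str(record.get(field, ""))
            if field in PII_FIELDS and not can_view_pii:
                value = mask(value)
            row[field] = value
        writer.writerow(row)
    return gzip.compress(text.getvalue().encode("utf-8"))


records = [
    {"incident_id": "A-1", "status": "closed", "reporter_name": "Jane Doe", "phone": "5550100"},
    {"incident_id": "A-2", "status": "open", "reporter_name": "Sam Roe", "phone": "5550101"},
]
payload = export_csv_gz(
    records,
    fields=["incident_id", "reporter_name"],
    predicate=lambda r: r["status"] == "closed",
)
exported_text = gzip.decompress(payload).decode("utf-8")
print(exported_text)
```

Masking inside the export function, rather than in a later pass, means an unmasked file never exists on disk for a requester without the necessary clearance.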
Integration
The Data Import/Export module integrates with the platform's validation engine, transformation pipeline, and audit system, and applies role-based access controls and field-level security for all operations. All imported data is written to PostgreSQL as the primary data store. The module connects with cloud storage services for file handling and supports both API-driven and scheduled execution modes.
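
As an illustration of how streaming import and upsert fit together, the sketch below reads a CSV in small chunks and writes each chunk in its own transaction. It uses Python's built-in `sqlite3` purely as a runnable stand-in for PostgreSQL; the `incidents` table, its columns, and the chunk size are assumptions. The `ON CONFLICT ... DO UPDATE` clause is accepted by both PostgreSQL and SQLite 3.24 or later, though a PostgreSQL driver would use its own placeholder style instead of `?`.

```python
import csv
import io
import sqlite3
from itertools import islice


def chunks(rows, size):
    """Yield lists of at most `size` rows without materialising the whole input."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Insert a new row, or update the status when the incident_id already exists.
UPSERT_SQL = """
INSERT INTO incidents (incident_id, status)
VALUES (?, ?)
ON CONFLICT (incident_id) DO UPDATE SET status = excluded.status
"""


def import_csv(conn, handle, chunk_size=2):
    """Stream CSV rows from `handle` and upsert them, one transaction per chunk."""
    reader = csv.DictReader(handle)
    total = 0
    for batch in chunks(reader, chunk_size):
        with conn:  # commits on success, rolls back this chunk on error
            conn.executemany(UPSERT_SQL, [(r["incident_id"], r["status"]) for r in batch])
        total += len(batch)
    return total


conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL data store
conn.execute("CREATE TABLE incidents (incident_id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO incidents VALUES ('A-1', 'open')")
conn.commit()

source = io.StringIO("incident_id,status\nA-1,closed\nA-2,open\nA-3,open\n")
processed = import_csv(conn, source)
rows = conn.execute("SELECT incident_id, status FROM incidents ORDER BY incident_id").fetchall()
print(processed, rows)
```

Because only one chunk is held in memory at a time, peak memory depends on `chunk_size` rather than on the size of the file.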
Open Standards
- ISO 19005 (PDF/A): Export packages can be produced in any of the four PDF/A archival variants (ISO 19005-1 through 19005-4, that is, PDF/A-1 to PDF/A-4), with pre-flight validation enforced before the package is sealed, so that packages remain readable long term for courts and regulators.
- RFC 3161 (Internet X.509 PKI Time-Stamp Protocol): Export manifests can optionally be timestamped by a trusted time-stamping authority, binding the manifest's SHA-256 digest to a verifiable time for chain-of-custody and admissibility purposes.
- OASIS Common Alerting Protocol (CAP) v1.2: Incident and alert records can be exported as CAP 1.2 XML documents, enabling interoperability with emergency-management systems and records management system (RMS) connectors.
- NIEM 6.0 (National Information Exchange Model): Alert and incident payloads can be serialised as NIEM-JSON and posted to law-enforcement RMS endpoints, following the NIEM 6.0 JSON-LD context for field naming.
- OASIS STIX 2.1: Threat-intelligence exports are produced as fully conformant STIX 2.1 bundles containing typed indicator and report objects, consumable by any STIX-aware platform.
- OpenLineage Specification: Every ingestion pipeline job emits START, COMPLETE, and FAIL run events to an OpenLineage-compatible lineage store, providing a queryable audit DAG of all data movements.
- W3C PROV-DM (Provenance Data Model): Ingestion events are recorded as PROV-DM entity-creation activities, linking imported records to their originating connector and job for downstream provenance queries.
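- ISO/IEC 27037:2012: The Bates numbering and export packaging services reference the ISO/IEC 27037 digital-evidence handling guidelines, alongside Rule 34 of the US Federal Rules of Civil Procedure (FRCP 34) and Rule 901 of the US Federal Rules of Evidence (FRE 901), to support court-admissible productions.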
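
The RFC 3161 item above depends on having a stable SHA-256 digest of the export manifest. The sketch below shows one way such a digest could be computed with the standard library. The manifest layout (a sorted list of file names with per-file digests) and the canonical JSON encoding are assumptions, not the module's documented format, and submitting the digest to a time-stamping authority is outside the scope of the example.

```python
import hashlib
import json


def file_digest(data: bytes) -> str:
    """Hex-encoded SHA-256 of one exported file's bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict) -> dict:
    """Build a hypothetical manifest from {file name: file bytes}, sorted by name."""
    return {
        "files": [
            {"name": name, "sha256": file_digest(content)}
            for name, content in sorted(files.items())
        ]
    }


def manifest_digest(manifest: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so equal manifests hash identically."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


package = {"records.csv": b"incident_id\nA-1\n", "README.txt": b"export package"}
manifest = build_manifest(package)
digest = manifest_digest(manifest)
print(digest)
```

Sorting the entries and fixing the JSON separators keeps the digest independent of dictionary ordering, so the same package always yields the same value to timestamp.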
Last Reviewed: 2026-02-05
Last Updated: 2026-04-14