{"id":"data-import-export","slug":"data-import-export","title":"Data Import/Export","description":"A government data registry preparing for an audit needs to export a filtered subset of records in a specific format, with PII fields masked according to disclosure policy, within a day of the request arriving. An intelli","category":"data-integration","tags":["data-integration","real-time","compliance","geospatial"],"lastModified":"2026-02-05","source_ref":"content/modules/data-import-export.md","url":"/developers/data-import-export","htmlPath":"/developers/data-import-export","jsonPath":"/api/docs/modules/data-import-export","markdownPath":"/api/docs/modules/data-import-export?format=markdown","checksum":"7afe5f49ad1ee19d2bc896f145aa2a69e25ff54c80cd2a93be9ecea5c1c4fe68","headings":[{"id":"overview","text":"Overview","level":2},{"id":"key-features","text":"Key Features","level":2},{"id":"use-cases","text":"Use Cases","level":2},{"id":"integration","text":"Integration","level":2}],"markdown":"# Data Import/Export\n\n## Overview\n\nA government data registry preparing for an audit needs to export a filtered subset of records in a specific format, with PII fields masked according to disclosure policy, within a day of the request arriving. An intelligence agency onboarding a new partner organisation needs to import that partner's historical incident records, map their field names to the Argus schema, validate required fields, and handle the inevitable cases where dates are formatted differently from what was expected. In both scenarios, the people doing the work should not need engineering support to run the operation.\n\nThe Data Import/Export module handles bulk data operations across multiple file formats with intelligent field mapping, streaming processing for large files, and configurable validation rules. 
Whether the task is importing records from spreadsheets, exporting data for analysis, or running scheduled data exchanges with external systems, the module provides high throughput with strong data integrity guarantees and a complete audit trail of every operation. Real-time progress updates are delivered via GraphQL subscriptions so operators know where a large job stands without refreshing a status page.\n\n```mermaid\nflowchart LR\n    A[Source File / API] --> B[Format Detection]\n    B --> C[Field Mapping]\n    C --> D[Validation Engine]\n    D --> E{Valid?}\n    E -- No --> F[Error Report]\n    E -- Yes --> G[Stream Processor]\n    G --> H[Upsert to PostgreSQL]\n    H --> I[Audit Trail]\n    F --> J[Review & Correct]\n    J --> C\n```\n\n## Key Features\n\n- **Multi-Format Support**: Import and export data in CSV, JSON, JSONL, XML, Excel (XLSX/XLS), and Apache Parquet formats with automatic format detection. Parquet is particularly suited for large analytical exports.\n- **Intelligent Field Mapping**: Map source fields to destination schemas using auto-detection, fuzzy name matching, and custom transformation rules. Non-standard column names from partner systems are handled without manual schema editing.\n- **Advanced Validation**: Enforce schema rules, business logic, and data quality checks during import with configurable error thresholds and handling modes. Problems are reported at the record level with specific field-level detail.\n- **Streaming Processing**: Process large files in streamed, chunked batches rather than loading entire datasets into memory. Files with millions of records complete without memory pressure.\n- **Upsert Mode**: Update existing records or insert new ones in a single operation, avoiding the overhead of separate delete-and-insert workflows and the risk of duplicate records.\n- **Scheduled Operations**: Automate recurring import and export jobs with monitoring, alerting, and retry logic for unattended execution. 
Partner data exchanges run on schedule without manual coordination.\n- **Template System**: Save and reuse import/export configurations as templates to standardise recurring data exchange processes. A template built for one partner's format can be adapted for similar sources.\n- **Export Filtering and Field Selection**: Export only the data needed by applying filters and selecting specific fields, with compression options to reduce file sizes. PII masking and access controls are applied automatically based on the requesting user's clearance and role.\n- **Audit Trail**: Track all import and export operations with complete lineage, version control, and compliance reporting. Every export is attributed to a user, timestamp, and applied filter set.\n- **Real-Time Progress Monitoring**: Subscribe to live progress updates via GraphQL subscriptions during long-running operations with detailed success, error, and throughput metrics.\n\n## Use Cases\n\n- **Bulk Data Loading**: Import thousands or millions of records from CSV or Excel files with field mapping, validation, and deduplication to quickly populate the platform with existing data from legacy systems or partner organisations.\n- **Compliance Data Export**: Export filtered datasets in the required format for regulatory reporting, audit responses, or data portability requests, with PII masking and access controls applied automatically based on disclosure policy.\n- **Scheduled Data Exchanges**: Set up automated recurring imports from partner systems or exports to downstream analytics platforms, with monitoring and alerting for any failures. 
The exchange runs even when no one is watching.\n- **Data Migration**: Move data between systems by exporting from the source and importing into the destination with transformation rules, validation, and error recovery to ensure records are not lost in transit.\n- **Ad-Hoc Analysis**: Export specific subsets of data in Parquet or CSV format for analysis in external tools, with compression and field selection to keep file sizes manageable.\n\n## Integration\n\nThe Data Import/Export module integrates with the platform's validation engine, transformation pipeline, and audit system, and applies role-based access controls and field-level security for all operations. All imported data is written to PostgreSQL as the primary data store. The module connects with cloud storage services for file handling and supports both API-driven and scheduled execution modes.\n\n**Last Reviewed:** 2026-02-05\n**Last Updated:** 2026-04-14\n"}