Data Bulk Operations

Overview

When a law enforcement agency migrates ten years of case records from a legacy system into Argus, they cannot afford a process that corrupts three thousand records partway through and leaves the database in an indeterminate state. They need a mechanism that validates every record before touching the database, applies changes atomically, keeps a before-and-after record of every modification, and can roll back cleanly if something goes wrong at step eight of twelve. They also need to know exactly where the job stands at any point during a run that might take hours.

The Data Bulk Operations module provides this mechanism. It handles large-scale create, update, and delete operations across millions of records, with real-time progress tracking, multi-stage validation, flexible error modes, and full rollback protection. Organisations working in financial crime investigation, government data registries, healthcare, and intelligence all maintain datasets at a scale where manual record-by-record processing is not viable. Bulk operations make changes at that scale safe and auditable.
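The validate-first, all-or-nothing flow described above can be shown with a small in-memory sketch. It assumes a plain dictionary stands in for the platform record store and that the caller supplies a `validate` function returning a list of problems. `apply_bulk_update`, `BulkResult`, and the change-log entry shape are hypothetical names, not the Argus API, and a real implementation would rely on database transactions rather than an in-memory snapshot.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable


class ValidationError(Exception):
    """Raised when any record fails pre-write validation; nothing is written."""


@dataclass
class BulkResult:
    applied: int
    change_log: list = field(default_factory=list)  # before/after per record
    dry_run: bool = False


def apply_bulk_update(
    store: dict,
    updates: dict,
    validate: Callable[[dict], list],
    dry_run: bool = False,
) -> BulkResult:
    """Hypothetical sketch: validate every update, then apply all or nothing."""
    # Stage 1: validate the full batch before touching the store.
    errors = {}
    for record_id, changes in updates.items():
        if record_id not in store:
            errors[record_id] = ["record does not exist"]
            continue
        problems = validate({**store[record_id], **changes})
        if problems:
            errors[record_id] = problems
    if errors:
        raise ValidationError(errors)

    # Stage 2: compute before/after states for the change history.
    change_log = []
    for record_id, changes in updates.items():
        before = copy.deepcopy(store[record_id])
        change_log.append({"id": record_id, "before": before,
                           "after": {**before, **changes}})

    if dry_run:
        # Preview only: report what would change without writing anything.
        return BulkResult(applied=0, change_log=change_log, dry_run=True)

    # Stage 3: apply every change; restore the snapshot if any write fails.
    snapshot = copy.deepcopy(store)
    try:
        for entry in change_log:
            store[entry["id"]] = copy.deepcopy(entry["after"])
    except Exception:
        store.clear()
        store.update(snapshot)
        raise
    return BulkResult(applied=len(change_log), change_log=change_log)
```

Because validation runs over the whole batch first, a single invalid record raises `ValidationError` before any record is modified, and `dry_run=True` returns the same change log without writing.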

Key Features

High-Throughput Processing: Execute bulk create, update, and delete operations at high volume with dynamically optimised batch sizes and parallel worker distribution, scaling to meet the demands of multi-million-record datasets.
Real-Time Progress Tracking: Monitor operation status, estimated completion time, and performance metrics through live dashboards and subscription-based updates delivered over the typed integration contract. Operators know exactly where a long-running job stands without polling.
Multi-Stage Validation: Validate data through schema checks, business rule enforcement, and relationship integrity verification before any write reaches the platform record store. Problems are caught before they can affect production data.
Flexible Error Handling: Choose from five error modes: fail-fast, continue-on-error, retry, quarantine, and adaptive (which responds to error rates automatically). The right mode depends on whether partial progress is acceptable.
Rollback Protection: Operations run as atomic transactions, and critical failures trigger an automatic rollback, so a failed job does not leave partially applied changes in the database. Dry-run previews let operators check the impact of a bulk operation before execution, without committing any changes.
Deduplication: Detect duplicates through exact matching, fuzzy matching, composite keys, and custom business logic to prevent redundant records from entering the system during import.
Cascade Operations: Manage dependent records across related collections with configurable cascade rules for updates and deletions, maintaining referential integrity across complex data models.
Conflict Resolution: Handle concurrent modifications with strategies including last-write-wins, first-write-wins, field-level merging, and manual review queuing for cases that require human judgement.
Change History Tracking: Record before-and-after states for all modified records with field-level change details and timestamps. Every bulk operation produces a complete change log for audit and debugging.
Comprehensive Audit Logging: Maintain immutable audit trails that meet SOC 2, HIPAA, GDPR, and sector-specific compliance requirements. Every operation is attributed to a user, organisation, and timestamp.
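The behaviour of the five error modes can be compared in one small processing loop. This sketch assumes each record is handled by a caller-supplied function that raises on failure. The retry budget, the 10% adaptive error-rate threshold, the minimum sample size, and the choice of stopping as the adaptive response are all illustrative assumptions rather than documented Argus defaults.

```python
from enum import Enum
from typing import Callable, Iterable


class ErrorMode(Enum):
    FAIL_FAST = "fail-fast"
    CONTINUE = "continue-on-error"
    RETRY = "retry"
    QUARANTINE = "quarantine"
    ADAPTIVE = "adaptive"


def process(
    records: Iterable,
    handle: Callable,
    mode: ErrorMode,
    max_retries: int = 3,             # assumed retry budget for RETRY mode
    adaptive_threshold: float = 0.1,  # assumed error-rate limit for ADAPTIVE
    min_sample: int = 20,             # records to see before ADAPTIVE judges the rate
):
    """Hypothetical sketch returning (succeeded, failed, quarantined) lists."""
    succeeded, failed, quarantined = [], [], []
    for seen, record in enumerate(records, start=1):
        attempts = 1 + (max_retries if mode is ErrorMode.RETRY else 0)
        error = None
        for _ in range(attempts):
            try:
                handle(record)
                error = None
                break
            except Exception as exc:  # remember the failure; retry if budget allows
                error = exc
        if error is None:
            succeeded.append(record)
            continue
        if mode is ErrorMode.FAIL_FAST:
            raise RuntimeError(f"record {record!r} failed; stopping") from error
        if mode is ErrorMode.QUARANTINE:
            quarantined.append(record)  # set aside for later review
        else:
            failed.append(record)  # CONTINUE, exhausted RETRY, or ADAPTIVE
        # ADAPTIVE: tolerate errors until the observed rate is too high.
        if (mode is ErrorMode.ADAPTIVE and seen >= min_sample
                and len(failed) / seen > adaptive_threshold):
            raise RuntimeError(f"error rate {len(failed) / seen:.0%} above threshold")
    return succeeded, failed, quarantined
```

Fail-fast suits jobs where partial progress is unacceptable; continue-on-error and quarantine keep the good records moving while collecting the bad ones for follow-up.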
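Composite-key and fuzzy deduplication can be sketched with the Python standard library alone. The composite key (normalised name plus date of birth) and the 0.9 similarity threshold on `difflib.SequenceMatcher.ratio()` are illustrative assumptions; this is not the matcher Argus uses, and `deduplicate` is a hypothetical name.

```python
from difflib import SequenceMatcher


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def deduplicate(records: list[dict], threshold: float = 0.9):
    """Hypothetical sketch: keep first occurrences, report suspected duplicates."""
    kept: list[dict] = []
    seen_keys: set[tuple[str, str]] = set()
    duplicates: list[tuple[dict, dict]] = []  # (incoming, existing) pairs
    for record in records:
        name = normalise(record["name"])
        key = (name, record["dob"])  # composite key, exact after normalisation
        if key in seen_keys:
            match = next(k for k in kept if (normalise(k["name"]), k["dob"]) == key)
            duplicates.append((record, match))
            continue
        # Fuzzy match: same date of birth and a near-identical name.
        fuzzy = next(
            (k for k in kept
             if k["dob"] == record["dob"]
             and SequenceMatcher(None, name, normalise(k["name"])).ratio() >= threshold),
            None,
        )
        if fuzzy is not None:
            duplicates.append((record, fuzzy))
            continue
        seen_keys.add(key)
        kept.append(record)
    return kept, duplicates
```

The fuzzy pass compares each incoming record against every kept record, which is quadratic; at multi-million-record scale a blocking step (for example, only comparing records that share a date of birth via an index) would be needed.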
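Configurable cascade rules can be sketched as a plan-then-apply delete over related collections. The rule names `cascade`, `set_null`, and `restrict` borrow SQL foreign-key terminology and are assumptions here, as are the collection names in any example data. The hypothetical `delete_with_cascade` plans every dependent change first, so a `restrict` violation aborts the delete before anything is modified.

```python
from typing import Literal

Rule = Literal["cascade", "set_null", "restrict"]


class RestrictViolation(Exception):
    """A dependent record blocks the delete under a 'restrict' rule."""


def delete_with_cascade(
    collections: dict[str, dict[str, dict]],
    relations: list[tuple[str, str, str, Rule]],
    collection: str,
    record_id: str,
) -> list[tuple[str, str]]:
    """Hypothetical sketch; relations are (child, foreign_key, parent, rule).

    Returns the (collection, id) pairs that were deleted.
    """
    to_delete: list[tuple[str, str]] = []
    to_nullify: list[tuple[str, str, str]] = []
    stack = [(collection, record_id)]
    # Planning phase: walk dependants without modifying anything.
    while stack:
        coll, rid = stack.pop()
        if (coll, rid) in to_delete:
            continue
        to_delete.append((coll, rid))
        for child_coll, fk, parent_coll, rule in relations:
            if parent_coll != coll:
                continue
            for child_id, child in collections[child_coll].items():
                if child.get(fk) != rid:
                    continue
                if rule == "restrict":
                    raise RestrictViolation(
                        f"{child_coll}/{child_id} still references {coll}/{rid}")
                if rule == "cascade":
                    stack.append((child_coll, child_id))
                else:  # set_null: keep the child but clear its reference
                    to_nullify.append((child_coll, child_id, fk))
    # Apply phase: planning succeeded, so perform every change.
    for coll, rid, fk in to_nullify:
        if (coll, rid) not in to_delete:
            collections[coll][rid][fk] = None
    for coll, rid in to_delete:
        del collections[coll][rid]
    return to_delete
```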
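The conflict-resolution strategies can be sketched for two concurrent edits of the same record. This sketch assumes each write carries a timestamp and that the common base version is known, which a field-level (three-way) merge requires; `Version` and `resolve` are hypothetical names. Fields that both writers changed to different values are returned as conflicts, which is where a manual review queue would take over.

```python
from dataclasses import dataclass
from datetime import datetime

_MISSING = object()  # marks a field absent from a version


@dataclass
class Version:
    fields: dict
    written_at: datetime


def resolve(base: dict, first: Version, second: Version, strategy: str):
    """Hypothetical sketch returning (resolved_fields, conflicting_field_names)."""
    if strategy == "last-write-wins":
        return dict(max(first, second, key=lambda v: v.written_at).fields), []
    if strategy == "first-write-wins":
        return dict(min(first, second, key=lambda v: v.written_at).fields), []
    if strategy != "field-merge":
        raise ValueError(f"unknown strategy: {strategy}")
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(first.fields) | set(second.fields)):
        original = base.get(name, _MISSING)
        a = first.fields.get(name, _MISSING)
        b = second.fields.get(name, _MISSING)
        if a == b or b == original:
            value = a         # both agree, or only `first` changed it
        elif a == original:
            value = b         # only `second` changed it
        else:
            value = original  # both changed it differently: needs human review
            conflicts.append(name)
        if value is not _MISSING:
            merged[name] = value
    return merged, conflicts
```

With field-level merging, one analyst closing a case and another reassigning it both succeed; only a field that both edited differently is held back at its base value for review.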

Use Cases

Large-Scale Data Migration: Import millions of records from legacy systems or external sources with validation, deduplication, and relationship mapping. Data arrives clean from day one rather than requiring a cleanup pass after the fact.
Bulk Data Cleanup: Identify and remove duplicate entries, correct formatting inconsistencies, or update fields across large datasets in a single operation with full rollback capability if the results are not as expected.
Regulatory Compliance Operations: Execute bulk updates or deletions required by data retention policies, right-to-erasure requests, or regulatory changes. Complete audit trails document every action taken for compliance reporting.
Scheduled Maintenance Tasks: Automate recurring bulk operations such as recalculating derived fields, archiving aged records, or synchronising data across systems during low-traffic windows.
Cross-System Data Synchronisation: Synchronise records across multiple enterprise systems with pre-built connectors, real-time progress visibility, and automated error recovery to keep datasets consistent.

Integration

The Data Bulk Operations module connects to major databases and enterprise systems through pre-built connectors and provides a full API for programmatic control over batch operations. Progress updates are pushed to clients through real-time subscriptions. All writes target the platform record store, with organisation scoping enforced at the operation level.
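A progress update of the kind described above might carry a payload like the one built below. The field names and `progress_event` are hypothetical rather than the Argus subscription schema, and the estimated completion time assumes throughput stays constant for the rest of the run. Timestamps are serialised as ISO 8601 strings via `datetime.isoformat`.

```python
import json
from datetime import datetime, timedelta, timezone


def progress_event(operation_id: str, processed: int, total: int,
                   started_at: datetime, now: datetime) -> str:
    """Hypothetical sketch of a progress payload pushed to subscribers."""
    elapsed = (now - started_at).total_seconds()
    rate = processed / elapsed if elapsed > 0 else 0.0  # records per second so far
    remaining = total - processed
    # Linear estimate: remaining records at the average rate observed so far.
    eta = now + timedelta(seconds=remaining / rate) if rate > 0 else None
    payload = {
        "operation_id": operation_id,
        "processed": processed,
        "total": total,
        "percent": round(100 * processed / total, 1) if total else 100.0,
        "records_per_second": round(rate, 1),
        "estimated_completion": eta.isoformat() if eta else None,
        "emitted_at": now.isoformat(),
    }
    return json.dumps(payload)
```

For example, a job that has processed 250,000 of 1,000,000 records in 100 seconds reports 25% complete at 2,500 records per second, with an estimated completion 300 seconds after the event.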

Open Standards

OAuth 2.0 and JWT Bearer Token: Token-based authentication protects typed, auditable read and write workflows across the platform.
RFC 6455 (WebSocket Protocol): Real-time progress updates for long-running bulk jobs are pushed to connected clients over WebSocket connections. When a connection fails authentication or origin validation, the server closes it with the appropriate RFC 6455 close code, such as 1008 (policy violation).
W3C PROV-DM (Provenance Data Model): Each bulk operation emits provenance records conforming to the W3C PROV-DM recommendation, capturing prov:Entity, prov:Activity, and prov:Agent relationships so that the full before-and-after lineage of every batch write is queryable.
OpenLineage Specification: The ingestion pipeline that feeds bulk import operations emits structured OpenLineage RunEvents to the platform record store, recording dataset-level input/output lineage for every normalisation job.
JSON Schema Draft 2020-12: Multi-stage pre-write validation and schema evolution checks apply JSON Schema Draft 2020-12 structural rules to incoming records, catching format violations before any data reaches the database.
ISO 8601 (Date and Time): All operation timestamps (created_at, updated_at, completed_at) and the timestamps in event payloads broadcast over WebSocket are serialised as ISO 8601 strings, ensuring interoperability with downstream consumers and audit systems.
OAuth 2.0 (RFC 6749): Every bulk operation workflow handler enforces bearer-token authentication; the API contract requires OAuth 2.0 scopes (argus:jobs:read and argus:jobs:write) to enqueue, monitor, or cancel batch jobs.
OASIS STIX 2.1: Data ingested in bulk from threat intelligence sources passes through STIX 2.1 connector bases that validate objects against the OASIS STIX 2.1 JSON Schema before any Argus type mapping is applied.
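To make the PROV-DM relationships concrete, the sketch below builds a provenance document for one record rewritten by a bulk job, loosely following the PROV-JSON layout (a W3C Member Submission). The `argus:` prefix, its placeholder namespace URL, the versioned identifier scheme, and `provenance_for_update` are assumptions for illustration, not the format Argus emits.

```python
import json
from datetime import datetime, timezone


def provenance_for_update(op_id: str, user: str, record_id: str, version: int,
                          started: datetime, ended: datetime) -> dict:
    """Hypothetical PROV-JSON-style document for one record updated by a bulk job."""
    before = f"argus:{record_id}/v{version}"     # entity: state before the write
    after = f"argus:{record_id}/v{version + 1}"  # entity: state after the write
    activity = f"argus:bulk-op/{op_id}"          # activity: the bulk operation
    agent = f"argus:user/{user}"                 # agent: who ran it
    return {
        "prefix": {"argus": "https://example.org/argus#"},  # placeholder namespace
        "entity": {before: {}, after: {}},
        "activity": {activity: {"prov:startTime": started.isoformat(),
                                "prov:endTime": ended.isoformat()}},
        "agent": {agent: {}},
        "used": {"_:u1": {"prov:activity": activity, "prov:entity": before}},
        "wasGeneratedBy": {"_:g1": {"prov:entity": after, "prov:activity": activity}},
        "wasDerivedFrom": {"_:d1": {"prov:generatedEntity": after,
                                    "prov:usedEntity": before}},
        "wasAssociatedWith": {"_:a1": {"prov:activity": activity, "prov:agent": agent}},
    }


if __name__ == "__main__":
    document = provenance_for_update(
        "42", "analyst1", "case-7", 3,
        datetime(2026, 4, 14, 9, 0, tzinfo=timezone.utc),
        datetime(2026, 4, 14, 9, 5, tzinfo=timezone.utc),
    )
    print(json.dumps(document, indent=2))
```

Linking the after-state entity to the before-state entity through `wasDerivedFrom`, and both to the bulk activity and its agent, is what makes the before-and-after lineage of a batch write queryable.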
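The scope check on bulk job endpoints can be sketched as follows, assuming the bearer token has already been verified and decoded into its claims. Only the two scope names come from the description above; which scope each action requires is an assumption, as are `authorise`, `REQUIRED_SCOPE`, and `InsufficientScope`.

```python
REQUIRED_SCOPE = {
    "enqueue": "argus:jobs:write",  # assumed mapping
    "cancel": "argus:jobs:write",   # assumed mapping
    "monitor": "argus:jobs:read",   # assumed mapping
}


class InsufficientScope(Exception):
    """The token does not grant the scope required for the action."""


def authorise(claims: dict, action: str) -> None:
    """Raise unless the already-verified token claims grant the action's scope."""
    # OAuth 2.0 scope values are space-delimited (RFC 6749, section 3.3).
    granted = set(claims.get("scope", "").split())
    needed = REQUIRED_SCOPE[action]
    if needed not in granted:
        raise InsufficientScope(f"{action!r} requires scope {needed!r}")
```

Signature and expiry verification of the JWT are deliberately left out; the check only runs on claims that have already passed those steps.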

Last Reviewed: 2026-02-23
Last Updated: 2026-04-14