Overview
The same individual appears in three separate databases under slightly different names, with two different date-of-birth formats and one misspelled address. Without entity resolution, an investigator searching for him gets three partial records. The Argus Entity Resolution System closes that gap: it matches, deduplicates, and merges records across sources using phonetic, fuzzy, and attribute-level similarity algorithms and machine learning, then routes ambiguous cases for analyst review before committing any merge.
Entity resolution is critical for maintaining data quality in investigative platforms where the same subject may appear in multiple databases, documents, or external data feeds with slight variations in naming, spelling, or attributes. Without effective resolution, investigators work with fragmented information that obscures the complete picture of a subject's activities and connections.
The system operates both at ingestion time, preventing new duplicates from entering the platform, and through continuous background processing that identifies and resolves duplicates across the existing data corpus as matching algorithms improve and new data sources are connected.
Key Features
Matching Algorithms
- Multi-attribute matching algorithms combining phonetic name matching (Soundex, Metaphone), fuzzy string matching, address normalisation, and date variation handling
- Configurable matching rules allowing agencies to tune precision and recall for their data characteristics
- Machine learning models that improve matching accuracy over time based on analyst feedback
- Support for multi-language name matching including transliteration and character set normalisation
- Alias and nickname resolution connecting alternate identities to primary entity records
- Phonetic matching algorithms handling misspellings and transliteration variations across languages
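Phonetic and fuzzy name matching can be illustrated with two standard techniques: American Soundex for phonetic codes and normalised Levenshtein distance for fuzzy similarity. This is a minimal sketch, not the Argus matching engine; the function names and the 0.8 similarity cut-off are hypothetical, and Metaphone is not shown.

```python
# Minimal sketch: American Soundex plus normalised Levenshtein similarity.
# Function names and the 0.8 cut-off are illustrative assumptions.

_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        digit = _CODES.get(c, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # vowels reset prev, so a repeated code after a vowel is kept
    return code.ljust(4, "0")


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) using two DP rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, falling towards 0.0 as edits accumulate."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def names_may_match(a: str, b: str, min_similarity: float = 0.8) -> bool:
    """Candidate match if the names sound alike or are only a few edits apart."""
    return soundex(a) == soundex(b) or fuzzy_similarity(a, b) >= min_similarity
```

For example, `soundex("Smith")` and `soundex("Smyth")` both return `"S530"`, while `fuzzy_similarity("Jon", "John")` is 0.75. Combining the two signals lets a phonetic hit catch spelling variants that edit distance scores poorly, and vice versa.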
Confidence Scoring and Review
- Confidence scoring from 0 to 100 percent with configurable thresholds for automatic merge, manual review, or flagging
- Match status workflow covering pending, confirmed, and rejected states with batch processing support
- Analyst review interface presenting side-by-side record comparison with highlighted differences
- Bulk review tools enabling efficient processing of large match queues with keyboard shortcuts
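Threshold-based routing of confidence scores might look like the sketch below. The three default cut-offs (95, 75, 50), the `Route` and `Thresholds` names, and the "at or above the cut-off" boundary rule are assumptions for illustration; in Argus the thresholds are configurable per agency.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_MERGE = "auto_merge"
    MANUAL_REVIEW = "manual_review"
    FLAG = "flag"
    NO_MATCH = "no_match"


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical per-agency cut-offs on the 0-100 confidence scale."""
    auto_merge: float = 95.0
    review: float = 75.0
    flag: float = 50.0

    def __post_init__(self) -> None:
        if not 0 <= self.flag <= self.review <= self.auto_merge <= 100:
            raise ValueError("need 0 <= flag <= review <= auto_merge <= 100")


def route(score: float, t: Thresholds = Thresholds()) -> Route:
    """Map a 0-100 confidence score to a handling route (at-or-above semantics)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= t.auto_merge:
        return Route.AUTO_MERGE
    if score >= t.review:
        return Route.MANUAL_REVIEW
    if score >= t.flag:
        return Route.FLAG
    return Route.NO_MATCH
```

Under this assumed scheme, pairs routed to review or flagging would enter the workflow in the pending state until an analyst confirms or rejects them; tightening `review` towards `auto_merge` shrinks the review queue at the cost of sending more pairs to the weaker flag band.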
Merge and Record Management
- Configurable merge strategies including keep newer, keep older, manual selection, and combine all non-null values
- Merge history tracking with split and undo capability and canonical record designation
- Duplicate prevention at ingestion with continuous background deduplication
- Cross-reference maintenance preserving links between merged records and their original sources
- Conflict resolution workflows for cases where merged records have contradictory attribute values
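The four merge strategies can be sketched as one function over timestamped records. `Record`, `merge_records`, and the strategy names are hypothetical; "keep newer" is assumed here to mean the newest non-null value wins, "combine" keeps every distinct non-null value, and disagreeing values are returned separately so they could feed a conflict-resolution workflow.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass
class Record:
    record_id: str
    updated_at: datetime
    attributes: dict[str, Any] = field(default_factory=dict)


STRATEGIES = {"keep_newer", "keep_older", "manual", "combine"}


def merge_records(records: list[Record], strategy: str,
                  manual_choice: Optional[dict[str, str]] = None
                  ) -> tuple[dict[str, Any], dict[str, list[Any]]]:
    """Merge source records into one canonical attribute set.

    Returns (merged, conflicts); conflicts maps each attribute whose
    non-null values disagree to those values, oldest first."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    if not records:
        raise ValueError("nothing to merge")
    ordered = sorted(records, key=lambda r: r.updated_at)  # oldest first
    by_id = {r.record_id: r for r in records}
    choice = manual_choice or {}
    merged: dict[str, Any] = {}
    conflicts: dict[str, list[Any]] = {}
    for key in sorted({k for r in records for k in r.attributes}):
        non_null = [r.attributes[key] for r in ordered
                    if r.attributes.get(key) is not None]
        distinct = list(dict.fromkeys(non_null))  # dedupe, keep age order (hashable values)
        if len(distinct) > 1:
            conflicts[key] = distinct
        if strategy == "keep_newer":
            merged[key] = non_null[-1] if non_null else None
        elif strategy == "keep_older":
            merged[key] = non_null[0] if non_null else None
        elif strategy == "combine":
            merged[key] = distinct  # every distinct non-null value is retained
        else:  # "manual": an analyst picked the source record per attribute
            if key not in choice:
                raise ValueError(f"no analyst selection for {key!r}")
            merged[key] = by_id[choice[key]].attributes.get(key)
    return merged, conflicts
```

The input records are only read, never modified; keeping them alongside the merged result is one simple way to leave split and undo possible.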
Data Quality and Monitoring
- Quality metrics dashboard with data lineage tracking and source system attribution
- Support for person, organisation, location, asset, and event entity types with type-specific matching algorithms
- Resolution performance metrics tracking match rates, review throughput, and accuracy over time
- Source quality scoring identifying data feeds that consistently produce duplicates or low-quality records
- Automated reporting on deduplication progress and remaining duplicate estimates
- Data steward tools for managing resolution rules, reviewing exceptions, and adjusting matching parameters
- Cross-system resolution tracking showing how entities are linked across all connected data sources
- Batch processing capabilities for large-scale entity resolution across imported datasets
- Resolution confidence decay tracking how match quality changes as entity data ages
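Treating analysts' confirmed and rejected decisions as ground truth for the reviewed pairs, precision and recall at a candidate threshold can be computed as below. This is a generic evaluation sketch, not the Argus metrics pipeline; the input shape and function names are assumptions. Recall is only relative to the reviewed sample, since true matches that never reached review are unknown.

```python
from typing import Iterable


def precision_recall(reviewed: Iterable[tuple[float, bool]],
                     threshold: float) -> tuple[float, float]:
    """Precision and recall of 'score >= threshold' against analyst decisions.

    Each item is (confidence_score, analyst_confirmed). A ratio whose
    denominator is zero is reported as 0.0."""
    tp = fp = fn = 0
    for score, confirmed in reviewed:
        predicted = score >= threshold
        if predicted and confirmed:
            tp += 1
        elif predicted:
            fp += 1
        elif confirmed:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def sweep(reviewed: Iterable[tuple[float, bool]],
          thresholds: Iterable[float]) -> dict[float, tuple[float, float]]:
    """Precision/recall at several thresholds, for tuning the cut-offs."""
    data = list(reviewed)  # materialise so a one-shot iterator can be reused
    return {t: precision_recall(data, t) for t in thresholds}
```

Sweeping thresholds over the same reviewed sample makes the precision/recall trade-off visible: raising the cut-off generally increases precision and lowers recall.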
Use Cases
Cross-Database Deduplication. Consolidate records from disparate systems by identifying matching records, reviewing and confirming matches, merging into canonical records, and maintaining source linkage. Ensure investigators have complete subject profiles rather than fragmented information scattered across multiple databases.
Real-Time Duplicate Prevention. Check for existing matches before record creation, present potential duplicates to users, allow linking to existing records, and learn from user corrections over time. Prevent duplicate proliferation at the point of data entry across all ingestion channels.
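A pre-insert duplicate check usually avoids comparing a new record against the whole corpus by first narrowing candidates with a blocking key. The sketch below assumes a blocking key of the first three surname letters plus birth year and scores candidates with `difflib.SequenceMatcher`; the `DuplicateGuard` class, the key design, and the 0.85 threshold are hypothetical, and learning from user corrections is not shown.

```python
from collections import defaultdict
from difflib import SequenceMatcher


class DuplicateGuard:
    """Hypothetical pre-insert check: block on a cheap key, then score candidates."""

    def __init__(self, threshold: float = 0.85) -> None:
        self.threshold = threshold
        # blocking key -> list of (record_id, normalised full name)
        self._blocks: dict[tuple[str, int], list[tuple[str, str]]] = defaultdict(list)

    @staticmethod
    def blocking_key(surname: str, birth_year: int) -> tuple[str, int]:
        return surname.strip().lower()[:3], birth_year

    @staticmethod
    def _full_name(given: str, surname: str) -> str:
        return f"{given} {surname}".strip().lower()

    def candidates(self, given: str, surname: str,
                   birth_year: int) -> list[tuple[str, float]]:
        """Existing records similar enough to present to the user, best first."""
        name = self._full_name(given, surname)
        block = self._blocks.get(self.blocking_key(surname, birth_year), [])
        hits = []
        for record_id, existing in block:
            score = SequenceMatcher(None, name, existing).ratio()
            if score >= self.threshold:
                hits.append((record_id, round(score, 3)))
        return sorted(hits, key=lambda h: h[1], reverse=True)

    def add(self, record_id: str, given: str, surname: str, birth_year: int) -> None:
        key = self.blocking_key(surname, birth_year)
        self._blocks[key].append((record_id, self._full_name(given, surname)))
```

A caller would run `candidates()` before creating a record and, if it returns anything, offer the user the option to link to an existing record instead of calling `add()`. Records that fall into different blocks (for example, a mistyped birth year) are never compared, which is the usual price of blocking.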
Data Migration. Clean data during system migration through bulk similarity analysis, configurable auto-merge thresholds, manual review for edge cases, and complete audit trail for compliance with rollback capability. Ensure data quality is improved rather than degraded during platform transitions.
Intelligence Analysis Support. Enhance analytical accuracy by resolving entities across intelligence sources, so that analysts work with unified entity profiles reflecting everything known about subjects, organisations, and assets of interest.
Data Quality Improvement. Continuously improve organisational data quality by identifying and resolving duplicate records, standardising entity information, and maintaining consistent identity records across all connected systems. Generate data quality metrics and trend reports for organisational awareness.
Integration
- Integrates with profile management and entity profile systems for unified record maintenance
- Connects with data import and export workflows for ingestion-time resolution
- Links to audit trail and compliance logging for complete activity tracking
- Works alongside data quality and validation rule engines
- Compatible with investigation and case management systems for seamless entity access
- Supports batch processing interfaces for bulk data cleanup and migration projects
- Feeds resolution metrics into data quality dashboards for organisational oversight
- Supports real-time resolution during data entry with immediate duplicate detection and merge suggestions
- Automated data standardisation normalising addresses, names, and identifiers before matching
- Entity relationship preservation maintaining connections during merge and split operations
- Connects with external identity verification services for enrichment and validation
- Supports API-based resolution for real-time matching from external applications
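Standardisation before matching could look like the following: names are Unicode-normalised (NFKD), stripped of diacritics and punctuation, case-folded, and reduced to space-separated tokens, and dates of birth are parsed from a list of accepted formats into ISO 8601. The format list and function names are assumptions; in particular, day-first `dd/mm/yyyy` is assumed, so US month-first dates would need a separate, explicitly configured format. Transliteration between scripts (for example Cyrillic to Latin) is not covered by this sketch.

```python
import re
import unicodedata
from datetime import datetime
from typing import Optional

# Accepted date-of-birth layouts, tried in order (assumed: day-first, not US month-first).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%d %B %Y", "%d %b %Y", "%Y%m%d")


def normalise_name(raw: str) -> str:
    """Strip diacritics and punctuation, case-fold, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", raw)  # 'é' -> 'e' + combining accent
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = re.sub(r"[^\w\s]", " ", base.casefold())  # punctuation becomes a separator
    return " ".join(tokens.split())


def normalise_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date string, or None if no accepted format matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

With this, `"14/03/1980"`, `"14 March 1980"`, and `"19800314"` all normalise to `"1980-03-14"`, so the two date-of-birth formats in the overview example would compare equal; an unparseable value returns `None` rather than guessing.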
Open Standards
- Privacy-Preserving Record Linkage (PPRL), Schnell et al. 2009 (DOI:10.1186/1472-6947-9-41): The matching engine implements Bloom-filter Cryptographic Long-term Key (CLK) encoding with bigram decomposition and Dice-coefficient scoring, as specified in this open-access method, contributing 30% of the hybrid similarity score.
- W3C PROV-DM (Provenance Data Model, W3C Recommendation 2013): Every entity merge operation records provenance using the core prov:Entity, prov:Activity, and prov:Agent concepts with relationships including wasGeneratedBy, wasDerivedFrom, and wasAssociatedWith, stored in PostgreSQL and serialisable to PROV-JSON.
- W3C JSON-LD (JSON-based Linked Data, W3C Recommendation 2014): The provenance serialiser outputs PROV-O-structured JSON-LD using the canonical https://www.w3.org/ns/prov context, enabling partner verifiers to process records with any stock JSON-LD processor.
- GraphQL (June 2018 Specification): All entity resolution operations, including duplicate detection, similarity calculation, merge, split, and history queries, are exposed as a typed GraphQL API built on the strawberry framework.
- JSON (ECMA-404 / RFC 8259): Entity attributes and all API payloads are encoded as JSON; the underlying datastore uses PostgreSQL JSONB, ensuring interoperability with any standard JSON processor.
- OAuth 2.0 (RFC 6749): Every GraphQL resolver enforces bearer-token authentication via the platform's OAuth 2.0 access-control layer, and role-based permission checks gate read and write operations throughout the resolution pipeline.
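The Bloom-filter encoding cited under PPRL can be sketched as follows, in the general style of Schnell et al.: each field is split into space-padded bigrams, each bigram sets k bit positions in one shared filter via double hashing with two keyed HMACs, and encoded records are compared with the Dice coefficient. The filter length (1024), k = 20, the HMAC-SHA1/HMAC-MD5 pairing, and the keys are assumptions for illustration, not the Argus parameters; the sketch does not salt fields separately and does not show how the Dice score is weighted into the hybrid score.

```python
import hashlib
import hmac


def bigrams(value: str) -> list[str]:
    """Space-padded character bigrams, e.g. 'jo' -> [' j', 'jo', 'o ']."""
    padded = f" {value.strip().lower()} "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def clk_encode(fields: list[str], key1: bytes, key2: bytes,
               length: int = 1024, k: int = 20) -> int:
    """Hash every bigram of every field into one Bloom filter (held in a Python int)."""
    bits = 0
    for value in fields:
        for gram in bigrams(value):
            data = gram.encode("utf-8")
            f = int.from_bytes(hmac.new(key1, data, hashlib.sha1).digest(), "big")
            g = int.from_bytes(hmac.new(key2, data, hashlib.md5).digest(), "big")
            for i in range(k):
                # double hashing: position_i = (f + i * g) mod length
                bits |= 1 << ((f + i * g) % length)
    return bits


def popcount(x: int) -> int:
    return bin(x).count("1")


def dice(a: int, b: int) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over the set bits of two filters."""
    total = popcount(a) + popcount(b)
    return 2 * popcount(a & b) / total if total else 1.0
```

Both parties would encode with the same secret keys and exchange only the filters; `["Jon", "Smith", "1980-03-14"]` and `["John", "Smith", "1980-03-14"]` then share most set bits and score far higher than an unrelated record, without either side revealing the plaintext names.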
Last Reviewed: 2026-02-05 Last Updated: 2026-04-14