Entity Resolution System

Overview

The same individual appears in three separate databases under slightly different names, two different date-of-birth formats, and one misspelled address. Without entity resolution, an investigator searching for that person gets three partial records. The Argus Entity Resolution System closes that gap: it matches, deduplicates, and merges records across sources using phonetic and fuzzy similarity algorithms together with machine learning, then routes ambiguous cases for analyst review before committing any merge.

Entity resolution is critical for maintaining data quality in investigative platforms where the same subject may appear in multiple databases, documents, or external data feeds with slight variations in naming, spelling, or attributes. Without effective resolution, investigators work with fragmented information that obscures the complete picture of a subject's activities and connections.

The system operates both at ingestion time, preventing new duplicates from entering the platform, and through continuous background processing that identifies and resolves duplicates across the existing data corpus as matching algorithms improve and new data sources are connected.

Key Features

Matching Algorithms

Multi-attribute matching algorithms covering phonetic name matching (Soundex, Metaphone), fuzzy string matching, address normalisation, and date variation handling
Configurable matching rules allowing agencies to tune precision and recall for their data characteristics
Machine learning models that improve matching accuracy over time based on analyst feedback
Support for multi-language name matching including transliteration and character set normalisation
Alias and nickname resolution connecting alternate identities to primary entity records
Phonetic matching algorithms handling misspellings and transliteration variations across languages
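
As a rough illustration of how phonetic and fuzzy matching can be combined, the sketch below implements classic American Soundex and blends it with a character-level similarity ratio from Python's standard library. The function names and the 0.4 phonetic weight are assumptions made for illustration; they are not the platform's actual matching rules.

```python
from difflib import SequenceMatcher

# Standard American Soundex digit groups.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code, or '' if the name has no A-Z letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = digit
    return (code + "000")[:4]


def fuzzy_ratio(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def name_similarity(a: str, b: str, phonetic_weight: float = 0.4) -> float:
    """Blend an exact Soundex match with a fuzzy ratio (weight is hypothetical)."""
    code_a, code_b = soundex(a), soundex(b)
    phonetic = 1.0 if code_a and code_a == code_b else 0.0
    return phonetic_weight * phonetic + (1 - phonetic_weight) * fuzzy_ratio(a, b)
```

With this sketch, `soundex("Robert")` and `soundex("Rupert")` both return `R163`, so the two spellings score a full phonetic match even though their characters differ. Metaphone, transliteration, and address handling are left out to keep the example short.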

Confidence Scoring and Review

Confidence scoring from 0 to 100 percent with configurable thresholds for automatic merge, manual review, or flagging
Match status workflow covering pending, confirmed, and rejected states with batch processing support
Analyst review interface presenting side-by-side record comparison with highlighted differences
Bulk review tools enabling efficient processing of large match queues with keyboard shortcuts
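
The threshold routing described above might look like the following minimal sketch. The threshold values, the `Thresholds` and `Action` names, and the extra no-match outcome are illustrative assumptions; agencies would tune the actual cut-offs to their own data.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    AUTO_MERGE = "auto_merge"
    MANUAL_REVIEW = "manual_review"
    FLAG = "flag"
    NO_MATCH = "no_match"  # assumed outcome for scores below every threshold


@dataclass(frozen=True)
class Thresholds:
    """Confidence cut-offs in percent (example values, not platform defaults)."""
    auto_merge: float = 95.0
    manual_review: float = 75.0
    flag: float = 50.0

    def __post_init__(self) -> None:
        if not 0 <= self.flag <= self.manual_review <= self.auto_merge <= 100:
            raise ValueError("thresholds must satisfy 0 <= flag <= review <= auto <= 100")


def route(confidence: float, thresholds: Thresholds = Thresholds()) -> Action:
    """Map a 0-100 confidence score to the action for a candidate match."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence >= thresholds.auto_merge:
        return Action.AUTO_MERGE
    if confidence >= thresholds.manual_review:
        return Action.MANUAL_REVIEW
    if confidence >= thresholds.flag:
        return Action.FLAG
    return Action.NO_MATCH
```

Raising `auto_merge` trades fewer wrong automatic merges for a longer analyst review queue; lowering it does the reverse. That is the precision and recall trade-off the configurable thresholds expose.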

Merge and Record Management

Configurable merge strategies including keep newer, keep older, manual selection, and combine all non-null values
Merge history tracking with split and undo capability and canonical record designation
Duplicate prevention at ingestion with continuous background deduplication
Cross-reference maintenance preserving links between merged records and their original sources
Conflict resolution workflows for cases where merged records have contradictory attribute values
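
To make the merge strategies concrete, here is a small sketch of three of them: keep newer, keep older, and combine all non-null values. The record layout (a plain dictionary with an `updated_at` timestamp), the strategy names, and the choice to prefer the newer value while reporting the conflict are assumptions for illustration. Manual selection is omitted because it depends on analyst input.

```python
from datetime import datetime
from typing import Any

Record = dict[str, Any]


def merge_records(a: Record, b: Record,
                  strategy: str = "combine_non_null") -> tuple[Record, list[str]]:
    """Merge two records; return the merged record and the conflicting field names."""
    older, newer = sorted((a, b), key=lambda r: r["updated_at"])
    if strategy == "keep_newer":
        return dict(newer), []
    if strategy == "keep_older":
        return dict(older), []
    if strategy == "combine_non_null":
        merged: Record = {}
        conflicts: list[str] = []
        for key in older.keys() | newer.keys():
            old_val, new_val = older.get(key), newer.get(key)
            both_set = old_val is not None and new_val is not None
            if key != "updated_at" and both_set and old_val != new_val:
                conflicts.append(key)  # contradictory values: report for review
            # Prefer the newer non-null value, falling back to the older one.
            merged[key] = new_val if new_val is not None else old_val
        return merged, sorted(conflicts)
    raise ValueError(f"unknown merge strategy: {strategy}")


if __name__ == "__main__":
    first = {"name": "J. Smith", "dob": None, "phone": "555-0100",
             "updated_at": datetime(2024, 1, 1)}
    second = {"name": "John Smith", "dob": "1980-02-03", "phone": None,
              "updated_at": datetime(2025, 1, 1)}
    merged, conflicts = merge_records(first, second)
    print(merged["name"], merged["dob"], merged["phone"], conflicts)
```

In this example the combined record takes the name and date of birth from the newer record and the phone number from the older one, and `name` is reported as a conflict that a conflict resolution workflow could present to an analyst.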

Data Quality and Monitoring

Quality metrics dashboard with data lineage tracking and source system attribution
Support for person, organisation, location, asset, and event entity types with type-specific matching algorithms
Resolution performance metrics tracking match rates, review throughput, and accuracy over time
Source quality scoring identifying data feeds that consistently produce duplicates or low-quality records
Automated reporting on deduplication progress and remaining duplicate estimates
Data steward tools for managing resolution rules, reviewing exceptions, and adjusting matching parameters
Cross-system resolution tracking showing how entities are linked across all connected data sources
Batch processing capabilities for large-scale entity resolution across imported datasets
Resolution confidence decay tracking how match quality changes as entity data ages

Use Cases

Cross-Database Deduplication. Consolidate records from disparate systems by identifying matching records, reviewing and confirming matches, merging into canonical records, and maintaining source linkage. Ensure investigators have complete subject profiles rather than fragmented information scattered across multiple databases.

Real-Time Duplicate Prevention. Check for existing matches before record creation, present potential duplicates to users, allow linking to existing records, and learn from user corrections over time. Prevent duplicate proliferation at the point of data entry across all ingestion channels.

Data Migration. Clean data during system migration through bulk similarity analysis, configurable auto-merge thresholds, manual review for edge cases, and complete audit trail for compliance with rollback capability. Ensure data quality is improved rather than degraded during platform transitions.

Intelligence Analysis Support. Enhance analytical accuracy by ensuring entity resolution across intelligence sources, enabling analysts to work with unified entity profiles that reflect the complete known information about subjects, organisations, and assets of interest.

Data Quality Improvement. Continuously improve organisational data quality by identifying and resolving duplicate records, standardising entity information, and maintaining consistent identity records across all connected systems. Generate data quality metrics and trend reports for organisational awareness.

Integration

Integrates with profile management and entity profile systems for unified record maintenance
Connects with data import and export workflows for ingestion-time resolution
Links to audit trail and compliance logging for complete activity tracking
Works alongside data quality and validation rule engines
Compatible with investigation and case management systems for seamless entity access
Supports batch processing interfaces for bulk data cleanup and migration projects
Feeds resolution metrics into data quality dashboards for organisational oversight
Supports real-time resolution during data entry with immediate duplicate detection and merge suggestions
Automated data standardisation normalising addresses, names, and identifiers before matching
Entity relationship preservation maintaining connections during merge and split operations
Connects with external identity verification services for enrichment and validation
Supports API-based resolution for real-time matching from external applications

Open Standards

Privacy-Preserving Record Linkage (PPRL), Schnell et al. 2009 (DOI:10.1186/1472-6947-9-41): The matching engine implements Bloom-filter Cryptographic Long-term Key (CLK) encoding with bigram decomposition and Dice-coefficient scoring, as specified in this open-access method, contributing 30% of the hybrid similarity score.
W3C PROV-DM (Provenance Data Model, W3C Recommendation 2013): Every entity merge operation records provenance using the core prov:Entity, prov:Activity, and prov:Agent concepts with relationships including wasGeneratedBy, wasDerivedFrom, and wasAssociatedWith. Provenance is stored in the platform record store and is serialisable to PROV-JSON.
W3C JSON-LD (JSON-based Linked Data, W3C Recommendation 2014): The provenance serialiser outputs PROV-O-structured JSON-LD using the canonical https://www.w3.org/ns/prov context, enabling partner verifiers to process records with any stock JSON-LD processor.
OAuth 2.0 and JWT Bearer Token: Token-based authentication protects typed, auditable read and write workflows across the platform.
JSON (ECMA-404 / RFC 8259): Entity attributes and all API payloads are encoded as JSON, and the platform record store persists them as JSONB, keeping the data interoperable with any standard JSON processor.
OAuth 2.0 (RFC 6749): Every authorised workflow handler enforces bearer-token authentication via the platform's OAuth 2.0 access-control layer, and role-based permission checks gate read and write operations throughout the resolution pipeline.
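
The Bloom-filter CLK encoding with bigram decomposition and Dice-coefficient scoring listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: the filter size, the number of hash functions, the padding character, and the use of double hashing over HMAC-SHA256 and HMAC-SHA512 are all example choices, and the function names are hypothetical.

```python
import hashlib
import hmac


def bigrams(value: str) -> set[str]:
    """Split a padded, lower-cased string into its set of character bigrams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def clk_encode(fields: list[str], key: bytes,
               size: int = 1000, num_hashes: int = 30) -> set[int]:
    """Encode several fields into one Bloom filter, returned as its set bit positions."""
    bits: set[int] = set()
    for value in fields:
        if not value:
            continue  # skip empty attributes
        for gram in bigrams(value):
            data = gram.encode("utf-8")
            h1 = int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big")
            h2 = int.from_bytes(hmac.new(key, data, hashlib.sha512).digest(), "big")
            # Double hashing: derive num_hashes positions from two keyed digests.
            for i in range(num_hashes):
                bits.add((h1 + i * h2) % size)
    return bits


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient of two bit-position sets: 2|A and B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    secret = b"shared-linkage-key"  # hypothetical key agreed between data holders
    left = clk_encode(["jonathan", "smith"], secret)
    right = clk_encode(["jonathon", "smith"], secret)
    other = clk_encode(["maria", "garcia"], secret)
    print(round(dice(left, right), 2), round(dice(left, other), 2))
```

Because both parties encode with the same secret key, similar names set mostly the same bits and score a high Dice coefficient, while the plaintext values are never exchanged. Unrelated records still share some bits by chance, so the score is compared against a threshold rather than against zero.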

Last Reviewed: 2026-02-05 Last Updated: 2026-04-14
