Entity Resolution Domain

Overview

An organised crime investigation pulls records from three separate source systems: a legacy police database, a customs import, and a financial intelligence unit feed. The name "Mohammed Al-Rashidi" appears in all three, but with variations in spelling, different date-of-birth formats, and partially overlapping address histories. Without resolution, analysts work with three disconnected profiles and miss the full picture. The Entity Resolution domain identifies that these records describe the same individual and merges them into a single canonical profile, preserving the source links for audit purposes.

Duplicate entity records are an unavoidable reality in multi-source investigations. Different systems spell names differently, record partial data, or use local formatting conventions. Entity Resolution applies configurable matching strategies to detect these overlaps, scores their similarity, and either merges them automatically when confidence is high or routes them for human review when it is not.
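The routing step described above can be sketched as a simple threshold check. This is a minimal illustration, not the platform's implementation: the threshold values, the `MatchCandidate` type, the `route` function, and the third "no_match" outcome for very low scores are all assumptions introduced here.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values would be tuned per deployment.
AUTO_MERGE_THRESHOLD = 0.90
REVIEW_THRESHOLD = 0.60


@dataclass(frozen=True)
class MatchCandidate:
    """A scored pair of records that may describe the same entity."""
    left_id: str
    right_id: str
    confidence: float  # similarity score in [0.0, 1.0]


def route(candidate: MatchCandidate) -> str:
    """Return 'auto_merge', 'review', or 'no_match' for a scored pair."""
    if not 0.0 <= candidate.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if candidate.confidence >= AUTO_MERGE_THRESHOLD:
        return "auto_merge"   # high confidence: merge without intervention
    if candidate.confidence >= REVIEW_THRESHOLD:
        return "review"       # borderline: queue for a human reviewer
    return "no_match"         # assumed outcome: treat as distinct entities
```

Keeping the thresholds as named constants makes the trade-off explicit: raising the auto-merge threshold sends more pairs to reviewers, while lowering it increases the risk of false merges.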

Key Features

Automated duplicate entity detection with configurable matching strategies
Confidence scoring for match quality assessment
Merge workflows with conflict resolution for duplicate consolidation
Support for person, organisation, location, and other entity type resolution
Cross-data-source duplicate detection
Match review and approval workflows for human-in-the-loop validation
Entity relationship preservation during merges
Audit trail for all resolution decisions

Use Cases

Detecting duplicate person records across multiple imported data sources, such as police databases, customs records, and open-source feeds
Merging confirmed duplicate entities with automated conflict resolution, keeping the most recent or highest-confidence field values
Reviewing borderline entity match candidates through a human-in-the-loop workflow, confirming true duplicates and rejecting false positives
Maintaining data quality through ongoing resolution processes as new records arrive from external systems
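The conflict-resolution rule mentioned above (keep the most recent or the highest-confidence field value) might look like the following sketch. The `FieldValue` type, the `merge_records` function, and the strategy names are hypothetical and chosen only for illustration; each surviving value keeps its `source` so the link back to the originating system is not lost.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FieldValue:
    """One attribute value together with where and when it was recorded."""
    value: str
    source: str        # originating system, kept for audit purposes
    updated: date      # when the source last updated this value
    confidence: float  # source-level confidence in [0.0, 1.0]


def merge_records(records, strategy="most_recent"):
    """Merge field dictionaries, resolving conflicts per field.

    records  -- list of dicts mapping field name to FieldValue
    strategy -- 'most_recent' or 'highest_confidence' (assumed names)
    """
    if strategy == "most_recent":
        rank = lambda fv: fv.updated
    elif strategy == "highest_confidence":
        rank = lambda fv: fv.confidence
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    merged = {}
    for record in records:
        for name, candidate in record.items():
            current = merged.get(name)
            # Keep the first value seen unless a later one ranks strictly higher.
            if current is None or rank(candidate) > rank(current):
                merged[name] = candidate
    return merged
```

For example, merging a police record and a customs record that disagree on an address keeps the customs address under `most_recent` if it was updated later, and the police address under `highest_confidence` if that source is rated higher.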

Industry Context

National criminal intelligence databases regularly ingest records from regional forces with inconsistent naming and identifier conventions. Financial crime units merge customer records across banking systems to build complete beneficial ownership pictures. Border control agencies reconcile passenger manifests against watchlists where name transliteration produces multiple variants. Immigration services deduplicate application records across processing offices to detect identity reuse.

Integration

The Entity Resolution domain integrates with the person profiles, organisation management, and graph relationship domains. Resolution decisions are recorded in the platform record store with a full audit trail. After each completed merge, the graph layer is updated to reflect the merged canonical identity.
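As an illustration of what an audit record for one merge could contain, the sketch below builds a provenance document in the style of PROV-JSON, the serialisation listed under Open Standards. The `merge_provenance` function, the `ex:` prefix, and all identifiers are hypothetical, and the key names follow a reading of the PROV-JSON format that should be checked against the specification before use.

```python
import json


def merge_provenance(canonical_id, source_ids, activity_id, agent_id, ended_at):
    """Build a PROV-JSON-style dict describing one merge.

    The canonical entity is generated by the merge activity, derived from
    each source record, and the activity is associated with the agent
    (a user or an automated matcher) that approved it.
    """
    return {
        "prefix": {"ex": "https://example.org/er/"},  # placeholder namespace
        "entity": {eid: {} for eid in [canonical_id, *source_ids]},
        "activity": {activity_id: {"prov:endTime": ended_at}},
        "agent": {agent_id: {}},
        "wasGeneratedBy": {
            "_:gen1": {"prov:entity": canonical_id, "prov:activity": activity_id}
        },
        "wasAssociatedWith": {
            "_:assoc1": {"prov:activity": activity_id, "prov:agent": agent_id}
        },
        "wasDerivedFrom": {
            f"_:der{i}": {
                "prov:generatedEntity": canonical_id,
                "prov:usedEntity": source_id,
            }
            for i, source_id in enumerate(source_ids, start=1)
        },
    }


if __name__ == "__main__":
    doc = merge_provenance(
        "ex:person-canonical-1",
        ["ex:police-17", "ex:customs-42", "ex:fiu-9"],
        "ex:merge-2001",
        "ex:analyst-7",
        "2026-01-15T10:30:00Z",
    )
    print(json.dumps(doc, indent=2))
```

Recording one `wasDerivedFrom` relation per source record is what lets an auditor trace a canonical profile back to every contributing system.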

Open Standards

W3C PROV-DM (Provenance Data Model, W3C Recommendation 2013): every entity merge operation is recorded using the PROV-DM core concepts (prov:Entity, prov:Activity, and prov:Agent) and relations such as wasDerivedFrom and wasGeneratedBy, providing a standards-conformant audit trail for all resolution decisions.
Privacy-Preserving Record Linkage via Bloom Filters (Schnell et al. 2009, DOI:10.1186/1472-6947-9-41): the PPRL component implements the CLK (Cryptographic Long-term Key) Bloom-filter encoding method to compute field-level similarity scores without exposing plaintext matching criteria. The resulting Bloom-filter similarity contributes 30% of the hybrid confidence score.
W3C PROV-JSON: provenance records are serialised using the PROV-JSON encoding, published as a W3C Member Submission (2013), enabling export and interchange of entity lineage data with external audit and compliance tooling.
OAuth 2.0 (RFC 6749) and JWT bearer tokens: access to all resolution endpoints is gated by OAuth 2.0 bearer tokens, and RBAC permission checks enforce the ENTITY_RESOLUTION_READ and ENTITY_RESOLUTION_WRITE scopes per authenticated user. This token-based authentication protects the typed, auditable read and write workflows across the platform.
JSON (RFC 8259) and JSON Schema: entity attributes, merge metadata, and algorithm details are stored and exchanged as JSON, with JSONB used for efficient indexing and querying of attribute payloads in the underlying database.
Unicode Normalisation (Unicode Standard, UAX #15): string attribute comparison normalises values to a consistent case-folded, whitespace-stripped form before Levenshtein comparison and Bloom-filter encoding, which reduces spurious mismatches caused by casing, spacing, and encoding variants of names across multi-source record imports.
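The normalisation, Bloom-filter, and Levenshtein steps named in the list above can be combined roughly as follows. This is a simplified single-field sketch under stated assumptions, not the platform's algorithm or a full CLK implementation: the NFKC normalisation form, the filter length, the number of hash functions, the HMAC-based double hashing, and the choice to give the remaining 70% of the score to a normalised Levenshtein similarity are all assumptions. Only the 30% Bloom-filter weight comes from the text.

```python
import hashlib
import hmac
import unicodedata

BLOOM_BITS = 1024    # hypothetical filter length
NUM_HASHES = 10      # hypothetical number of hash functions per bigram
BLOOM_WEIGHT = 0.30  # from the text: Bloom-filter share of the hybrid score


def normalise(value: str) -> str:
    """NFKC-normalise, case-fold, trim, and collapse internal whitespace."""
    folded = unicodedata.normalize("NFKC", value).casefold()
    return " ".join(folded.split())


def bigrams(text: str) -> set[str]:
    """Character bigrams of the space-padded text."""
    padded = f" {text} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode(text: str, key: bytes) -> set[int]:
    """Return the bit positions set in a keyed Bloom filter for the text."""
    bits = set()
    for gram in bigrams(normalise(text)):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(key, data, hashlib.sha1).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big")
        for i in range(NUM_HASHES):  # double hashing: h1 + i * h2
            bits.add((h1 + i * h2) % BLOOM_BITS)
    return bits


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient of two bit-position sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def levenshtein(a: str, b: str) -> int:
    """Edit distance using the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def hybrid_score(a: str, b: str, key: bytes) -> float:
    """Weighted blend of Bloom-filter Dice and Levenshtein similarity."""
    na, nb = normalise(a), normalise(b)
    longest = max(len(na), len(nb)) or 1
    lev_sim = 1 - levenshtein(na, nb) / longest
    bloom_sim = dice(encode(a, key), encode(b, key))
    return BLOOM_WEIGHT * bloom_sim + (1 - BLOOM_WEIGHT) * lev_sim
```

Because both inputs pass through `normalise` first, values that differ only in case or spacing receive the maximum score, while genuine spelling differences such as transliteration variants lower both component similarities. The secret `key` matters for the privacy property: only parties that hold it can produce comparable encodings.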

Last Reviewed: 2026-02-05 Last Updated: 2026-04-14