Overview
An organised crime investigation pulls records from three separate source systems: a legacy police database, a customs import, and a financial intelligence unit feed. The name "Mohammed Al-Rashidi" appears in all three, but with variations in spelling, different date-of-birth formats, and partially overlapping address histories. Without resolution, analysts work with three disconnected profiles and miss the full picture. The Entity Resolution domain identifies that these records describe the same individual and merges them into a single canonical profile, preserving the source links for audit purposes.
Duplicate entity records are an unavoidable reality in multi-source investigations. Different systems spell names differently, record partial data, or use local formatting conventions. Entity Resolution applies configurable matching strategies to detect these overlaps, scores their similarity, and either merges them automatically when confidence is high or routes them for human review when it is not.
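The score-then-route behaviour described above can be illustrated with a minimal sketch. The names `Thresholds`, `Decision`, and `route_match`, and the cut-off values 0.90 and 0.60, are assumptions made for illustration; they are not taken from the product.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO_MERGE = "auto_merge"  # confidence high enough to merge unattended
    REVIEW = "review"          # borderline: send to a human reviewer
    NO_MATCH = "no_match"      # treat the records as distinct entities


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical cut-offs; real values would be tuned per deployment.
    auto_merge: float = 0.90
    review: float = 0.60


def route_match(score: float, thresholds: Thresholds = Thresholds()) -> Decision:
    """Map a similarity score in [0, 1] to a resolution decision."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= thresholds.auto_merge:
        return Decision.AUTO_MERGE
    if score >= thresholds.review:
        return Decision.REVIEW
    return Decision.NO_MATCH
```

Keeping the thresholds in a separate value object is one way to make the matching strategy configurable per entity type or per data source without touching the routing logic.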
Key Features
- Automated duplicate entity detection with configurable matching strategies
- Confidence scoring for match quality assessment
- Merge workflows with conflict resolution for duplicate consolidation
- Support for person, organisation, location, and other entity type resolution
- Cross-data-source duplicate detection
- Match review and approval workflows for human-in-the-loop validation
- Entity relationship preservation during merges
- Audit trail for all resolution decisions
Use Cases
- Detecting duplicate person records across multiple imported data sources, such as police databases, customs records, and open-source feeds
- Merging confirmed duplicate entities with automated conflict resolution, keeping the most recent or highest-confidence field values
- Reviewing borderline entity match candidates through a human-in-the-loop workflow, confirming true duplicates and rejecting false positives
- Maintaining data quality through ongoing resolution processes as new records arrive from external systems
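The "most recent or highest-confidence" conflict rule from the merge use case might look like the sketch below. `FieldValue`, `merge_records`, and the `prefer` parameter are hypothetical names, and the sample records are invented for illustration only. Each surviving value keeps its `source`, so the link back to the originating system is preserved.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any


@dataclass(frozen=True)
class FieldValue:
    value: Any
    source: str        # originating system, kept for audit
    observed: date     # when the source recorded the value
    confidence: float  # source-assigned confidence in [0, 1]


def merge_records(records: list[dict[str, FieldValue]],
                  prefer: str = "recent") -> dict[str, FieldValue]:
    """Merge duplicate records field by field.

    prefer="recent" keeps the newest value (confidence breaks ties);
    prefer="confidence" keeps the most confident value (recency breaks ties).
    """
    if prefer not in ("recent", "confidence"):
        raise ValueError(f"unknown strategy: {prefer}")

    def rank(fv: FieldValue) -> tuple:
        if prefer == "recent":
            return (fv.observed, fv.confidence)
        return (fv.confidence, fv.observed)

    merged: dict[str, FieldValue] = {}
    for record in records:
        for name, candidate in record.items():
            current = merged.get(name)
            if current is None or rank(candidate) > rank(current):
                merged[name] = candidate
    return merged


# Invented sample data, not taken from any real system.
police = {
    "name": FieldValue("Mohammed Al-Rashidi", "police", date(2019, 3, 1), 0.8),
    "address": FieldValue("12 Harbour Rd", "police", date(2019, 3, 1), 0.7),
}
customs = {
    "name": FieldValue("Mohamed Al Rashidi", "customs", date(2023, 6, 10), 0.6),
}
```

With this data, `merge_records([police, customs])` keeps the customs spelling of the name because it is newer, while `prefer="confidence"` keeps the police spelling. The address survives in both cases because only one source supplies it.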
Industry Context
National criminal intelligence databases regularly ingest records from regional forces with inconsistent naming and identifier conventions. Financial crime units merge customer records across banking systems to build complete beneficial ownership pictures. Border control agencies reconcile passenger manifests against watchlists where name transliteration produces multiple variants. Immigration services deduplicate application records across processing offices to detect identity reuse.
Integration
Entity Resolution integrates with the person profile, organisation management, and graph relationship domains. Resolution decisions are recorded in PostgreSQL with a full audit trail, and after each merge the graph layer is updated to reflect the merged canonical identity.
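As an illustration of what an audit record for a merge could contain, the sketch below builds a PROV-JSON style document, the serialisation named in the Open Standards section. The function name `merge_provenance`, the `ex:` prefix, its URL, and all identifiers are assumptions. The actual record layout used by the product is not documented here.

```python
import json
from datetime import datetime, timezone


def merge_provenance(canonical_id: str, source_ids: list[str],
                     activity_id: str, agent_id: str,
                     ended_at: datetime) -> dict:
    """Build a PROV-JSON style document describing one merge.

    The canonical entity is generated by the merge activity, derived from
    each source record, and the activity is associated with the agent
    (a user or an automated matcher) that performed it.
    """
    return {
        "prefix": {"ex": "https://example.org/er/"},  # hypothetical namespace
        "entity": {canonical_id: {}, **{sid: {} for sid in source_ids}},
        "activity": {activity_id: {"prov:endTime": ended_at.isoformat()}},
        "agent": {agent_id: {}},
        "wasGeneratedBy": {
            "_:gen1": {"prov:entity": canonical_id,
                       "prov:activity": activity_id},
        },
        "wasAssociatedWith": {
            "_:assoc1": {"prov:activity": activity_id,
                         "prov:agent": agent_id},
        },
        "wasDerivedFrom": {
            f"_:der{i}": {"prov:generatedEntity": canonical_id,
                          "prov:usedEntity": sid}
            for i, sid in enumerate(source_ids, start=1)
        },
    }


doc = merge_provenance(
    canonical_id="ex:person-1",
    source_ids=["ex:police-7", "ex:customs-3"],
    activity_id="ex:merge-42",
    agent_id="ex:analyst-9",
    ended_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
serialised = json.dumps(doc, indent=2)
```

Because the result is a plain dictionary, it could be stored in a JSON column or exported to external audit tooling without further conversion.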
Open Standards
- W3C PROV-DM (Provenance Data Model, W3C Recommendation 2013): every entity merge operation is recorded using the PROV-DM core concepts `prov:Entity`, `prov:Activity`, and `prov:Agent`, with relationships including `wasDerivedFrom` and `wasGeneratedBy`, providing a standards-conformant audit trail for all resolution decisions.
- Privacy-Preserving Record Linkage via Bloom Filters (Schnell et al. 2009, DOI:10.1186/1472-6947-9-41): the PPRL component implements the CLK (Cryptographic Long-term Key) Bloom-filter encoding method to compute field-level similarity scores without exposing plaintext matching criteria, contributing 30% of the hybrid confidence score.
- W3C PROV-JSON: provenance records are serialised using the PROV-JSON encoding, published as a W3C Member Submission, enabling export and interchange of entity lineage data with external audit and compliance tooling.
- GraphQL (June 2018 Specification): all entity resolution operations (duplicate discovery, similarity calculation, merge, split, and history retrieval) are exposed through a typed GraphQL API with authenticated query and mutation resolvers.
- OAuth 2.0 (RFC 6749): access to all resolution endpoints is gated by OAuth 2.0 bearer tokens, with RBAC permission checks enforcing the `ENTITY_RESOLUTION_READ` and `ENTITY_RESOLUTION_WRITE` scopes per authenticated user.
- JSON (RFC 8259) / JSON Schema: entity attributes, merge metadata, and algorithm details are stored and exchanged as JSON, with JSONB used for efficient indexing and querying of attribute payloads in the underlying database.
- Unicode Normalisation (Unicode Standard, UAX #15): before Levenshtein comparison and Bloom-filter encoding, string attributes are normalised to a consistent case-folded, whitespace-stripped form, ensuring correct handling of name transliterations and encoding variants across multi-source record imports.
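The normalisation, Bloom-filter, and hybrid-score items above can be tied together in a rough sketch. Only the 30% PPRL weight comes from this page. The NFKC form, the bigram tokenisation, the filter size, the hash count, the HMAC double-hashing scheme, the Dice coefficient, and the use of Levenshtein similarity for the remaining 70% are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import unicodedata


def normalise(value: str) -> str:
    """NFKC-normalise, case-fold, and collapse whitespace (assumed form)."""
    folded = unicodedata.normalize("NFKC", value).casefold()
    return " ".join(folded.split())


def levenshtein_similarity(a: str, b: str) -> float:
    """1 - edit_distance / max_length, so identical strings score 1.0."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return 1.0 - previous[-1] / max(len(a), len(b))


def bigrams(value: str) -> set[str]:
    padded = f"_{value}_"  # padding marks the start and end of the string
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, key: bytes, size: int = 1000,
                 hashes: int = 20) -> set[int]:
    """Return the set bit positions of a keyed Bloom filter for one value."""
    bits: set[int] = set()
    for gram in bigrams(value):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(key, data, hashlib.sha1).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big")
        for i in range(hashes):
            bits.add((h1 + i * h2) % size)  # double hashing
    return bits


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient of two bit-position sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def hybrid_score(pprl: float, other: float, pprl_weight: float = 0.3) -> float:
    """Weighted blend: PPRL contributes 30%; the other 70% is an assumption."""
    return pprl_weight * pprl + (1.0 - pprl_weight) * other


def match_score(left: str, right: str, key: bytes) -> float:
    l, r = normalise(left), normalise(right)
    pprl = dice(bloom_encode(l, key), bloom_encode(r, key))
    return hybrid_score(pprl, levenshtein_similarity(l, r))
```

In a real privacy-preserving deployment the two parties would exchange only the encoded filters, never the plaintext. This sketch computes both similarity measures in one process purely to show how the pieces combine.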
Last Reviewed: 2026-02-05 Last Updated: 2026-04-14