Overview
A financial crime unit receives transaction data from three separate systems: a core banking platform, an external payments processor, and a legacy trade finance application. The same individual appears as "Mohammed Al-Rashid", "M. Al Rashid", and "Mohamed Alrashid" across the three sources. Without entity resolution, investigators work from three separate, incomplete profiles. With it, a single golden record consolidates everything known about that individual from all three sources, and every new piece of intelligence lands in the right place.
The Profile Entity Resolution module handles exactly this challenge at scale, running a multi-stage matching pipeline across 153+ data sources to deduplicate entity records, create accurate golden records, and give investigators a single, authoritative view of every person, organisation, location, and object in the system.
Open Standards
- Privacy-Preserving Record Linkage (PPRL), Schnell et al. 2009, DOI 10.1186/1472-6947-9-41: The PPRL component of the hybrid scoring pipeline encodes entity fields as Bloom filters using bigram decomposition and SHA-256 hashing, then computes Dice-coefficient similarity between encoded representations, so no plaintext identity attributes are exposed during cross-source matching.
- W3C PROV-DM (W3C Recommendation, April 2013) / PROV-JSON (W3C Member Submission, April 2013): Every entity merge operation records a provenance graph using the W3C Provenance Data Model, capturing prov:Entity, prov:Activity, and prov:Agent nodes and the relationships between them (wasGeneratedBy, wasDerivedFrom, wasAssociatedWith), so the full audit chain of golden record creation is queryable and exportable as PROV-JSON.
- GraphQL (June 2018 specification): All entity resolution queries and mutations (find duplicates, calculate similarity, merge, split, generate embeddings) are exposed through a typed GraphQL API, enabling clients to request exactly the fields they need without over-fetching.
- RFC 4122 (UUID version 4; RFC 4122 has since been obsoleted by RFC 9562, which retains version 4): Every entity, match record, and merge-history entry is identified by a randomly generated version-4 UUID, which makes collisions negligibly unlikely and keeps identifiers safe for cross-system entity references.
- FIPS PUB 180-4 / SHA-256: The PPRL Bloom filter implementation uses SHA-256 as its deterministic hash function for each bigram-seed pair. Because SHA-256 is a one-way function, the hash outputs cannot be directly inverted to recover the original plaintext.
- RFC 7519 JSON Web Token (JWT) with RS256: All entity resolution API endpoints require a valid RS256-signed JWT, verified against a JWKS endpoint, with role-based access control checked per operation before any entity data is returned or mutated.
- JSON (ECMA-404 / RFC 8259): Entity attributes, similarity metric details, and algorithm result payloads are stored and transmitted as JSON, with PostgreSQL JSONB used internally for schema-flexible attribute storage across heterogeneous source systems.
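The PPRL encoding described in the list above can be sketched in a few lines. This is a minimal illustration, not the module's implementation: the filter length, the number of seeds, the name normalisation, the underscore padding, and all function names are assumptions made for the example. Only the overall approach (bigram decomposition, SHA-256 per bigram-seed pair, Dice coefficient over the encodings) comes from the description above.

```python
import hashlib

FILTER_BITS = 1024  # assumed Bloom filter length
NUM_SEEDS = 20      # assumed number of hash seeds applied to each bigram


def bigrams(value: str) -> set[str]:
    """Normalise a field value and split it into padded character bigrams."""
    # Assumed normalisation: lowercase and drop anything that is not a letter or digit.
    cleaned = "".join(ch for ch in value.lower() if ch.isalnum())
    padded = f"_{cleaned}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode(value: str) -> set[int]:
    """Return the set of bit positions that are set in the Bloom filter for `value`."""
    positions = set()
    for gram in bigrams(value):
        for seed in range(NUM_SEEDS):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            positions.add(int.from_bytes(digest[:8], "big") % FILTER_BITS)
    return positions


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient between two encoded fields: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    reference = encode("Mohammed Al-Rashid")
    for candidate in ("M. Al Rashid", "Mohamed Alrashid", "Jane Smith"):
        print(candidate, round(dice(reference, encode(candidate)), 3))
```

Only the bit positions are compared, so the two parties never exchange the names themselves. A production deployment would likely also need a shared secret (for example a keyed hash), since bigrams drawn from a small alphabet are easy to enumerate.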
Last Reviewed: 2026-02-05
Last Updated: 2026-04-14
Key Features
- Multi-Stage Matching Pipeline: A five-stage pipeline progresses from candidate generation through similarity scoring, match classification, entity clustering, and quality verification, with configurable thresholds for automatic merging versus manual review.
- Machine Learning Match Models: Models trained on extensive labelled entity pair datasets predict match likelihood using features covering name similarity, identifier overlap, date proximity, geographic correlation, and relationship patterns.
- Duplicate Detection and Deduplication: Systematic identification of duplicate entity profiles supports full database scans, incremental detection, and subset analysis, producing clean golden records that aggregate all known information about each entity.
- Golden Record Creation: Intelligent data merging selects the best representation of each entity by evaluating completeness, recency, source reliability, verification status, and confidence scores across all duplicate sources.
- Interactive Resolution Interface: A review interface presents side-by-side entity comparisons with matching attributes, conflicting data, AI recommendations, and guided decision-making for manual resolution of ambiguous matches.
- Entity Linking: Related entities that should not be merged are connected by explicit links, which keep their profiles separate while recording the relationship, for example the same person appearing across different time periods, contexts, or name variations.
- Conflict Detection: Contradictory data between potential duplicates is automatically identified with severity classification, enabling investigators to assess whether conflicts indicate distinct individuals or data quality issues.
- Batch Deduplication: Large-scale deduplication jobs process entity populations efficiently with progress tracking, configurable auto-merge thresholds, and manual review queues for data cleansing initiatives.
- Continuous Model Improvement: Match decisions feed back into model training, and reviewer accuracy tracking drives ongoing improvement in match quality with periodic model retraining.
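The match classification and entity clustering stages, together with the split between automatic merging and manual review, can be illustrated as follows. This is a sketch under stated assumptions: the two threshold values, the decision labels, and the use of union-find for clustering are hypothetical choices for the example, not documented behaviour of the module.

```python
AUTO_MERGE_THRESHOLD = 0.90  # assumed: scores at or above this merge automatically
REVIEW_THRESHOLD = 0.70      # assumed: scores at or above this go to manual review


def classify(score: float) -> str:
    """Map a similarity score to one of three hypothetical decisions."""
    if score >= AUTO_MERGE_THRESHOLD:
        return "auto_merge"
    if score >= REVIEW_THRESHOLD:
        return "manual_review"
    return "non_match"


def cluster(entity_ids, scored_pairs):
    """Group entities joined by auto-merge pairs and collect pairs needing review.

    `scored_pairs` is an iterable of (entity_a, entity_b, score) tuples.
    Returns (clusters, review_queue).
    """
    parent = {entity: entity for entity in entity_ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    review_queue = []
    for a, b, score in scored_pairs:
        decision = classify(score)
        if decision == "auto_merge":
            parent[find(a)] = find(b)
        elif decision == "manual_review":
            review_queue.append((a, b, score))

    groups = {}
    for entity in entity_ids:
        groups.setdefault(find(entity), []).append(entity)
    return sorted(sorted(group) for group in groups.values()), review_queue


if __name__ == "__main__":
    ids = ["a", "b", "c", "d"]
    pairs = [("a", "b", 0.95), ("b", "c", 0.92), ("c", "d", 0.75)]
    clusters, queue = cluster(ids, pairs)
    print(clusters)
    print(queue)
```

Clustering on auto-merge pairs only means that an ambiguous pair never pulls two clusters together until a reviewer has confirmed it.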
Use Cases
- KYC Deduplication: Financial institutions identify and consolidate duplicate customer profiles across onboarding channels, ensuring a single comprehensive view of each customer for regulatory compliance.
- Investigation Entity Matching: Investigators match subjects against existing entity databases to identify prior investigations, known associates, and historical risk indicators before beginning new case work.
- Data Quality Management: Data stewards run periodic deduplication jobs to maintain entity database integrity, reduce storage costs, and improve search and screening accuracy.
- Cross-Source Entity Consolidation: Profiles ingested from multiple data sources are automatically matched and consolidated, creating unified golden records with complete attribute coverage from all contributing sources.
- Synthetic Identity Detection: Pattern analysis across identity attributes identifies potential synthetic identities constructed from combinations of real and fabricated data points.
- Manual Review and Quality Assurance: Compliance teams review ambiguous match candidates through guided interfaces, with reviewer performance tracking ensuring consistent decision quality.
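For the cross-source consolidation use case, golden record creation amounts to choosing one surviving value per attribute from the matched records. The sketch below assumes a simple ordering (verified values first, then source reliability, then recency). The `AttributeValue` type, the field names, the reliability scores, and the sample data are all hypothetical and serve only to show the idea.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AttributeValue:
    value: str
    source: str
    source_reliability: float  # hypothetical 0..1 score per source system
    updated: date
    verified: bool


def survivorship_key(candidate: AttributeValue):
    """Assumed ordering: verified beats unverified, then reliability, then recency."""
    return (candidate.verified, candidate.source_reliability, candidate.updated)


def build_golden_record(records):
    """Pick the best non-empty value for each attribute across matched records.

    `records` is a list of dicts mapping attribute name to AttributeValue.
    """
    golden = {}
    for record in records:
        for name, candidate in record.items():
            if not candidate.value:
                continue  # empty values never displace a populated one
            current = golden.get(name)
            if current is None or survivorship_key(candidate) > survivorship_key(current):
                golden[name] = candidate
    return golden


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    core_banking = {
        "name": AttributeValue("Mohammed Al-Rashid", "core_banking", 0.9, date(2024, 1, 10), True),
    }
    payments = {
        "name": AttributeValue("M. Al Rashid", "payments", 0.6, date(2025, 3, 1), False),
    }
    trade_finance = {
        "name": AttributeValue("Mohamed Alrashid", "trade_finance", 0.5, date(2019, 6, 2), False),
        "date_of_birth": AttributeValue("1980-01-01", "trade_finance", 0.5, date(2019, 6, 2), False),
    }
    for name, chosen in build_golden_record([core_banking, payments, trade_finance]).items():
        print(name, "=", chosen.value, "from", chosen.source)
```

Keeping the source and timestamp alongside each surviving value means the golden record still shows where every attribute came from.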
Integration
The Profile Entity Resolution module integrates with the platform's profile management, investigation management, and risk scoring systems. Match results feed into entity profiles and investigation workspaces, golden records synchronise across all downstream systems, and deduplication metrics inform data quality dashboards. The module supports integration with external identity verification services and connects to audit trail systems for complete tracking of all merge, link, and split operations.
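As an illustration of what an audit record for a merge might look like, the sketch below builds a small PROV-JSON document for one merge operation. The `er` prefix, its namespace URL, the identifier scheme, and the `merge_provenance` function are hypothetical. The top-level keys and the `prov:` property names follow the PROV-JSON serialization.

```python
import json
import uuid
from datetime import datetime, timezone

NAMESPACE = "https://example.org/entity-resolution/"  # hypothetical namespace


def merge_provenance(golden_id, source_ids, agent_id, merged_at):
    """Build a PROV-JSON document describing one merge operation."""
    activity = f"er:merge-{uuid.uuid4()}"  # version-4 UUID identifies the merge
    golden = f"er:{golden_id}"
    agent = f"er:{agent_id}"
    doc = {
        "prefix": {"er": NAMESPACE},
        "entity": {golden: {}},
        "activity": {activity: {"prov:endTime": merged_at.isoformat()}},
        "agent": {agent: {}},
        # The golden record was generated by the merge activity.
        "wasGeneratedBy": {"_:gen1": {"prov:entity": golden, "prov:activity": activity}},
        # The merge activity was carried out by (or on behalf of) the agent.
        "wasAssociatedWith": {"_:assoc1": {"prov:activity": activity, "prov:agent": agent}},
        "wasDerivedFrom": {},
    }
    for index, source_id in enumerate(source_ids, start=1):
        source = f"er:{source_id}"
        doc["entity"][source] = {}
        # The golden record was derived from each source record.
        doc["wasDerivedFrom"][f"_:der{index}"] = {
            "prov:generatedEntity": golden,
            "prov:usedEntity": source,
        }
    return doc


if __name__ == "__main__":
    document = merge_provenance(
        golden_id="golden-1",
        source_ids=["core-banking-17", "payments-42"],
        agent_id="analyst-7",
        merged_at=datetime.now(timezone.utc),
    )
    print(json.dumps(document, indent=2))
```

A split operation could be recorded the same way, with the resulting profiles each derived from the record that was split.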