Overview
The Entity Resolution domain identifies duplicate records across multiple data sources and merges them into canonical, unique entities. It combines machine learning embeddings with classical matching algorithms to detect potential duplicates and score their similarity, then applies flexible merge strategies with a full audit trail and reversibility, so that organizations can maintain a clean, deduplicated master data set.
Key Features
- Duplicate Detection -- Combines machine learning embeddings with classical string matching algorithms to identify potential duplicate records across data sources
- Multi-Algorithm Similarity Scoring -- Calculates similarity using vector analysis, fuzzy string matching, phonetic matching, and set-based overlap methods for robust results
- Configurable Match Thresholds -- An adjustable confidence threshold (default 0.75) controls how sensitive duplicate detection is, so it can be tuned to the quality of the source data
- Multiple Merge Strategies -- Supports keep-newer, keep-older, combine-all, and manual field-by-field merge strategies for flexible deduplication
- Human-in-the-Loop Review -- Match candidates can be routed for human review with pending, confirmed, and rejected status workflows
- Auto-Merge for High Confidence -- Matches exceeding a high confidence threshold can be automatically merged without manual review
- Full Merge History -- Complete audit trail tracks every merge operation including source entities, target entity, strategy used, and timestamp
- Reversible Merges -- Merged entities can be split back to their original records when merges are determined to be incorrect
- Multi-Entity Type Support -- Resolves duplicates across persons, organizations, locations, assets, and events
- Attribute-Level Scoring -- Individual similarity scores for names, emails, phone numbers, and addresses provide transparency into match decisions
- Source System Mapping -- Maintains bidirectional links between canonical entities and their original source records for data lineage
- Resolution Statistics -- Dashboard metrics track total entities, potential duplicates, confirmed merges, rejection rates, and average confidence scores
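To make attribute-level scoring and the match threshold concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not the domain's actual implementation: the function names, the record layout, and the `WEIGHTS` values are hypothetical, `difflib` stands in for the fuzzy matcher, and the embedding and phonetic scorers are left out. Only the 0.75 default comes from the feature list above.

```python
from difflib import SequenceMatcher

DEFAULT_MATCH_THRESHOLD = 0.75  # default stated in the feature list

# Hypothetical attribute weights; the real weighting is not documented here.
WEIGHTS = {"name": 0.4, "email": 0.3, "phone": 0.15, "address": 0.15}


def fuzzy_ratio(a: str, b: str) -> float:
    """Fuzzy string similarity in [0, 1] (difflib as a stand-in matcher)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def jaccard(a: str, b: str) -> float:
    """Set-based overlap of whitespace-separated tokens, in [0, 1]."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def digits_only(phone: str) -> str:
    """Normalize a phone number by dropping everything except digits."""
    return "".join(ch for ch in phone if ch.isdigit())


def attribute_scores(rec_a: dict, rec_b: dict) -> dict:
    """Return one similarity score per attribute for a pair of records."""
    return {
        "name": fuzzy_ratio(rec_a["name"], rec_b["name"]),
        "email": 1.0 if rec_a["email"].lower() == rec_b["email"].lower() else 0.0,
        "phone": 1.0 if digits_only(rec_a["phone"]) == digits_only(rec_b["phone"]) else 0.0,
        "address": jaccard(rec_a["address"], rec_b["address"]),
    }


def overall_score(scores: dict) -> float:
    """Weighted average of the attribute scores."""
    return sum(WEIGHTS[key] * scores[key] for key in WEIGHTS)


def is_candidate(scores: dict, threshold: float = DEFAULT_MATCH_THRESHOLD) -> bool:
    """A pair becomes a match candidate when its score reaches the threshold."""
    return overall_score(scores) >= threshold
```

Keeping the per-attribute scores alongside the overall score is what lets a reviewer see why a pair was flagged, for example a strong name and email match with a weaker address match.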
Use Cases
- Organizations importing data from multiple source systems use entity resolution to identify and merge duplicate person records, creating a single canonical record with a complete view of each individual.
- Data stewards review pending match candidates using the human-in-the-loop workflow, confirming true duplicates and rejecting false matches to continuously improve data quality.
- Automated deduplication workflows merge high-confidence matches automatically while routing borderline cases for human review, balancing efficiency with accuracy.
- Investigators trace entity records back to their original source systems using the source mapping capability, understanding the provenance and completeness of each data point.
- Data quality teams monitor resolution statistics to track deduplication progress, identify data quality issues, and measure the effectiveness of matching algorithms.
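The automated workflow described above, where high-confidence matches merge automatically and borderline ones go to a data steward, might be sketched as follows. The status names come from the review workflow in the feature list. The `AUTO_MERGE_THRESHOLD` value, the class and function names, and the choice to mark auto-merged pairs as confirmed are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class MatchStatus(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"
    REJECTED = "rejected"


REVIEW_THRESHOLD = 0.75      # default match threshold from the feature list
AUTO_MERGE_THRESHOLD = 0.95  # hypothetical; the real value is not documented


@dataclass
class MatchCandidate:
    left_id: str
    right_id: str
    confidence: float
    status: MatchStatus = MatchStatus.PENDING
    auto_merged: bool = False


def triage(pairs):
    """Split scored pairs into an auto-merge list and a human-review queue.

    `pairs` is an iterable of (left_id, right_id, confidence) tuples.
    Pairs scoring below REVIEW_THRESHOLD are not treated as candidates.
    """
    auto_merge, review_queue = [], []
    for left_id, right_id, confidence in pairs:
        if confidence >= AUTO_MERGE_THRESHOLD:
            # Assumption: an auto-merged pair is recorded as confirmed.
            auto_merge.append(
                MatchCandidate(left_id, right_id, confidence, MatchStatus.CONFIRMED, True)
            )
        elif confidence >= REVIEW_THRESHOLD:
            review_queue.append(MatchCandidate(left_id, right_id, confidence))
    return auto_merge, review_queue


def record_review(candidate: MatchCandidate, is_duplicate: bool) -> MatchCandidate:
    """Apply a data steward's decision to a pending candidate."""
    candidate.status = MatchStatus.CONFIRMED if is_duplicate else MatchStatus.REJECTED
    return candidate
```

Raising the auto-merge threshold sends more pairs to the review queue and lowers the risk of incorrect merges. Lowering it does the opposite.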
Integration
The Entity Resolution domain works with CRM systems, HR platforms, marketing tools, and external data sources to import and normalize records. It connects to the master data management layer to maintain the golden record and provides bidirectional source mapping for data lineage across all integrated systems.
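As a rough illustration of how a golden record, bidirectional source mapping, merge history, and reversible merges can fit together, consider the in-memory sketch below. Every name in it (`SourceRecord`, `MergeEvent`, `GoldenRecordStore`) is hypothetical. The field-by-field reading of keep-newer and keep-older is also an assumption: the preferred record wins per field, and empty values never overwrite populated ones.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SourceRecord:
    system: str          # e.g. "crm" or "hr"
    record_id: str
    updated_at: datetime
    fields: dict


@dataclass
class MergeEvent:
    canonical_id: str
    sources: list        # (system, record_id) pairs that were merged
    strategy: str
    merged_at: datetime


class GoldenRecordStore:
    def __init__(self):
        self.canonical = {}     # canonical_id -> merged field values
        self.to_sources = {}    # canonical_id -> original SourceRecord list
        self.to_canonical = {}  # (system, record_id) -> canonical_id
        self.history = []       # audit trail of MergeEvent entries

    def merge(self, canonical_id, records, strategy="keep-newer"):
        if strategy not in ("keep-newer", "keep-older"):
            raise ValueError(f"unsupported strategy in this sketch: {strategy}")
        # Apply the preferred record last so its values overwrite the others.
        ordered = sorted(
            records, key=lambda r: r.updated_at, reverse=(strategy == "keep-older")
        )
        merged = {}
        for rec in ordered:
            merged.update({k: v for k, v in rec.fields.items() if v not in (None, "")})
        self.canonical[canonical_id] = merged
        self.to_sources[canonical_id] = list(records)
        for rec in records:
            self.to_canonical[(rec.system, rec.record_id)] = canonical_id
        self.history.append(
            MergeEvent(
                canonical_id,
                [(r.system, r.record_id) for r in records],
                strategy,
                datetime.now(timezone.utc),
            )
        )
        return merged

    def split(self, canonical_id):
        """Reverse a merge and return the untouched original records."""
        records = self.to_sources.pop(canonical_id)
        del self.canonical[canonical_id]
        for rec in records:
            del self.to_canonical[(rec.system, rec.record_id)]
        return records
```

Because the original source records are stored unchanged next to the merged view, a split only has to drop the canonical entry and its links. The merge history is kept either way.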
Last Reviewed: 2026-02-05