Profile Search and Discovery

Overview

An intelligence analyst receives a tip about a person of interest. The name they have is a rough transliteration from Arabic script: "Khalid Al-Farouk", though it could equally be spelled "Khaled", "Al-Farouq", or several other variants. They also have an approximate date of birth and a city. Searching for an exact name match will miss most of the relevant profiles. What they need is a search engine that understands phonetics, handles name variants across languages and scripts, and can narrow results by demographic and geographic criteria.

The Profile Search and Discovery module provides exactly that kind of search: multi-modal, fuzzy-tolerant, phonetically aware, and fast enough to support live investigative workflows. It also supports saved searches with scheduled execution for ongoing monitoring, so analysts do not have to re-run the same queries manually.

Open Standards

OAuth 2.0 and JWT Bearer Token: Token-based authentication protects typed, auditable read and write workflows across the platform.
Privacy-Preserving Record Linkage (PPRL) via Bloom filter CLK encoding (Schnell et al., BMC Medical Informatics and Decision Making, 2009): Fuzzy entity similarity scoring uses bigram-decomposed Bloom filter encoding with Dice coefficient comparison, allowing error-tolerant matching without retaining plaintext matching criteria.
Okapi BM25 probabilistic relevance model: Full-text search over the platform record store is scored using ts_rank with BM25-style term-frequency weighting and serves as the keyword retrieval leg of the hybrid search pipeline.
Reciprocal Rank Fusion (RRF, Cormack et al., SIGIR 2009): BM25 keyword results and dense vector (semantic) results are merged into a single ranked list using the RRF formula with a smoothing constant of 60, as published in the original paper.
JSON Web Token (JWT, RFC 7519): Every search and profile query is authenticated via JWT Bearer tokens; the organisation scope claim is used to enforce per-tenant data isolation on all result sets.
Unicode Normalisation Form KD (NFKD, Unicode Standard / ISO/IEC 10646): Person name strings are NFKD-normalised before matching so that diacritics can be stripped and compatibility characters folded to their base forms, giving consistent comparison of transliterated name variants.
ISO 3166-1 alpha-2: Two-letter country codes are used as the canonical identifier for geographic proximity filtering, country-based facets, and country-restriction access controls on profile results.
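
The RRF merge described in the list above can be sketched in a few lines. This is a minimal illustration, not the module's implementation: the function name `reciprocal_rank_fusion` and the profile IDs are hypothetical, and only the constant of 60 comes from the text. Each profile receives the sum of 1 / (k + rank) over every ranked list in which it appears.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list of (id, score), best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID to keep output deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the two retrieval legs.
keyword_hits = ["p-17", "p-03", "p-42"]   # keyword (BM25-style) leg
semantic_hits = ["p-03", "p-99", "p-17"]  # dense vector (semantic) leg

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Profiles that appear near the top of both lists ("p-03" and "p-17" here) outrank profiles that appear in only one, without the two legs needing comparable raw scores.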
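
The NFKD normalisation and the bigram Bloom filter comparison listed above can be combined in a small sketch. Everything specific here is an assumption made for illustration: the filter size of 1024 bits, the four keyed hashes, the HMAC-SHA256 construction, the demo key, and the `_` padding character are not taken from the platform. The Bloom filter is represented as the set of bit positions that are switched on.

```python
import hashlib
import hmac
import unicodedata


def normalise(name):
    """NFKD-decompose, drop combining marks (diacritics), and lowercase."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def bigrams(text):
    """Return the set of character bigrams of the padded string."""
    padded = f"_{text}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode(name, size=1024, num_hashes=4, secret=b"demo-key"):
    """Map each bigram to num_hashes bit positions using keyed hashes (assumed scheme)."""
    bits = set()
    for gram in bigrams(normalise(name)):
        for i in range(num_hashes):
            digest = hmac.new(secret + bytes([i]), gram.encode("utf-8"), hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


def dice(a, b):
    """Dice coefficient of two bit-position sets: 2|A and B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


similar = dice(encode("Khalid"), encode("Khaled"))
unrelated = dice(encode("Khalid"), encode("Smith"))
```

Because the two spellings share most of their bigrams, `similar` comes out well above `unrelated`, and only the encoded bit sets need to be compared.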

Last Reviewed: 2026-02-05 Last Updated: 2026-04-14

Key Features

Natural Language Search: The search engine interprets natural language queries and automatically applies an appropriate matching strategy (simple keyword, structured, phrase, or proximity search), with configurable relevance scoring.
Fuzzy Matching and Typo Tolerance: Configurable edit distance algorithms find entities despite typos, misspellings, and name variations, with support for character transpositions, prefix matching, and automatic fuzziness adjustment based on query term length.
Phonetic Name Search: Multiple phonetic algorithms including Soundex, Metaphone, Double Metaphone, NYSIIS, and Caverphone match entities by name pronunciation, with configurable algorithm weighting and minimum score thresholds for accurate cross-language and transliteration matching.
Advanced Name Matching: Comprehensive name matching handles honorific removal, suffix normalisation, initial expansion, accent normalisation, component-level scoring with configurable first/middle/last name weights, and name reordering to match names across different cultural naming conventions.
Faceted Search and Filtering: Multi-dimensional filtering across demographic, geographic, temporal, risk, relationship, identifier, and activity categories enables investigators to narrow results through faceted navigation with real-time count updates and predefined filter templates for common investigative scenarios.
Geographic Proximity Search: Location-based search finds entities within specified distances using geographic coordinates, supporting distance-based sorting, bounding box queries, and country or region filtering for spatially oriented investigations.
Identifier-Based Lookup: Direct search by passport number, national identifier, tax identifier, or other document numbers with format normalisation, checksum validation, and fuzzy numeric matching that tolerates digit transpositions and omissions.
Saved Searches and Scheduling: Reusable search templates with saved query parameters, filter configurations, and delivery preferences can be scheduled for automated recurring execution, enabling ongoing monitoring and alerting for new matching entities.
Search Suggestions and Autocomplete: Real-time query suggestions, autocomplete, and "did you mean" corrections guide users toward effective searches with context-aware recommendations based on field frequency and entity type distributions.
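
To make the phonetic matching feature concrete, here is a minimal sketch of American Soundex, the simplest of the algorithms named above. It is an illustration only: the function name is hypothetical, input is assumed to be already folded to ASCII letters, and the module's own weighting and thresholds are not shown.

```python
# Consonant groups and their Soundex digits; vowels, Y, H and W carry no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the four-character American Soundex code of a name."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    encoded = [letters[0]]                # the first letter is kept as-is
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                      # H and W do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != previous:
            encoded.append(code)
        previous = code                   # a vowel resets the previous code
    return "".join(encoded)[:4].ljust(4, "0")


# Transliteration variants from the overview collapse to the same codes.
first_names = {soundex("Khalid"), soundex("Khaled")}   # {"K430"}
last_names = {soundex("Farouk"), soundex("Farouq")}    # {"F620"}
```

Soundex is coarse and tuned for English pronunciation, which is one reason a system of this kind would typically combine several phonetic algorithms with weights rather than rely on one.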
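
The fuzzy matching feature, with transpositions and length-based fuzziness, can be sketched as follows. The distance used is optimal string alignment (a restricted Damerau-Levenshtein variant). The length thresholds in `allowed_edits` are an assumption borrowed from a convention common in search engines, not the platform's configured values, and all function names are hypothetical.

```python
def osa_distance(a, b):
    """Edit distance counting insert, delete, substitute, and adjacent transposition."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def allowed_edits(term):
    """Assumed policy: short terms must match exactly, longer terms tolerate more edits."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_match(query, candidate):
    """True when the candidate is within the edit budget for the query's length."""
    return osa_distance(query.lower(), candidate.lower()) <= allowed_edits(query)
```

Under these assumed thresholds, `fuzzy_match("Khalid", "Khaild")` accepts the swapped letters as a single edit, while a two-letter term such as "Al" only matches exactly.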
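
Distance-based filtering and sorting for the geographic proximity feature can be illustrated with the haversine great-circle formula. This is a sketch under stated assumptions: the profile dictionaries, their `id`, `lat`, and `lon` keys, and the brute-force scan are hypothetical, and a production system would normally use a spatial index instead.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_radius(profiles, lat, lon, radius_km):
    """Return (distance_km, profile_id) pairs inside the radius, nearest first."""
    hits = []
    for profile in profiles:
        dist = haversine_km(lat, lon, profile["lat"], profile["lon"])
        if dist <= radius_km:
            hits.append((dist, profile["id"]))
    return sorted(hits)


# Hypothetical profiles with last known coordinates.
profiles = [
    {"id": "london", "lat": 51.5074, "lon": -0.1278},
    {"id": "cairo", "lat": 30.0444, "lon": 31.2357},
]
nearby = within_radius(profiles, 48.8566, 2.3522, radius_km=400)  # centred on Paris
```

Here only the London profile falls inside the 400 km radius around Paris; the returned distances double as the sort key for distance-based ordering.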

Use Cases

Investigation Subject Identification: Investigators search for persons and organisations using partial names, known identifiers, or approximate details, with fuzzy and phonetic matching uncovering potential matches despite incomplete or misspelled information.
Customer Due Diligence Screening: Compliance teams search entity databases during onboarding to identify existing profiles, prior investigation involvement, and potential duplicate records before creating new customer profiles.
Cross-Reference Discovery: Analysts use identifier-based search to cross-reference entities across data sources, finding matching passport numbers, tax identifiers, or registration numbers that link seemingly unrelated profiles.
Geographic Pattern Analysis: Investigators search for entities concentrated in specific geographic areas to identify potential networks, co-located businesses, or address-sharing patterns relevant to financial crime investigations.
Ongoing Monitoring Alerts: Saved searches with scheduled execution automatically detect new entities matching investigation criteria, alerting analysts when persons or organisations of interest appear in the system.
Data Quality Assessment: Search capabilities enable data stewards to identify potential duplicate profiles, incomplete records, and data inconsistencies through targeted queries and faceted analysis of entity populations.

Integration

The Profile Search and Discovery module integrates with the platform's profile management, investigation management, and entity resolution systems. Search results feed into investigation workspaces and due diligence workflows, saved searches connect to alerting and notification systems, and search analytics inform data quality dashboards. The module supports field-level access controls and audit logging for complete tracking of all search activities.