Overview
In a mature financial crime program, the most valuable resource is not the next alert: it is the work already done. A structuring case closed eight months ago might contain the exact beneficial ownership chain that a new sanctions investigation needs. An entity profile built for a fraud ring inquiry might reveal a shared director linking it to the current case. That institutional knowledge is only accessible if the search tools are good enough to surface it. The Search and Discovery module makes it accessible, combining full-text search, semantic understanding, and ML-powered discovery across the entire investigation repository.
Built for compliance analysts, fraud investigators, and financial intelligence units, the module delivers sub-second results across millions of case records, evidence items, notes, and entities, with faceted filtering, fuzzy matching, and recommendation algorithms that surface relevant material even when the analyst does not know exactly what to search for.
Key Features
- Full-Text Search: All investigation content, evidence, notes, and OCR text are indexed and searchable with support for exact phrases, Boolean operators, and field-specific queries.
- Semantic Search: Natural language queries are interpreted for investigator intent rather than matched on keywords alone, using synonym expansion, entity recognition, and intent detection to return more relevant results.
- Faceted Filtering: Multi-dimensional filtering across status, priority, risk score, severity, date ranges, entity types, jurisdictions, industries, transaction counts, amounts, and currencies enables precise result narrowing.
- Saved Searches and Templates: Investigators save frequently used search configurations and access pre-built search templates for common workflows such as high-risk cases, unassigned priorities, and aged cases.
- ML-Powered Discovery: Machine learning algorithms identify related investigations through content-based similarity matching, shared entity overlap, matching typologies, and network analysis connections.
- Recommendation Engine: Multiple recommendation models including content-based similarity, collaborative filtering, graph-based recommendations, and behavioural analysis surface the most relevant related investigations with confidence scores.
- Fuzzy and Phonetic Search: Typo-tolerant matching, phonetic name search, stemming, and alternative spelling support ensure investigators find results despite data entry variations.
- Advanced Pagination: Standard pagination, cursor-based navigation for deep result sets, and infinite scroll with virtual rendering keep navigation efficient across large result sets.
- Search Analytics: Usage tracking and quality metrics provide insights into search patterns, engagement, and result relevance to continuously improve search effectiveness.
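The cursor-based navigation mentioned under Advanced Pagination can be illustrated with a minimal in-memory sketch. It is not the module's actual implementation: the record fields `updated_at` and `id`, and the helper names `encode_cursor`, `decode_cursor`, and `fetch_page`, are assumptions made for this example. The idea is that each page returns an opaque token encoding the last-seen sort position, so the next request resumes from that position instead of skipping a growing offset.

```python
import base64
import json
from typing import Dict, List, Optional, Tuple


def encode_cursor(sort_key: str, record_id: str) -> str:
    """Pack the last-seen (sort_key, id) pair into an opaque, URL-safe token."""
    raw = json.dumps([sort_key, record_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> Tuple[str, str]:
    """Recover the (sort_key, id) pair from a token made by encode_cursor."""
    sort_key, record_id = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return sort_key, record_id


def fetch_page(
    records: List[Dict[str, str]], limit: int, cursor: Optional[str] = None
) -> Tuple[List[Dict[str, str]], Optional[str]]:
    """Return one page plus the cursor for the next page (None when exhausted).

    Hypothetical record shape: {"id": ..., "updated_at": ...}. The id acts as a
    tie-breaker so that records sharing a timestamp are never skipped or repeated.
    """
    ordered = sorted(records, key=lambda r: (r["updated_at"], r["id"]))
    if cursor is not None:
        last_seen = decode_cursor(cursor)
        ordered = [r for r in ordered if (r["updated_at"], r["id"]) > last_seen]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = (
        encode_cursor(page[-1]["updated_at"], page[-1]["id"]) if has_more else None
    )
    return page, next_cursor
```

In a database-backed index the same comparison would typically become a keyset predicate in the query rather than a filter over a sorted list; the list version here only demonstrates the cursor contract.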
Use Cases
- Investigation Research: Analysts rapidly locate relevant investigations, entities, and evidence across millions of records using keyword, semantic, and faceted search capabilities.
- Related Case Discovery: ML-powered discovery tools automatically identify investigations with similar content, shared subjects, matching typologies, or network connections to the current case.
- Recurring Workflow Optimisation: Saved search templates eliminate repetitive query construction for daily tasks such as reviewing open cases, monitoring high-risk investigations, and tracking approaching SLA deadlines.
- Cross-Investigation Pattern Detection: Discovery algorithms reveal hidden connections between separate investigations by identifying common entities, shared media coverage, and overlapping transaction patterns.
- Evidence Location: Full-text search across evidence repositories, OCR text, and investigation notes enables rapid location of specific documents, transactions, or communications.
- Resource Prioritisation: Faceted search by risk score, priority, and case age helps supervisors identify cases requiring immediate attention and allocate resources effectively.
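One of the signals named under Related Case Discovery, shared entity overlap, can be sketched with a simple set comparison. The example below uses Jaccard similarity as a stand-in; the source does not say which overlap measure the module uses, and the function names, the threshold value, and the sample identifiers are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def entity_overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: shared entities divided by all distinct entities."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_cases(
    current: Set[str], others: Dict[str, Set[str]], threshold: float = 0.2
) -> List[Tuple[str, float]]:
    """Rank other cases by entity overlap with the current case.

    `threshold` is an illustrative cut-off, not a documented default.
    Results are sorted by descending score, then by case id for stability.
    """
    scored = [(case_id, entity_overlap(current, ents)) for case_id, ents in others.items()]
    hits = [(case_id, score) for case_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: (-item[1], item[0]))
```

A production discovery pipeline would presumably combine a score like this with the content, typology, and network signals listed above rather than rely on entity overlap alone.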
Integration
The Investigation Search and Discovery module integrates with the platform's case management, entity resolution, evidence management, and transaction monitoring systems. Indexed data spans investigations, entities, transactions, documents, and OCR text repositories. Search results include relevance scoring, highlighted matches, and faceted aggregations, while discovery recommendations feed into cross-case linking workflows and investigation prioritisation dashboards.
Open Standards
- GraphQL (June 2018 specification): all search and discovery operations are exposed as typed GraphQL queries and mutations, with Strawberry-generated schemas covering index management, hybrid search, re-ranking, discovery jobs, and saved queries.
- Okapi BM25 / ISO SQL Full-Text Search (tsvector/tsquery): full-text retrieval uses PostgreSQL's `to_tsvector`/`plainto_tsquery`/`ts_rank` functions, which implement BM25-style probabilistic term-frequency weighting across all indexed case content.
- Reciprocal Rank Fusion (Cormack et al. 2009): the hybrid search layer merges BM25 and dense vector result lists using the RRF algorithm, providing a ranked fusion score without requiring score normalisation across disparate retrieval methods.
- Dense Vector Similarity Search (IEEE-754 floating-point embeddings): semantic search encodes queries and documents as float32 embedding vectors and retrieves nearest neighbours via Cloudflare Vectorize, enabling natural language intent matching beyond keyword overlap.
- RFC 4122 UUID: every search index, indexing job, discovery job, result, and saved query is identified by a version-4 UUID, ensuring globally unique, collision-resistant identifiers across all tenants.
- RFC 3339 / ISO 8601 date-time: all timestamps (created, updated, completed, job logs) are stored and transmitted as UTC-offset ISO 8601 strings, guaranteeing unambiguous chronological ordering across time zones.
- RFC 6455 WebSocket: discovery job lifecycle events (running, completed, failed, cancelled) are broadcast over WebSocket connections via the platform realtime manager, giving analysts live status without polling.
- RFC 7519 JSON Web Token (JWT) with RS256: all GraphQL endpoints are gated behind the `IsAuthenticated` permission class, which validates RS256-signed JWTs against a JWKS endpoint before any index or discovery operation is permitted.
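The Reciprocal Rank Fusion item above can be made concrete with a short sketch. Each document's fused score is the sum of 1 / (k + rank) over every result list in which it appears, so only rank positions matter and the raw BM25 and vector scores never need to be put on a common scale. The function name and the sample case ids are illustrative; k = 60 is the constant used by Cormack et al., and whether the platform uses the same value is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse several ranked id lists into one list ordered by RRF score.

    Each input list is ordered best-first. Ranks start at 1. Ties in the
    fused score are broken by document id to keep the output deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a keyword retriever and a vector retriever.
keyword_hits = ["case-1", "case-2", "case-3"]
vector_hits = ["case-2", "case-4", "case-1"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Here `case-2` ends up first because it ranks highly in both lists, while documents found by only one retriever fall below those found by both.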
Last Reviewed: 2026-02-23 Last Updated: 2026-04-14