AI Vector Search

Overview

An analyst searching for "case files involving shell companies in offshore jurisdictions" will miss relevant documents with keyword search unless each one contains exactly those words. Semantic search finds them by meaning. The AI Vector Search platform delivers high-performance semantic search across millions of vectors and combines that semantic signal with keyword matching, metadata filtering, and machine learning reranking, so that results reflect what the user was looking for.

The result is content discovery that feels intelligent rather than mechanical, whether the task is finding investigation precedents, retrieving relevant policies, or surfacing products matching a natural language description.

Key Features

High-Performance Vector Search: Executes approximate nearest neighbour queries across millions of vectors in milliseconds using optimised index structures that balance search accuracy with query speed.
Hybrid Search: Combines dense vector similarity (semantic understanding) with sparse keyword matching (exact terms) and structured metadata filtering for maximum search relevance.
Multi-Stage Ranking Pipeline: Applies vector search, keyword boosting, metadata filtering, and ML-powered reranking in sequence to deliver highly relevant results without sacrificing performance.
Advanced Metadata Filtering: Supports multiple filter types including numeric ranges, categorical values, date ranges, text matching, array containment, and geospatial queries.
Temporal Boosting: Exponential decay functions prioritise recent content for time-sensitive searches while maintaining access to historical information.
ML Reranking: Cross-encoder transformer models score query-document relevance for the top candidates, improving result ordering for complex queries.
Personalised Ranking: User history and preferences influence result ordering, improving relevance based on individual usage patterns.
Horizontal Scaling: Distributed indexing and sharding support growth from thousands to billions of vectors with consistent performance.
Multiple Index Algorithms: Supports various approximate nearest neighbour algorithms optimised for different dataset sizes, memory constraints, and update frequency requirements.
Query Understanding: Detects query intent to automatically select the most effective search strategy, whether semantic, keyword, or hybrid.
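
As an illustration of how metadata filtering, vector similarity, and temporal boosting can combine, the following is a minimal Python sketch and not the platform's implementation. It assumes an exact cosine scan in place of an approximate nearest neighbour index, a single categorical filter, and a half-life form of exponential decay. The function names, document fields, stage ordering, and the 30-day half-life are all hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def temporal_boost(age_days, half_life_days=30.0):
    """Exponential decay: 1.0 for brand-new content, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def search(query_vec, docs, now, category=None, half_life_days=30.0, top_k=3):
    """Filter by category, score by cosine similarity, then apply decay."""
    scored = []
    for doc in docs:
        # Metadata filter (assumption: one categorical field named "category").
        if category is not None and doc["category"] != category:
            continue
        # Dense similarity; an exact scan stands in for an ANN index here.
        similarity = cosine(query_vec, doc["vector"])
        # Temporal boost: older documents keep a shrinking share of their score.
        age_days = (now - doc["published"]).total_seconds() / 86400
        boost = temporal_boost(age_days, half_life_days)
        scored.append((doc["id"], similarity * boost))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical sample data for illustration only.
now = datetime(2026, 1, 31, tzinfo=timezone.utc)
docs = [
    {"id": "old-report", "category": "case", "vector": [1.0, 0.0],
     "published": now - timedelta(days=60)},
    {"id": "new-report", "category": "case", "vector": [0.8, 0.6],
     "published": now},
    {"id": "policy", "category": "policy", "vector": [1.0, 0.0],
     "published": now},
]
ranked = search([1.0, 0.0], docs, now, category="case")
print(ranked)
```

In this sample, "old-report" matches the query vector exactly but is 60 days old, so two half-lives reduce its score to a quarter and the newer, slightly less similar "new-report" ranks first. The "policy" document is excluded by the filter before any scoring takes place.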

Use Cases

Investigation and Research: Discovers relevant case files, intelligence reports, and evidence across large document collections using conceptual queries that match meaning rather than exact terms. Law enforcement agencies and intelligence organisations use this to surface investigation precedents and cross-reference evidence across thousands of documents.
Enterprise Knowledge Search: Enables employees to find relevant documents, policies, and procedures using natural language questions, cutting time-to-answer compared to keyword-based search systems.
Content Recommendation: Suggests related documents, articles, and resources based on semantic similarity to currently viewed content, increasing engagement and knowledge discovery within investigation and analysis workflows.
E-Commerce Product Discovery: Transforms product search from rigid keyword matching to natural language understanding, enabling shoppers to find products by describing what they need.

Integration

The platform integrates with embedding generation pipelines, document processing systems, and AI applications through flexible APIs. It supports real-time index updates as new content is added, as well as batch indexing for initial corpus migration.

Open Standards

OAuth 2.0 and JWT Bearer Token: Token-based authentication protects typed, auditable read and write workflows across the platform.
Okapi BM25: The keyword retrieval leg of the hybrid pipeline scores documents with the platform record store's ts_rank function against a tsvector full-text index, using BM25-style term frequency and inverse document frequency weighting that follows the Okapi BM25 probabilistic retrieval model (Robertson & Sparck Jones, 1976/1994).
Reciprocal Rank Fusion (SIGIR 2009): The fusion of dense vector and BM25 ranked lists uses the RRF algorithm (Cormack, Clarke & Buettcher, 2009, DOI: 10.1145/1571941.1572114) with the standard smoothing constant k=60 and a configurable alpha weight.
SQL Full-Text Search (ISO/IEC 9075, SQL Standard): Document indexing and keyword retrieval run as SQL queries against the platform record store, using its full-text search functions (to_tsvector, plainto_tsquery). These functions are PostgreSQL-style extensions to SQL rather than part of ISO/IEC 9075 itself.
OAuth 2.0 (RFC 6749) / OpenID Connect: All search integration endpoints require a verified bearer token issued via the platform's OAuth 2.0 / OIDC authorisation flow; tenant isolation is enforced on every query using the organisation scope claim carried in the JWT (RFC 7519).
JSON (RFC 8259): Document metadata is stored, filtered, and returned as RFC 8259 JSON; the typed integration layer exposes a JSON scalar for arbitrary structured metadata payloads attached to indexed documents and search results.
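
The Reciprocal Rank Fusion step listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's code. The original RRF formulation sums 1 / (k + rank) over the input lists without weights. How the configurable alpha enters the formula is not specified here, so the sketch assumes alpha weights the dense list and (1 - alpha) weights the keyword list. The function name `rrf_fuse` and the sample document IDs are hypothetical.

```python
def rrf_fuse(dense_ids, keyword_ids, k=60, alpha=0.5):
    """Fuse two ranked lists of document IDs with weighted RRF.

    Each list contributes weight / (k + rank) per document, with ranks
    starting at 1. Assumption: alpha weights the dense list and
    (1 - alpha) weights the keyword list.
    """
    scores = {}
    for weight, ranking in ((alpha, dense_ids), (1.0 - alpha, keyword_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical ranked lists from the dense and keyword legs.
dense = ["a", "b", "c"]
keyword = ["c", "a", "d"]
fused = rrf_fuse(dense, keyword)
print(fused)
```

Documents that appear in both lists ("a" and "c") rise above those found by only one leg, which is the behaviour hybrid search relies on. Under this assumed weighting, setting alpha to 1.0 reduces the fused order to the dense ranking, and setting it to 0.0 reduces it to the keyword ranking.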

Last Reviewed: 2026-02-05 Last Updated: 2026-04-14