Overview
First-pass retrieval systems rank documents by estimated relevance, but the score for each document comes from query and document representations that are computed separately, not from comparing the two directly as a pair. Cross-encoders evaluate the query and document jointly, allowing every token in the query to attend to every token in the document. This produces substantially more accurate relevance scores at the cost of higher compute per candidate. Applying cross-encoder reranking as a second pass over a short candidate list from hybrid retrieval combines the recall of fast first-pass methods with the precision of joint query-document scoring.
The Cross-Encoder Reranking module uses the cross-encoder/ms-marco-MiniLM-L-6-v2 model (Apache 2.0 license) to rescore candidate documents against a query. Candidates are submitted as (query, document) pairs; the model produces a relevance score for each pair; the pairs are sorted by descending score and the top-k are returned. The model is loaded lazily from R2 via the model registry and is not bundled at import time. If the model is unavailable, the module returns the input list in original order without raising an error. All reranking operations are scoped to organization_id for EDF/PESCO compliance.
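The score, sort, and truncate flow described above can be sketched in a few lines. This is a minimal illustration, not the module's actual code: the names `rerank`, `Scorer`, and `overlap_scorer` are hypothetical, and the toy word-overlap scorer only stands in for the cross-encoder so the example runs without model weights. Returning the full untruncated list on fallback is an assumption based on "returns the input list in original order".

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical scorer type: takes (query, document) pairs, returns one score per pair.
Scorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def overlap_scorer(pairs: Sequence[Tuple[str, str]]) -> Sequence[float]:
    """Toy stand-in for the cross-encoder: counts words shared by query and document."""
    return [
        float(len(set(q.lower().split()) & set(d.lower().split())))
        for q, d in pairs
    ]


def rerank(query: str, documents: Sequence[str], top_k: int,
           scorer: Optional[Scorer]) -> dict:
    """Rescore documents against the query and return the top_k best."""
    fallback = {"results": list(documents), "modelAvailable": False}
    if scorer is None:
        # Model unavailable: keep the original order and say so.
        return fallback
    try:
        pairs = [(query, doc) for doc in documents]
        scores = scorer(pairs)
    except Exception:
        # Any runtime failure also falls back instead of raising.
        return fallback
    # sorted() is stable, so tied documents keep their input order.
    ranked = sorted(zip(documents, scores), key=lambda item: item[1], reverse=True)
    return {"results": [doc for doc, _ in ranked[:top_k]], "modelAvailable": True}
```

Passing `scorer=None` exercises the fallback path, which lets a caller check the handling of the `modelAvailable` flag without any model present.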
Last Reviewed: 2026-04-14 Last Updated: 2026-04-14
Key Features
- Joint Query-Document Scoring: The cross-encoder evaluates each (query, document) pair together rather than scoring query and document independently. This allows the model to capture query-document interactions that bi-encoder cosine similarity misses, particularly for long or complex queries.
- Lazy Model Loading: The cross-encoder/ms-marco-MiniLM-L-6-v2 model is loaded from R2 via the model registry on first use. It is not loaded at import time or application startup, so deployments without the model incur no startup overhead and no import error.
- Batch Prediction: All candidate pairs in a reranking request are scored in a single batched model inference call with a batch size of 32. This amortises per-call overhead across the candidate list.
- Graceful Fallback: If the model is unavailable due to missing dependencies, a failed R2 fetch, or any runtime exception, the module returns the candidates in their original input order. The caller receives a modelAvailable flag in the response payload to indicate whether cross-encoder scoring was applied.
- Organization Scoping: The org_id parameter is required on every rerank call. It is included in audit log entries and in the document fetch queries used to retrieve candidate content from the search document store, ensuring no cross-tenant document content is read.
- GraphQL Query Surface: The rerankedSearch query accepts query, documentIds, and topK arguments and returns a RerankedSearchPayload containing results with id, score, documentType, title, and rerankScore fields, plus a modelAvailable boolean.
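Lazy loading and batched scoring, as listed above, might look like the following sketch. Everything here is illustrative: `LazyReranker` and its `loader` argument are hypothetical stand-ins for the module and the model registry fetch, whose real interfaces are not shown in this document. The chunking loop is written out only to make the batch size of 32 visible. If the model were served through the sentence-transformers library, chunking would typically be delegated to the `batch_size` argument of `CrossEncoder.predict` in one call.

```python
from typing import Callable, List, Optional, Sequence, Tuple

BATCH_SIZE = 32  # batch size stated in the feature list

Pair = Tuple[str, str]
Model = Callable[[Sequence[Pair]], Sequence[float]]


class LazyReranker:
    """Defers model loading until the first scoring call."""

    def __init__(self, loader: Callable[[], Model]):
        # `loader` is a hypothetical stand-in for the model registry fetch.
        self._loader = loader
        self._model: Optional[Model] = None

    def _get_model(self) -> Optional[Model]:
        if self._model is None:
            try:
                self._model = self._loader()  # runs on first use only
            except Exception:
                return None  # caller treats this as "model unavailable"
        return self._model

    def score(self, pairs: Sequence[Pair]) -> Optional[List[float]]:
        """Return one score per pair, or None if the model cannot be loaded."""
        model = self._get_model()
        if model is None:
            return None
        scores: List[float] = []
        for start in range(0, len(pairs), BATCH_SIZE):
            scores.extend(model(pairs[start:start + BATCH_SIZE]))
        return scores
```

Constructing a `LazyReranker` does no work, so importing or instantiating it in a deployment without the model costs nothing until `score` is first called.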
Use Cases
- Evidence Precision: After hybrid retrieval returns 40 candidate evidence items, cross-encoder reranking promotes the items with the strongest semantic alignment to the investigator's query to the top of the list, reducing review time.
- Intelligence Report Ranking: Surface the most relevant intelligence reports from a candidate set where term overlap alone does not reliably indicate relevance.
- Legal Document Review: Rerank candidate documents in a disclosure set to prioritise those most relevant to a specific legal question before presenting them to the reviewing attorney.
- Incident Triage: Given a set of candidate alerts or reports surfaced by a first-pass retrieval, rerank by relevance to the specific incident description to guide analyst attention.
Integration
- Hybrid Search Retrieval: The primary upstream source of candidates. Hybrid search supplies a fused ranked list; cross-encoder reranking provides the precision second pass.
- Model Registry: The model registry abstraction handles R2 download, local caching, and model lifecycle. The reranking component calls the model registry retrieval interface rather than loading files directly.
- Search Index Domain: Document content is fetched from the search document store for pair construction, scoped to organization_id.
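The organization-scoped fetch that feeds pair construction could be sketched as follows. The `StoredDocument` type, the in-memory `store` dictionary, and the `build_pairs` function are all hypothetical; the real search document store and its query interface are not described here. The point is only that content from another organization never reaches the model.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class StoredDocument:
    id: str
    organization_id: str
    content: str


def build_pairs(
    query: str,
    document_ids: Sequence[str],
    organization_id: str,
    store: Dict[str, StoredDocument],  # hypothetical in-memory document store
) -> List[Tuple[str, Tuple[str, str]]]:
    """Return (document_id, (query, content)) for documents the caller's org owns."""
    pairs = []
    for doc_id in document_ids:
        doc = store.get(doc_id)
        # Skip unknown ids and documents owned by another organization.
        if doc is None or doc.organization_id != organization_id:
            continue
        pairs.append((doc_id, (query, doc.content)))
    return pairs
```

In a database-backed store, the same constraint would more naturally live in the fetch query itself, so that foreign rows are never read rather than read and discarded.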
Open Standards
- GraphQL Specification (October 2021): The reranking query surface is exposed via a GraphQL API, following the GraphQL specification published by the GraphQL Foundation.
- W3C PROV-O (PROV Ontology, 2013): Audit log entries recording reranking operations follow the W3C provenance data model to support traceable, organisation-scoped activity records.
- RFC 8259 (JSON Data Interchange Format): All API request and response payloads, including reranked result sets and the modelAvailable flag, are serialised as JSON per this IETF standard.
- TREC Common Evaluation Framework: The MS MARCO passage ranking benchmark, used to train and evaluate the cross-encoder model, follows TREC-style evaluation methodology for information retrieval systems and is also used in the TREC Deep Learning Track.
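To make the GraphQL and JSON points concrete, the sketch below builds a JSON request body for the rerankedSearch query. The field and argument names come from the feature list. The operation name, the variable types (`String!`, `[ID!]!`, `Int`), and the `build_request_body` helper are assumptions, since the schema itself is not reproduced here. No organization identifier is passed because only three arguments are listed for this query.

```python
import json
from typing import Sequence

# Argument types below are assumed; consult the real schema before use.
RERANKED_SEARCH_QUERY = """
query RerankedSearch($query: String!, $documentIds: [ID!]!, $topK: Int) {
  rerankedSearch(query: $query, documentIds: $documentIds, topK: $topK) {
    results { id score documentType title rerankScore }
    modelAvailable
  }
}
"""


def build_request_body(query: str, document_ids: Sequence[str], top_k: int) -> str:
    """Serialise a GraphQL request as a JSON string (RFC 8259)."""
    return json.dumps({
        "query": RERANKED_SEARCH_QUERY,
        "variables": {
            "query": query,
            "documentIds": list(document_ids),
            "topK": top_k,
        },
    })
```

A client would check `modelAvailable` in the response before treating `rerankScore` as meaningful, since the fallback path returns candidates in their original order.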