RAG (Retrieval-Augmented Generation)

Overview

An investigator is working through hundreds of witness statements and forensic reports for a complex fraud case. Instead of reading every document, she types a plain-English question into the case search interface and receives a cited answer drawn directly from the evidence. The RAG domain makes this possible. It ingests documents, breaks them into semantically meaningful chunks, and combines vector similarity search with keyword matching to find the most relevant passages. An LLM then synthesises those passages into a grounded answer, complete with citations linking back to the exact source pages. Hallucination detection checks that each claim in the response is traceable to the retrieved evidence.
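The chunking step described above can be illustrated with a minimal sketch. It assumes that paragraph boundaries approximate semantic boundaries and that whitespace splitting stands in for a real tokenizer; the names `Chunk`, `count_tokens`, `chunk_page`, and the `max_tokens` budget are hypothetical and not taken from the platform.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    token_count: int
    page: int  # source page, preserved so citations can point back to it


def count_tokens(text: str) -> int:
    # Assumption: whitespace splitting stands in for the real tokenizer.
    return len(text.split())


def chunk_page(page_text: str, page: int, max_tokens: int = 50) -> list[Chunk]:
    """Pack whole paragraphs into chunks that stay within max_tokens.

    A single paragraph longer than max_tokens is kept as one oversized
    chunk rather than being split mid-paragraph.
    """
    chunks: list[Chunk] = []
    current: list[str] = []
    current_tokens = 0
    for paragraph in page_text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        n = count_tokens(paragraph)
        # Close the current chunk if adding this paragraph would overflow it.
        if current and current_tokens + n > max_tokens:
            chunks.append(Chunk("\n\n".join(current), current_tokens, page))
            current, current_tokens = [], 0
        current.append(paragraph)
        current_tokens += n
    if current:
        chunks.append(Chunk("\n\n".join(current), current_tokens, page))
    return chunks
```

Keeping the page number on every chunk is what would let a later citation step link an answer back to its source page.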

Key Features

Document ingestion with semantic chunking, token counting, and metadata preservation
Hybrid search combining vector similarity and keyword matching with result fusion
LLM-powered question answering with context building and optimised prompts
Citation extraction linking answers to specific source documents, pages, and excerpts
Hallucination detection with grounding verification for answer factuality
User feedback collection with thumbs up/down ratings for quality improvement
Case-scoped search to focus queries on specific investigation evidence
Re-ranking with LLM-based relevance scoring for improved result accuracy
Response caching for repeated queries with configurable cache settings
Support for multiple document types including witness statements, reports, transcripts, and forensic results
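The source does not say which fusion method merges the vector and keyword result lists. As one common option, the sketch below uses reciprocal rank fusion (RRF), which needs only the rank positions from each list. The function name, the chunk identifiers, and the constant `k = 60` are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of chunk IDs into one scored list.

    Each list contributes 1 / (k + rank) for every chunk it contains,
    so chunks ranked highly by both retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by chunk ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the two retrieval legs.
vector_hits = ["c1", "c2", "c3"]
keyword_hits = ["c3", "c1", "c4"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Because RRF ignores raw scores, it avoids having to normalise cosine similarities against keyword ranking scores, which live on different scales.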

Use Cases

Retrieval-augmented question answering is valuable in any field where analysts must work through large volumes of documents quickly. Relevant industries include law enforcement, legal services, and financial intelligence.

Querying case evidence in natural language to find relevant information with cited sources
Ingesting investigation documents for semantic search and AI-powered analysis
Verifying answer grounding to ensure AI responses are factually supported by evidence
Collecting analyst feedback to improve search relevance and answer quality over time
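To make the grounding-verification use case concrete, the following is a deliberately naive sketch: it flags answer sentences whose terms overlap poorly with every retrieved passage. It is not the platform's detection method, which the source does not describe; the function names, the lexical-overlap heuristic, and the `threshold` value are all assumptions.

```python
import re


def _terms(text: str) -> set[str]:
    # Lowercased alphanumeric terms; a stand-in for real claim extraction.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ungrounded_sentences(
    answer: str, passages: list[str], threshold: float = 0.6
) -> list[str]:
    """Return answer sentences not sufficiently covered by any passage."""
    passage_terms = [_terms(p) for p in passages]
    flagged: list[str] = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        terms = _terms(sentence)
        if not terms:
            continue
        # Best fraction of this sentence's terms found in a single passage.
        best = max(
            (len(terms & p) / len(terms) for p in passage_terms), default=0.0
        )
        if best < threshold:
            flagged.append(sentence)
    return flagged
```

A production check would more plausibly use an entailment model or an LLM judge, since lexical overlap cannot tell a supported claim from a contradicted one that reuses the same words.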

Integration

The RAG domain connects with language model operations, evidence management, case management, analytical tools, and search infrastructure.

Open Standards

OAuth 2.0 and JWT Bearer Token: Token-based authentication protects typed, auditable read and write workflows across the platform.
JSON (RFC 8259) / NDJSON: document chunk metadata, query filters, and integration responses are encoded as JSON; vector upsert payloads to the Cloudflare Vectorize REST interface use Newline-Delimited JSON (NDJSON) with the application/x-ndjson content type.
OAuth 2.0 Bearer Token (RFC 6750): all outbound calls to the Cloudflare Vectorize and Workers AI APIs present credentials as Bearer tokens in the Authorization header, conforming to the OAuth 2.0 bearer token usage specification.
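ISO/IEC 9075 SQL, Full-Text Search: the keyword retrieval leg queries the platform record store through SQL using the to_tsvector / to_tsquery / ts_rank functions. These are PostgreSQL full-text search functions rather than part of the ISO SQL standard itself. ts_rank scores matches by lexeme frequency, so it fills the role a BM25-style scorer would play in the hybrid search, although it is not BM25 itself.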
UUID (RFC 4122): every document chunk, Vectorize index record, and feedback entry is assigned a version-4 universally unique identifier, which makes key collisions across tenants negligibly unlikely.
ISO 8601 / RFC 3339 datetime: ingestion timestamps, cache expiry values, and answer feedback creation times are stored and serialised using ISO 8601 extended format (Python .isoformat()), ensuring unambiguous datetime interchange.
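Several of the standards listed above meet in a vector upsert payload. The sketch below builds one under stated assumptions: the record fields `id`, `values`, and `metadata`, the metadata keys, and all function names are illustrative and should be checked against the Vectorize documentation. No network call is made, and the token is a placeholder.

```python
import json
import uuid
from datetime import datetime, timezone


def build_record(values: list[float], case_id: str, page: int) -> dict:
    """Assemble one vector record (field names are assumptions)."""
    return {
        "id": str(uuid.uuid4()),  # version-4 UUID
        "values": values,
        "metadata": {
            "case_id": case_id,
            "page": page,
            # A timezone-aware datetime makes isoformat() emit the UTC
            # offset; a naive datetime would omit it.
            "ingested_at": datetime.now(timezone.utc).isoformat(),
        },
    }


def to_ndjson(records: list[dict]) -> str:
    # NDJSON: one compact JSON document per line, newline-terminated.
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records) + "\n"


def request_headers(api_token: str) -> dict[str, str]:
    # RFC 6750 bearer credentials plus the NDJSON content type.
    return {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/x-ndjson",
    }
```

Because `json.dumps` escapes newlines inside strings, each record is guaranteed to occupy exactly one line of the payload.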

Last Reviewed: 2026-02-05 Last Updated: 2026-04-14