Surveillance Natural-Language Frame Search

Overview

Type a plain description of what you are looking for, such as "person in a red jacket near gate 3", and within milliseconds retrieve the exact recorded camera frames that match it, ranked and timestamped.

Surveillance Natural-Language Frame Search turns an archive of recorded camera footage into a searchable corpus that responds to meaning rather than file names or timestamps alone. Instead of scrubbing through hours of recordings, an operator describes a scene in everyday language and immediately receives the most relevant frames, each one tied to a specific camera, coverage area, and source recording. This collapses post-incident investigation from hours of manual review into a single query, and it surfaces contextual matches that keyword tags or time-range filters alone would miss.

The capability pairs a multilingual dense-vector embedding model with a tenant-scoped vector index, then enriches every match against stored frame metadata so results arrive complete with camera serial numbers, coverage-area labels, detection summaries, thumbnails, and the originating recording reference. Every match is filtered by the operator's access rights before it ever leaves the service.

Key Features

Natural-Language Querying: Operators search recorded frames using free-text descriptions of a scene, person, object, or behaviour, with no need to remember tags, file names, or exact times.
Multilingual Dense-Vector Matching: Queries are embedded with the BAAI BGE-M3 multilingual model, so a description written in one of many supported languages matches relevant frames regardless of how the footage was originally labelled.
Cosine-Similarity Ranking: Every result is scored by cosine similarity between the query vector and the frame vector, and results are returned in ranked order with a configurable minimum-score threshold to suppress weak matches.
Rich Result Enrichment: Each ranked frame is returned with its timestamp, frame number, thumbnail URL, detection summary, camera serial number, coverage-area label, and the originating recording reference, giving an operator full context at a glance.
Camera-Scoped Filtering: Searches can be narrowed to a specific set of cameras, letting investigators focus on the entrances, zones, or devices relevant to an incident.
Index Statistics: A companion read query reports aggregate index health, including total embeddings, number of cameras indexed, and the earliest and latest indexed frame timestamps, so teams know exactly what coverage their archive holds.
Separated Ingestion Pipeline: Frame metadata is written by an independent embedding worker, keeping high-volume ingestion isolated from low-latency search so neither degrades the other.
Per-Organisation Feature Gating: The capability is enabled per organisation behind both a surveillance flag and a dedicated frame-search flag, so it can be rolled out selectively and switched off cleanly.
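
The ranking behaviour described above (cosine scoring, a minimum-score threshold, a result count, and optional camera scoping) can be sketched in a few lines. This is a minimal in-memory illustration, not the service's implementation: the production search runs against a managed vector index, and the function names, dictionary keys, and default values here are assumptions made for the example.

```python
import math
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_frames(
    query_vec: Sequence[float],
    frames: Iterable[Dict],
    top_k: int = 10,                               # hypothetical default
    min_score: float = 0.0,                        # hypothetical default
    camera_ids: Optional[Iterable[str]] = None,
) -> List[Tuple[float, str]]:
    """Return (score, frame_id) pairs, best first, above the threshold."""
    allowed = set(camera_ids) if camera_ids is not None else None
    scored = []
    for frame in frames:
        # Camera-scoped filtering: skip frames from cameras outside the scope.
        if allowed is not None and frame["camera_id"] not in allowed:
            continue
        score = cosine_similarity(query_vec, frame["vector"])
        # Minimum-score threshold suppresses weak matches.
        if score >= min_score:
            scored.append((score, frame["frame_id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Raising `min_score` trades recall for precision: fewer frames come back, but those that do are closer in meaning to the query.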

Use Cases

Transport and Transit Authorities: An operator reviewing a theft can query "person leaving a bag near the ticket barriers at 10pm" and receive ranked, timestamped frames from the relevant platform cameras within milliseconds, rather than manually scanning each recording.
Critical Infrastructure and Facilities Security: Site security teams locate a described individual, vehicle, or item of interest across many cameras at once, then jump straight to the source recording for the surrounding context.
Investigations and Post-Incident Review: Investigators reconstruct a timeline by issuing successive descriptive queries, assembling a sequence of contextual frames across cameras and coverage areas without trawling raw footage.
Intelligence and Analysis Teams: Analysts retrieve corroborating visual evidence by describing scenes in natural language, surfacing frames that traditional keyword or time-range filters would miss.
Retail and Venue Loss Prevention: Loss-prevention staff describe a suspicious behaviour or appearance and immediately review the matching frames across entrances and aisles.

Integration

The capability is exposed through the platform's typed integration layer. A single search query accepts a free-text description, an optional list of camera identifiers, a result count, and a minimum-similarity threshold, and returns a ranked list of enriched frame results together with the elapsed query time. A separate read query returns aggregate index statistics for the organisation. Ingestion is handled by a companion embedding worker that persists frame metadata, so customers building their own capture pipelines have a clean seam to write into.
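
To make the shape of the search query concrete, the sketch below models its inputs and outputs as plain data classes. The field list follows the description above; every class name, field name, and default value is hypothetical, since the actual typed schema is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FrameSearchRequest:
    """Inputs to the search query (all names are illustrative assumptions)."""
    query: str                              # free-text scene description
    camera_ids: Optional[List[str]] = None  # optional camera scope
    limit: int = 20                         # result count (assumed default)
    min_score: float = 0.5                  # similarity floor (assumed default)

    def validate(self) -> None:
        if not self.query.strip():
            raise ValueError("query must not be empty")
        if self.limit < 1:
            raise ValueError("limit must be at least 1")
        if not -1.0 <= self.min_score <= 1.0:
            raise ValueError("min_score must lie within [-1, 1]")


@dataclass(frozen=True)
class FrameResult:
    """One enriched match, mirroring the fields listed under Key Features."""
    score: float
    timestamp: str
    frame_number: int
    thumbnail_url: str
    detection_summary: str
    camera_serial: str
    coverage_area: str
    recording_ref: str


@dataclass(frozen=True)
class FrameSearchResponse:
    """Ranked results plus the elapsed query time."""
    results: List[FrameResult]
    elapsed_ms: float
```

A caller would build a request such as `FrameSearchRequest(query="person in a red jacket near gate 3", camera_ids=["cam-1"], limit=5)` and call `validate()` before sending it.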

Authentication and tenant isolation are built in. Every request is authenticated, scoped to the caller's organisation, and filtered by the operator's access rights before results are assembled, so an organisation only ever sees its own footage. Thumbnail URLs and source recording references are returned inline, making it straightforward to wire results into an existing case-management surface, a review console, or a downstream evidence workflow. Because search runs over a managed vector index and enriches from a relational metadata store, customers plug in their recorded frames once and gain descriptive search across the entire archive.
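
The order of operations in the paragraph above (scope to the organisation, apply access rights, then enrich) might look roughly like the following. This is a sketch under stated assumptions: matches are taken to carry `org_id` and `camera_id` alongside their score, access rights are reduced to a set of permitted camera identifiers, and the relational metadata store is stood in for by a dictionary. None of these names come from the platform itself.

```python
from typing import Dict, List, Set


def filter_and_enrich(
    matches: List[Dict],
    org_id: str,
    allowed_camera_ids: Set[str],
    metadata_by_frame: Dict[str, Dict],
) -> List[Dict]:
    """Drop matches the caller may not see, then attach stored context.

    `matches` is assumed to be already ranked; the input order is preserved.
    """
    enriched = []
    for match in matches:
        # Tenant isolation: never return another organisation's frames.
        if match["org_id"] != org_id:
            continue
        # Access rights: only cameras the operator is entitled to view.
        if match["camera_id"] not in allowed_camera_ids:
            continue
        # Enrichment happens only after both checks have passed.
        meta = metadata_by_frame.get(match["frame_id"])
        if meta is None or meta["org_id"] != org_id:
            continue
        enriched.append({
            "frame_id": match["frame_id"],
            "score": match["score"],
            "camera_serial": meta["camera_serial"],
            "coverage_area": meta["coverage_area"],
            "recording_ref": meta["recording_ref"],
        })
    return enriched
```

Checking the organisation again on the metadata row is deliberate redundancy: a mistake in one layer does not by itself expose another tenant's data.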

Open Standards

OAuth 2.0 and JWT Bearer Token: Token-based authentication protects typed, auditable read and write workflows across the platform.
Cloudflare Vectorize: Vector storage and nearest-neighbour retrieval use Cloudflare's published Vectorize vector database, accessed over its standard REST interface with no proprietary orchestration layer.
Cloudflare Workers AI (BAAI BGE-M3): Query text is embedded with the open-weights BAAI BGE-M3 multilingual dense-vector model served on Cloudflare Workers AI, so embeddings are produced with a published, openly documented model.
Cosine Similarity: Match ranking uses the standard cosine-similarity metric, the cosine of the angle between the query and frame vectors, which ranges from -1 (opposite direction) through 0 (orthogonal, unrelated) to 1 (identical direction).
HTTP and REST (RFC 9110): Embedding and vector-index operations are performed over standard HTTP REST calls with bearer-token authorisation, conforming to the HTTP semantics defined in RFC 9110.
JSON (RFC 8259): Request and response payloads to the embedding and vector services are encoded as RFC 8259 JSON, and frame metadata is carried as JSON throughout.
NDJSON (Newline-Delimited JSON): Bulk vector ingestion into the index is serialised as newline-delimited JSON, one record per line, the format the vector index accepts for batched writes.
OAuth 2.0 (RFC 6749) and JWT (RFC 7519): Every search request is authenticated with a bearer token issued through the platform's OAuth 2.0 flow, and tenant isolation is enforced from the organisation claim carried in the RFC 7519 JSON Web Token.
SQL (ISO/IEC 9075): Frame metadata, camera serial numbers, and coverage-area labels are stored and joined in a relational store using ISO standard SQL, the source of the contextual fields returned with each match.
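
As an illustration of the NDJSON format mentioned in the list above, the sketch below serialises a batch of vector records with one JSON object per line. The record fields shown (`id`, `values`, `metadata`) are an assumption for the example and should be checked against the vector index's own documentation; the helper name is hypothetical.

```python
import json
from typing import Dict, Iterable


def to_ndjson(records: Iterable[Dict]) -> str:
    """Serialise records as newline-delimited JSON, one record per line."""
    # Compact separators keep each record on a single physical line.
    return "".join(
        json.dumps(record, separators=(",", ":")) + "\n" for record in records
    )


batch = [
    # Field names below are assumed for illustration, not taken from the source.
    {"id": "frame-001", "values": [0.12, -0.03, 0.44], "metadata": {"camera_id": "cam-a"}},
    {"id": "frame-002", "values": [0.08, 0.31, -0.27], "metadata": {"camera_id": "cam-b"}},
]
payload = to_ndjson(batch)
```

Because every line is a complete JSON document, a receiver can parse and store records one at a time without holding the whole batch in memory.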

Security & Compliance

Tenant Isolation: Every search and statistics request is scoped to the caller's organisation, and the vector index query, the access-rights filter, and the relational metadata lookup each enforce that scope independently, so footage from one organisation is structurally inaccessible to another.
Role-Based Access Control: Results are filtered against the operator's access rights before enrichment, ensuring an operator only retrieves frames they are entitled to view.
Authenticated Access Only: Both the search query and the index-statistics query require an authenticated, authorised caller; unauthenticated requests are rejected.
Controlled Rollout: The capability is gated per organisation behind both a surveillance flag and a dedicated frame-search flag, so it is only active where it has been explicitly enabled.
Auditable Querying: Each completed search is logged with its result count and elapsed time, supporting operational monitoring and after-the-fact review without exposing footage content in logs.
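
Two of the controls above lend themselves to a short sketch: the double feature flag and the content-free audit entry. The flag keys, event name, and log fields below are hypothetical and chosen only for illustration; the point is that both flags must be set, and that the log line carries the result count and elapsed time but no query text or frame content.

```python
import json
from typing import Dict


def frame_search_enabled(org_flags: Dict[str, bool]) -> bool:
    """Active only when BOTH flags are explicitly enabled (keys are assumed)."""
    return bool(org_flags.get("surveillance")) and bool(org_flags.get("frame_search"))


def audit_record(result_count: int, elapsed_ms: float) -> str:
    """Build a log line with operational figures only, no footage content."""
    return json.dumps(
        {
            "event": "frame_search.completed",  # hypothetical event name
            "result_count": result_count,
            "elapsed_ms": elapsed_ms,
        },
        sort_keys=True,
    )
```

Treating a missing flag as disabled means a newly created organisation stays switched off until someone enables the capability on purpose.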

Last Reviewed: 2026-05-26
Last Updated: 2026-05-26