Overview
Body camera footage, intercepted audio recordings, and photographs seized from suspects all carry intelligence that text-based tools cannot reach. The Multimodal Analysis module processes images, audio, and video directly through AI models, returning OCR output, transcripts, object detections, and forensic findings ready to attach to an investigation case.
Key Features
- Image Analysis: AI-powered object detection, scene understanding, OCR text extraction, and visual content classification
- Audio Transcription: Automated speech-to-text transcription with speaker identification and language detection
- Video Analysis: Frame-by-frame video analysis combining visual and audio processing for comprehensive content understanding
- Forensic Analysis: Specialised analysis for investigative use, such as evidence examination and inspection of embedded EXIF metadata
- Native Multimodal Processing: Direct processing of images, audio, and video without separate preprocessing steps
- High-Accuracy Analysis: Results are returned with confidence scores, and AI model usage is tracked
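To make the Video Analysis feature more concrete, here is a minimal sketch of one way frames could be chosen for analysis. It assumes a fixed time interval between sampled frames; the function name `sample_frame_indices` and the `every_seconds` parameter are hypothetical and are not taken from the module, whose actual sampling strategy is not described here.

```python
from typing import List


def sample_frame_indices(frame_count: int, fps: float,
                         every_seconds: float = 1.0) -> List[int]:
    """Pick the frame indices to analyse, one per `every_seconds` of video.

    Hypothetical helper for illustration only. `frame_count` and `fps`
    would normally come from the video decoder.
    """
    if frame_count <= 0 or fps <= 0 or every_seconds <= 0:
        return []
    # Convert the time interval into a whole number of frames (at least 1).
    step = max(1, round(fps * every_seconds))
    return list(range(0, frame_count, step))


if __name__ == "__main__":
    # A 4-second clip at 25 fps, sampled once per second.
    print(sample_frame_indices(100, 25.0, 1.0))
```

Sampling at an interval rather than analysing every frame trades temporal resolution for lower processing cost; a shorter interval would suit footage where events are brief.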
Use Cases
Relevant sectors include law enforcement, defence intelligence, and financial crime investigation. Typical tasks include:
- Extracting text from images and documents during evidence processing
- Transcribing audio recordings for investigation documentation
- Analysing video footage for object identification and scene understanding
- Processing multimedia evidence across investigation workflows
Integration
- Connects with media storage for source file access
- Integrates with document analysis for text-based content
- Works with content summarisation for AI-generated summaries of analysed media
Open Standards
- IANA Media Types (RFC 2046): The service identifies all ingested media by MIME type (image/jpeg, image/png, image/webp, audio/mpeg, etc.) and passes the detected type to the AI provider, following the IANA media-type registry.
- Base64 (RFC 4648): All image, audio, and video payloads are transferred as Base64-encoded strings across the GraphQL API boundary, with the service decoding them before processing.
- GraphQL (June 2018 specification): The entire multimodal API surface (queries for analysis results, and mutations for image, audio, and video analysis) is exposed as a typed GraphQL schema using the Strawberry framework.
- JPEG (ISO/IEC 10918) / PNG (ISO/IEC 15948) / WebP / BMP: These raster image formats are detected by magic-byte inspection and submitted to computer-vision models; EXIF metadata embedded in JPEG/TIFF files is explicitly targeted during forensic analysis tasks.
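- Video containers (MP4 / ISO/IEC 14496-14, MKV, WebM, AVI, MOV): Video container formats are detected and decoded frame-by-frame via OpenCV for frame sampling, OCR, and AI analysis. Of these, only MP4 is standardised under ISO/IEC 14496; MKV and WebM are Matroska-based, AVI is a RIFF container, and MOV is Apple's QuickTime format.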
- WAVE / RIFF PCM audio: Uncompressed PCM audio (WAV) is detected by RIFF magic bytes, and the audio classification service processes raw 16-bit mono PCM buffers directly. MP3 (MPEG-1 Audio Layer III), FLAC, and Ogg Vorbis files are also identified and dispatched for transcription.
- ISO 639 language codes: Audio transcription accepts a language parameter using ISO 639-1 two-letter codes (defaulting to "en"), which is forwarded to the speech-to-text API for language-specific decoding.
- RFC 4122 (UUID): All persistent analysis records, batch run identifiers, user references, and organisation references are keyed by RFC 4122 version-4 UUIDs throughout the data model and API.
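Several of the standards above meet at the API boundary: a Base64 payload is decoded, its format is detected from magic bytes, the language code is checked, and the resulting record is keyed by a version-4 UUID. The following is a minimal sketch of that flow using only the Python standard library. The names `sniff_media_type`, `decode_payload`, and `DecodedMedia` are hypothetical and do not come from the service, and the signature table covers only a few of the formats listed above.

```python
import base64
import binascii
import uuid
from dataclasses import dataclass
from typing import Optional


def sniff_media_type(data: bytes) -> Optional[str]:
    """Guess a media type string from leading magic bytes (illustrative subset)."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"BM"):
        return "image/bmp"
    # RIFF containers carry a four-byte form type at offset 8.
    if data[:4] == b"RIFF" and len(data) >= 12:
        form = data[8:12]
        if form == b"WEBP":
            return "image/webp"
        if form == b"WAVE":
            return "audio/wav"  # commonly used label for WAV audio
    if data.startswith(b"fLaC"):
        return "audio/flac"
    if data.startswith(b"OggS"):
        return "audio/ogg"
    if data.startswith(b"ID3"):
        # Only MP3 files that begin with an ID3v2 tag are caught here.
        return "audio/mpeg"
    return None


@dataclass
class DecodedMedia:
    """Hypothetical record shape; the real data model is not shown in this document."""
    record_id: uuid.UUID
    media_type: str
    data: bytes
    language: str


def decode_payload(payload_b64: str, language: str = "en") -> DecodedMedia:
    """Decode a Base64 payload and attach a detected type and a v4 UUID."""
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        data = base64.b64decode(payload_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid Base64") from exc

    media_type = sniff_media_type(data)
    if media_type is None:
        raise ValueError("unrecognised media format")

    # Shape check only: two lowercase ASCII letters, as in ISO 639-1.
    # This does not confirm that the code is actually assigned.
    if not (len(language) == 2 and language.isascii()
            and language.isalpha() and language.islower()):
        raise ValueError("language must be a two-letter ISO 639-1 code")

    return DecodedMedia(uuid.uuid4(), media_type, data, language)


if __name__ == "__main__":
    png_header = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
    media = decode_payload(base64.b64encode(png_header).decode("ascii"))
    print(media.media_type, media.record_id.version, media.language)
```

Detecting the type from content rather than trusting a client-supplied file name or header means a mislabelled upload is still routed to the right analysis path, or rejected if no signature matches.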
Last Reviewed: 2026-02-05 Last Updated: 2026-04-14