Overview
A forensic examiner working a financial fraud case receives a collection of hours-long interview recordings and hundreds of scanned document images. The original files are preserved untouched, but the examiner needs searchable transcripts to find specific admissions, reduced-resolution previews to share with legal counsel, and OCR-extracted text to run keyword searches across all documents at once. Generating each of those manually would consume days of analyst time. The Evidence Derivatives Management module generates them automatically, in the background, the moment source evidence is ingested.
Every derivative is linked back to its master evidence file with complete provenance tracking. The original is never modified. Digital forensics labs, prosecutorial offices, and disclosure teams all rely on derivatives to make evidence accessible for review, presentation, and production without touching the forensic copy. The system handles 200+ file types through GPU-accelerated processing pipelines and maintains the chain of custody for every output it produces.
Key Features
- Automated derivative generation covering 12 output types, including thumbnails, previews, video transcodes, audio extracts, transcripts, OCR text, mobile formats, and analysis outputs
- GPU-accelerated processing for video transcoding and image operations, keeping wait times short even on large multimedia collections
- Format detection for 200+ file types with appropriate processing handlers selected automatically for each
- Quality presets with fine-grained control over resolution, bitrate, and compression, selectable per evidence type or case requirement
- Complete provenance tracking linking every derivative to its master evidence file, with the relationship recorded in the chain of custody
- Batch processing for bulk derivative generation across entire case files, triggered at ingestion or on demand
- Mobile-friendly derivative formats for field investigator access to evidence without requiring full-quality downloads
- OCR text extraction and transcription enabling full-text search across all evidence types, including scanned documents and recorded interviews
- Error recovery with automatic retry and configurable fallback quality settings for files that present processing challenges
- Storage-optimised derivative files that provide full analytical utility without multiplying storage costs
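The error-recovery behaviour listed above (automatic retry, then fallback to a lower quality setting) can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the module's actual implementation: the QualityPreset fields, the ProcessingError exception, the generate_with_fallback helper, and the retry count are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProcessingError(Exception):
    """Hypothetical error raised when a derivative cannot be produced."""


@dataclass(frozen=True)
class QualityPreset:
    # Hypothetical preset fields; the module's real options are not documented here.
    name: str
    max_height: int
    bitrate_kbps: int


def generate_with_fallback(
    process: Callable[[QualityPreset], bytes],
    presets: Sequence[QualityPreset],
    retries_per_preset: int = 2,
) -> tuple[bytes, QualityPreset, int]:
    """Try each preset in order, retrying before falling back to the next one.

    Returns the derivative bytes, the preset that succeeded, and the total
    number of attempts made.
    """
    attempts = 0
    last_error = None
    for preset in presets:
        for _ in range(retries_per_preset):
            attempts += 1
            try:
                return process(preset), preset, attempts
            except ProcessingError as exc:
                last_error = exc
    raise ProcessingError(f"all presets failed after {attempts} attempts") from last_error


# Usage: a stand-in transcoder that only succeeds at 720 lines or fewer.
HIGH = QualityPreset("high", max_height=1080, bitrate_kbps=8000)
LOW = QualityPreset("low", max_height=480, bitrate_kbps=1200)


def fake_transcode(preset: QualityPreset) -> bytes:
    if preset.max_height > 720:
        raise ProcessingError("source could not be processed at this quality")
    return b"derivative-bytes"


data, used, attempts = generate_with_fallback(fake_transcode, [HIGH, LOW])
print(used.name, attempts)
```

In this sketch the high-quality preset is attempted twice before the low-quality preset is tried, so a problem file still yields a usable derivative rather than none at all. Because the function never touches the source file itself, the master evidence stays unmodified regardless of how many attempts fail.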
Use Cases
- Generating thumbnail grids and searchable previews for rapid visual evidence browsing during case review sessions
- Creating transcripts from audio and video evidence so analysts can search for specific statements or keywords without reviewing recordings in real time
- Producing mobile-optimised evidence formats so field investigators can access relevant material from any device
- Batch-generating derivatives across entire case files to prepare for large-scale review or disclosure without manual processing steps
Integration
The Evidence Derivatives Management module connects with evidence management, chain of custody, and disclosure workflows through a distributed task processing architecture.
Open Standards
- W3C Verifiable Credentials Data Model v2.0: Signed provenance credentials (EvidenceCollection and CustodyTransfer types) are issued as compact JWTs with DID-based issuers and linked to every derivative, recording the chain of custody in a tamper-evident, interoperable format.
- SHA-256 (NIST FIPS 180-4): A 256-bit cryptographic digest is computed and stored for every evidence file and email attachment, giving investigators a stable integrity fingerprint that can be independently verified throughout the derivative lifecycle.
- IANA Media Types (MIME): All 200+ supported file types are identified by their registered MIME type, which drives routing to the correct derivative-generation handler (OCR, video transcode, audio extraction, thumbnail, etc.).
- PDF / ISO 32000 and PDF/A-3 / ISO 19005-3: PDF evidence files are processed for text extraction and OCR; PDF/A-3 is the archival output format used when packaging derivatives into disclosure bundles, embedding associated attachments as file specifications.
- MPEG-4 Part 10 / H.264 and HEVC / H.265 (ISO/IEC 14496-10, ISO/IEC 23008-2): Video derivative generation targets MP4 with H.264 or HEVC codecs, the dominant interoperable formats for courtroom playback and disclosure production.
- RFC 2822 Internet Message Format: .eml email evidence files are parsed as standard RFC 2822 MIME messages, with recursive multipart extraction of body parts and attachments feeding the derivative and OCR pipelines.
- ISO 4217 Currency Codes: Three-character currency codes are enforced on all financial-transaction records extracted from evidence documents, ensuring extracted monetary data is unambiguous across jurisdictions.
- GraphQL (June 2018 Specification): All evidence derivative queries, OCR results, provenance lookups, and chain-of-custody mutations are exposed through a typed GraphQL API, enabling structured interoperability with downstream review and disclosure tools.
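Several of the standards above meet in email handling: an .eml file is parsed as a MIME message, each leaf part is hashed with SHA-256, and its MIME type selects a handler. The sketch below illustrates that flow using only Python's standard email and hashlib libraries. It is an illustration under assumptions, not the module's code: the HANDLERS table, the handler names, the extract_parts helper, and the record fields are hypothetical.

```python
import hashlib
from email import message_from_bytes, policy
from email.message import EmailMessage

# Hypothetical routing table; the module's real handler names are not documented here.
HANDLERS = {
    "application/pdf": "ocr",
    "image/png": "thumbnail",
    "text/plain": "text-index",
}


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def extract_parts(raw_eml: bytes) -> list[dict]:
    """Walk every MIME part recursively and describe each leaf part."""
    msg = message_from_bytes(raw_eml, policy=policy.default)
    master_digest = sha256_hex(raw_eml)  # fingerprint of the untouched master file
    records = []
    for part in msg.walk():  # walk() descends into nested multipart containers
        if part.is_multipart():
            continue  # containers carry no payload of their own
        payload = part.get_payload(decode=True) or b""
        mime = part.get_content_type()
        records.append({
            "master_sha256": master_digest,  # link back to the source file
            "filename": part.get_filename(),
            "mime_type": mime,
            "sha256": sha256_hex(payload),
            "handler": HANDLERS.get(mime, "unsupported"),
        })
    return records


# Usage: build a small message in memory instead of reading evidence from disk.
sample = EmailMessage()
sample["From"] = "a@example.com"
sample["To"] = "b@example.com"
sample["Subject"] = "Invoice"
sample.set_content("See attached.")
sample.add_attachment(
    b"%PDF-1.7 demo", maintype="application", subtype="pdf", filename="invoice.pdf"
)

records = extract_parts(sample.as_bytes())
for record in records:
    print(record["mime_type"], record["handler"], record["sha256"][:12])
```

Storing the master file's digest on every extracted record is one simple way to keep each part traceable to its source; the signed credentials described above would presumably carry a stronger form of the same link.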
Last Reviewed: 2026-02-23 Last Updated: 2026-04-14