LLM Upload

Overview

Field teams routinely seize hard drives, printed ledgers, and foreign-language documents that need rapid analysis. The LLM Upload module processes those materials through AI language model pipelines, extracting structured intelligence from unstructured files and feeding the results directly into investigation workflows.

Key Features

Document upload with support for multiple file formats
AI-powered document analysis and content extraction
Automated metadata generation and classification
Integration with investigation and case management workflows
Bulk upload capabilities for large document sets

Use Cases

Relevant sectors include law enforcement, financial crime investigation, and intelligence.

Uploading documents for AI-assisted analysis and summarisation
Extracting structured data from unstructured document collections
Enriching case files with AI-generated insights from uploaded materials
Processing large volumes of documents for intelligence extraction

Integration

The module integrates with the AI intelligence, case management, and evidence management modules, so uploaded documents pass into those workflows without manual re-entry.

Open Standards

OAuth 2.0 and JWT Bearer Token: Token-based authentication protects typed, auditable read and write workflows across the platform.
IANA Media Types (RFC 2046 / MIME): Uploaded files are classified and validated by MIME type using both client-declared values and magic-byte sniffing; MIME types gate the text extraction pipeline and determine OCR versus structured-parse paths.
JSON Web Token (RFC 7519) with RS256 (RFC 7518): Every LLM upload and chat request is gated by an authenticated permission check that verifies an Argus-issued RS256 JWT against a JWKS endpoint before any document is processed.
ISO 32000 (PDF): PDF documents are accepted for upload and processed through text extraction via Apache Tika or pypdf, with polyglot-detection checks applied against the PDF magic-byte header.
ECMA-376 / ISO/IEC 29500 (OOXML): DOCX and XLSX files are extracted natively by parsing the underlying Open Packaging Convention ZIP archive and reading word/document.xml and xl/sharedStrings.xml respectively.
FIPS 180-4 (SHA-256): A SHA-256 hex digest is computed over each uploaded file's raw bytes to provide a tamper-evident chain-of-custody fingerprint stored with every document analysis record.
ISO 639-1: Language detection returns a two-letter ISO 639-1 language code (e.g. en, fr, de) that is recorded alongside extracted entities and classification in the document intelligence result.
RFC 5646 / Unicode character encoding (UTF-8, UTF-16, Latin-1): Text from uploaded documents is decoded by trying UTF-8, UTF-16, and Latin-1 in sequence, in accordance with IETF language-tag and Unicode standards, to support multilingual seized materials.
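
The file-handling steps described above (SHA-256 fingerprinting, magic-byte sniffing, native DOCX extraction, and sequential decoding) can be illustrated with a minimal Python sketch. This is not the module's actual implementation: the function names sha256_hex, sniff_media_type, extract_docx_text, and decode_text are hypothetical, only the standard library is used, and the real pipeline may rely on Apache Tika or pypdf instead.

```python
import hashlib
import io
import xml.etree.ElementTree as ET
import zipfile

PDF_MAGIC = b"%PDF-"          # header defined by ISO 32000
ZIP_MAGIC = b"PK\x03\x04"     # local file header that opens an OPC/ZIP package
DOCX_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
XLSX_TYPE = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
# WordprocessingML namespace used by elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def sha256_hex(data: bytes) -> str:
    """Hex digest over the raw bytes, usable as a chain-of-custody fingerprint."""
    return hashlib.sha256(data).hexdigest()


def sniff_media_type(data: bytes, declared: str) -> str:
    """Prefer the type implied by magic bytes; fall back to the declared type."""
    if data.startswith(PDF_MAGIC):
        return "application/pdf"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                names = set(zf.namelist())
        except zipfile.BadZipFile:
            return "application/zip"
        # Part names taken from the description above; a stricter check
        # would follow the package relationships instead.
        if "word/document.xml" in names:
            return DOCX_TYPE
        if "xl/sharedStrings.xml" in names:
            return XLSX_TYPE
        return "application/zip"
    return declared


def extract_docx_text(data: bytes) -> str:
    """Join the text runs of each paragraph in word/document.xml."""
    # Assumption: input size and XML safety are checked elsewhere; untrusted
    # files would need hardened parsing and decompression limits.
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def decode_text(data: bytes) -> tuple[str, str]:
    """Try UTF-8, then UTF-16, then Latin-1; return (text, encoding used)."""
    # Caveat: UTF-16 accepts many even-length byte strings, so this order can
    # mis-decode some Latin-1 input. It simply mirrors the order described.
    for encoding in ("utf-8", "utf-16"):
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a code point, so this fallback never fails.
    return data.decode("latin-1"), "latin-1"
```

A caller could compare the result of sniff_media_type with the client-declared type and reject the upload on a mismatch, which is one simple way to approximate the polyglot check mentioned for PDFs.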

Last Reviewed: 2026-02-24
Last Updated: 2026-04-14