LLM Upload

Category: API Domains
Last Updated: Feb 24, 2026
Tags: api-domains, ai

Overview

Field teams routinely seize hard drives, printed ledgers, and foreign-language documents that need rapid analysis. The LLM Upload module processes those materials through AI language model pipelines, extracting structured intelligence from unstructured files and feeding the results directly into investigation workflows.

Key Features

  • Document upload with support for multiple file formats
  • AI-powered document analysis and content extraction
  • Automated metadata generation and classification
  • Integration with investigation and case management workflows
  • Bulk upload capabilities for large document sets

Use Cases

Relevant sectors include law enforcement, financial crime investigation, and intelligence agencies.

  • Uploading documents for AI-assisted analysis and summarisation
  • Extracting structured data from unstructured document collections
  • Enriching case files with AI-generated insights from uploaded materials
  • Processing large volumes of documents for intelligence extraction

Integration

The LLM Upload module integrates with the AI intelligence, case management, and evidence management modules, so uploaded documents move through analysis and into case records within a single workflow.

Open Standards

  • GraphQL (June 2018 specification): All LLM chat and provider management operations are exposed as typed GraphQL queries and mutations, enabling structured, self-documenting API access for investigation workflows.
  • IANA Media Types (RFC 2046 / MIME): Uploaded files are classified and validated by MIME type using both client-declared values and magic-byte sniffing; MIME types gate the text extraction pipeline and determine OCR versus structured-parse paths.
  • JSON Web Token (RFC 7519) with RS256 (RFC 7518): Every LLM upload and chat request is gated by IsAuthenticated, which verifies an Argus-issued RS256 JWT against a JWKS endpoint before any document is processed.
  • ISO 32000 (PDF): PDF documents are accepted for upload and processed through text extraction via Apache Tika or pypdf, with polyglot-detection checks applied against the PDF magic-byte header.
  • ECMA-376 / ISO/IEC 29500 (OOXML): DOCX and XLSX files are extracted natively by parsing the underlying Open Packaging Convention ZIP archive and reading word/document.xml and xl/sharedStrings.xml respectively.
  • FIPS 180-4 (SHA-256): A SHA-256 hex digest is computed over each uploaded file's raw bytes to provide a tamper-evident chain-of-custody fingerprint stored with every document analysis record.
  • ISO 639-1: Language detection returns a two-letter ISO 639-1 language code (e.g. en, fr, de) that is recorded alongside extracted entities and classification in the document intelligence result.
  • RFC 5646 / Unicode character encoding (UTF-8, UTF-16, Latin-1): Text from uploaded documents is decoded by trying UTF-8, UTF-16, and Latin-1 in sequence, which supports correct handling of multilingual seized materials. RFC 5646 governs language tags rather than character encodings; the encodings themselves are defined by the Unicode Standard (UTF-8, UTF-16) and ISO/IEC 8859-1 (Latin-1).
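
Several of the file-level checks listed above can be illustrated with the Python standard library alone. The following is a minimal sketch, not the module's actual implementation: the function names (sha256_fingerprint, sniff_mime, mime_is_consistent, decode_text, extract_docx_text) and the MAGIC_SIGNATURES table are hypothetical, and the signature table covers only PDF and ZIP-based OOXML containers.

```python
import hashlib
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Optional, Tuple

# Hypothetical magic-byte table (assumption): real deployments cover more formats.
MAGIC_SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"PK\x03\x04": "application/zip",  # DOCX and XLSX are ZIP (OPC) containers
}

# Registered media types for the two OOXML formats named in this document.
OOXML_MIMES = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}

# WordprocessingML namespace used by word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def sha256_fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def sniff_mime(data: bytes) -> Optional[str]:
    """Guess a media type from the leading magic bytes, or None if unknown."""
    for magic, mime in MAGIC_SIGNATURES.items():
        if data.startswith(magic):
            return mime
    return None


def mime_is_consistent(declared: str, data: bytes) -> bool:
    """Check the client-declared media type against the sniffed one."""
    sniffed = sniff_mime(data)
    if sniffed is None:
        return False
    if sniffed == "application/zip":
        # A ZIP header is only acceptable here for declared OOXML uploads.
        return declared in OOXML_MIMES
    return declared == sniffed


def decode_text(data: bytes) -> Tuple[str, str]:
    """Try UTF-8, then UTF-16, then Latin-1; return (text, encoding used)."""
    for encoding in ("utf-8", "utf-16"):
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a code point, so this step cannot fail.
    return data.decode("latin-1"), "latin-1"


def extract_docx_text(data: bytes) -> str:
    """Read paragraph text from word/document.xml inside a DOCX archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Untrusted XML should go through a hardened parser in production.
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

One caveat of a sequential fallback like decode_text: UTF-16 accepts many even-length byte sequences, so a Latin-1 file can be misread as UTF-16 and yield garbled text. A stricter variant might try UTF-16 only when a byte-order mark is present. Whether the module applies such a check is not stated in this document.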

Last Reviewed: 2026-02-24 Last Updated: 2026-04-14
