
Open Voice Intelligence

Module metadata

  • Source reference: content/modules/open-voice-intelligence.md
  • Last Updated: Apr 14, 2026
  • Category: Intelligence
  • Content checksum: a9eb717ddd7f580d
  • Tags: intelligence, ai, compliance

Overview#

Voice calls in emergency dispatch, command centres, and field operations carry intelligence that is unavailable in any other channel: the words spoken, the sequence of events described, the urgency of the caller, and the actions that were promised. Processing that audio into structured intelligence requires transcription, speaker attribution, and summarisation. All three have historically required proprietary acoustic analysis stacks.

The Argus Open Voice Intelligence pipeline is built entirely from open-source and open-standard components. Audio is transcribed with an open-source ASR model (MIT licence), speakers are distinguished using segment-level silence-gap turn detection with no voiceprint or biometric models, and transcripts are summarised by a general-purpose AI language model operating on plain text. The result is a fully auditable pipeline that produces structured intelligence from voice without relying on proprietary acoustic analysis methods.

Diagram

graph LR
    A[Audio Input] --> B[WebRTC VAD\nBSD Licence]
    B --> C[open-source ASR model\nMIT Licence\narXiv:2212.04356]
    C --> D[TranscriptSegment Array\nplain text + timestamps]
    D --> E[Silence-Gap Speaker Labeling\nno voiceprint model]
    E --> F[Labelled Plain Text Transcript]
    F --> G[AI Language Model Summary\nAI model provider API\nplain text input only]
    G --> H[SummaryResult\nsummary / key_events\naction_items / sentiment]
    H --> I[Structured Intelligence]
    I --> J[VoiceProcessingAuditRecord\ncomponent trail per call]

Last Reviewed: 2026-04-14 · Last Updated: 2026-04-14

Key Features#

  • open-source ASR model (MIT Licence): Audio transcription uses a general-purpose automatic speech recognition model, released under the MIT licence, trained on large-scale weakly supervised data and supporting transcription across 99 languages. The open_asr_service sends audio to the ASR API and returns time-aligned transcript segments with per-segment start and end times. No acoustic keyword spotting or feature embedding is performed at this stage; the output is plain text.

  • Segment-Based Speaker Labeling: Speaker attribution uses silence-gap turn detection on the time-aligned segments produced by the open-source ASR model. When the gap between consecutive segments exceeds the configured threshold (default 1.5 seconds), the active speaker label advances to the next speaker (SPEAKER_1, SPEAKER_2, ...). No voiceprint models, speaker embeddings, d-vectors, x-vectors, or acoustic biometric analysis are used. Speaker labels are positional, not identity-based.

  • Post-Hoc LLM Summarisation: Transcript summarisation is performed by an AI language model operating on a plain-text string. The input to the LLM is the transcript text only: no audio, no acoustic features, and no structured communication ontology objects. The LLM returns a summary, a list of key events, a list of action items, and an overall sentiment classification. This is standard LLM text summarisation, not a domain-ontology-based analysis system.

  • Per-Call Processing Audit: Every call processed by the voice intelligence pipeline generates a VoiceProcessingAuditRecord in PostgreSQL. The processing_steps JSONB column records which component handled each step, its licence, and its citation. This provides a per-call evidence trail for compliance review, confirming which tools were and were not used.

  • Multi-Tenant Isolation: All voice processing audit records carry organization_id and created_by columns, satisfying EDF/PESCO multi-tenant isolation and audit trail requirements.
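The silence-gap turn detection described above can be sketched in a few lines. This is a minimal illustration, not the actual open_asr_service implementation: the `TranscriptSegment` field names (`start`, `end`, `text`) and the function name `label_speakers` are assumptions, though the 1.5-second default threshold and the positional SPEAKER_N labels come from the module description.

```python
from dataclasses import dataclass


@dataclass
class TranscriptSegment:
    # Field names are illustrative; the real segment schema may differ.
    start: float  # segment start time, seconds
    end: float    # segment end time, seconds
    text: str


def label_speakers(segments, gap_threshold=1.5):
    """Silence-gap turn detection: advance to the next positional speaker
    label whenever the pause between consecutive segments exceeds the
    threshold. No voiceprints, embeddings, or acoustic analysis."""
    labelled = []
    speaker = 1
    prev_end = None
    for seg in segments:
        if prev_end is not None and seg.start - prev_end > gap_threshold:
            speaker += 1
        labelled.append((f"SPEAKER_{speaker}", seg))
        prev_end = seg.end
    return labelled
```

Because labels are positional rather than identity-based, the same physical speaker can receive several different labels over the course of a call; the labels track turns, not people.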
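The post-hoc summarisation step can likewise be sketched as plain-text-in, structured-result-out. The `SummaryResult` fields (summary, key_events, action_items, sentiment) come from the module description; the prompt wording, the `summarise` function, and the injected `complete` callable standing in for the unspecified provider API are assumptions.

```python
from dataclasses import dataclass
import json


@dataclass
class SummaryResult:
    summary: str
    key_events: list
    action_items: list
    sentiment: str


PROMPT_TEMPLATE = (
    "Summarise the following call transcript. Respond with JSON containing "
    "'summary', 'key_events', 'action_items', and 'sentiment'.\n\n{transcript}"
)


def summarise(transcript: str, complete) -> SummaryResult:
    """`complete` is any callable that sends a plain-text prompt to the LLM
    provider and returns its text response. The model sees only the labelled
    transcript string -- no audio, acoustic features, or ontology objects."""
    raw = complete(PROMPT_TEMPLATE.format(transcript=transcript))
    data = json.loads(raw)
    return SummaryResult(
        summary=data["summary"],
        key_events=data["key_events"],
        action_items=data["action_items"],
        sentiment=data["sentiment"],
    )
```

Injecting the completion function keeps the summariser provider-agnostic, which matches the pipeline's use of a general-purpose LLM rather than a domain-specific analysis system.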

Open Standards and Licences#

| Component | Licence | Reference |
| --- | --- | --- |
| open-source ASR model | MIT | Radford et al., arXiv:2212.04356 |
| WebRTC VAD | BSD | https://webrtc.org/ |
| AI language model API | Provider API | Provider documentation |

The pipeline uses open-source ASR for transcription (plain text output only), silence-gap heuristics for speaker labelling (no voiceprint or biometric models), and a general-purpose AI language model for summarisation (plain text input only). All acoustic processing ends at the open-source ASR model step; all downstream processing operates on plain text.
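The per-call VoiceProcessingAuditRecord ties this component/licence/reference breakdown to each processed call. The sketch below shows what a processing_steps JSONB payload might look like; the key names, table name, and psycopg-style parameterised insert are assumptions, not the actual schema.

```python
import json

# Illustrative processing_steps payload for one call, mirroring the
# component/licence/reference table above. Key names are assumptions.
PROCESSING_STEPS = [
    {"step": "vad", "component": "WebRTC VAD",
     "licence": "BSD", "citation": "https://webrtc.org/"},
    {"step": "transcription", "component": "open-source ASR model",
     "licence": "MIT", "citation": "arXiv:2212.04356"},
    {"step": "speaker_labelling", "component": "silence-gap heuristic",
     "licence": "n/a", "citation": "no voiceprint or biometric model"},
    {"step": "summarisation", "component": "AI language model",
     "licence": "Provider API", "citation": "provider documentation"},
]

# Hypothetical insert statement; table and column names are illustrative,
# though organization_id and created_by are described in the module text.
INSERT_SQL = (
    "INSERT INTO voice_processing_audit_record "
    "(call_id, organization_id, created_by, processing_steps) "
    "VALUES (%s, %s, %s, %s::jsonb)"
)


def audit_params(call_id, org_id, user_id, steps=PROCESSING_STEPS):
    """Build the parameter tuple for the audit insert; the JSONB column
    stores which component handled each step, its licence, and citation."""
    return (call_id, org_id, user_id, json.dumps(steps))
```

Recording one row per call means a compliance reviewer can confirm, per call, which tools were and were not used, without re-running any part of the pipeline.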

Use Cases#

Open Voice Intelligence is used across public safety, defence operations, and critical infrastructure sectors.

  • Emergency Dispatch Transcription: PSAP call recordings are transcribed and summarised automatically, producing structured call records with key events and action items for the CAD system.
  • Command Centre After-Action: Command calls are transcribed and summarised for after-action review, with speaker labels tracking which participant made which statement.
  • Field Debrief Processing: Field team audio debriefs are converted to structured intelligence reports with extracted action items and sentiment classification.
  • Compliance and Quality Review: Supervisors review summarised call transcripts and the VoiceProcessingAuditRecord to verify that calls were handled according to protocol.

Integration#

  • Command Centre: Voice processing results integrate with the PSAP and command centre call lifecycle, enriching call records with transcript and summary data.
  • Case Management: Key events and action items extracted from call summaries can be added to case records automatically.
  • Audit Trail: VoiceProcessingAuditRecord entries are linked to the platform-wide audit trail for EDF/PESCO compliance.
  • Incident Correlation: Transcript content feeds the incident correlator for cross-call pattern detection operating on plain text, not audio features.