Audio Intelligence: Real-Time Acoustic Detection During Emergency Calls

Category: Intelligence
Last Updated: Apr 14, 2026
Tags: intelligence, ai

Overview

The Audio Intelligence module continuously analyses the audio stream of live emergency calls to detect life-threatening sounds, physiological distress indicators, and caller state signals that are not captured in the spoken transcript. It operates in real time alongside the AI dispatcher conversation and identifies three kinds of signal: agonal breathing patterns; acoustic events such as gunshots, screams, and alarms; and voice biometric markers including stress, age range, and potential intoxication. Detections are escalated to dispatchers automatically and flagged as alerts on the operations dashboard without interrupting the call.

The module addresses a critical gap in text-based emergency AI: transcripts only capture what is said, not the acoustic environment around the caller. Audio Intelligence ensures that what is heard, but not spoken, is acted upon.

Key Features

  • Agonal Breathing Detection: Identifies the distinctive interrupted gasping pattern associated with cardiac arrest and severe trauma, triggering an immediate P1 priority escalation and alerting the dispatcher to begin CPR coaching even before the caller can describe the situation
  • Acoustic Event Detection: Classifies environmental audio events including gunshots, screams, breaking glass, alarms, and vehicle collision sounds, providing dispatchers with scene context beyond the caller's verbal account
  • Voice Stress Analysis: Detects elevated physiological stress markers in the caller's voice, supporting triage decisions where a caller is describing a situation calmly but audible stress indicators suggest greater urgency
  • Age Estimation: Provides a broad age-range estimate from voice characteristics, helping dispatchers identify when a caller is likely a child or elderly person and apply appropriate escalation and communication protocols
  • Intoxication Indicators: Flags speech patterns consistent with intoxication to inform dispatcher judgement and support appropriate resource allocation, without overriding the dispatcher's assessment
  • P1 Auto-Escalation: Critical detections (agonal breathing, gunshots, screams) automatically raise incident priority to P1 and create an AUDIO_INTEL_ALERT event on the dispatcher dashboard, ensuring no critical signal goes unnoticed in a high-call-volume environment
  • Parallel Audio Analysis: Runs independently of the AI conversation pipeline so acoustic analysis never introduces latency into the dispatcher's spoken exchange with the caller
  • Confidence-Scored Alerts: Each detection includes a confidence score; detections that fall below the configured threshold are surfaced as advisory signals rather than auto-escalations, reducing false-positive noise for dispatchers
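
The escalation rules in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: the `Detection` class, the `route_detection` function, the label strings, the `ADVISORY` event name, and the 0.85 threshold are all hypothetical. Only the critical categories (agonal breathing, gunshots, screams), the P1 priority, and the AUDIO_INTEL_ALERT event name come from this page. The sketch also assumes that non-critical categories are never auto-escalated.

```python
from dataclasses import dataclass

# Categories this page lists as critical detections (label strings are hypothetical).
CRITICAL_LABELS = frozenset({"agonal_breathing", "gunshot", "scream"})


@dataclass(frozen=True)
class Detection:
    label: str         # hypothetical category label, e.g. "gunshot"
    confidence: float  # model confidence in the range [0.0, 1.0]


def route_detection(det: Detection, threshold: float = 0.85) -> dict:
    """Decide how a detection is surfaced. The default threshold is an assumption."""
    if not 0.0 <= det.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Auto-escalate only critical categories that meet the threshold;
    # everything else is surfaced as an advisory signal.
    escalate = det.label in CRITICAL_LABELS and det.confidence >= threshold
    return {
        "event_type": "AUDIO_INTEL_ALERT" if escalate else "ADVISORY",
        "priority": "P1" if escalate else None,
        "label": det.label,
        "confidence": det.confidence,
    }
```

Under these assumptions, `route_detection(Detection("gunshot", 0.93))` yields a P1 `AUDIO_INTEL_ALERT`, while the same label at a confidence of 0.40 is returned as an advisory with no priority change.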

Use Cases

Unresponsive Caller with Agonal Breathing

A caller dials 999 or 911 but cannot speak. The AI dispatcher hears laboured, interrupted breathing consistent with cardiac arrest. Audio Intelligence triggers an immediate P1 escalation, dispatches emergency medical units, and prompts the AI to begin guided CPR instructions for any bystanders present, before a single word has been spoken by the caller.

Active Threat Identification

During a call reporting a disturbance, acoustic event detection identifies a sound consistent with a gunshot. The dispatcher receives an AUDIO_INTEL_ALERT and can reassess the incident type from public disturbance to active threat and dispatch appropriate resources.

Child Caller Recognition

Age estimation indicates that the caller is likely a child. The system flags this for the dispatcher, and the AI shifts to a calmer, simpler communication style suited to a young or distressed caller while applying child-vulnerability escalation protocols.

Panic vs Calm Discrepancy

A caller describes what sounds like a minor situation in composed language, but voice stress analysis detects significant physiological stress markers. The dispatcher is alerted to the discrepancy, prompting more careful probing before downgrading incident priority.

Scene Awareness During Vehicle Incidents

During a road traffic collision call, acoustic detection identifies alarm sounds and airbag deployment patterns in the background, providing scene context that supports the dispatcher's resource allocation before units arrive to confirm the situation.

How It Works

Integration

Audio Intelligence operates as a parallel processing track within the Voice AI session, receiving the same audio stream as the speech-to-text transcription pipeline without intercepting or delaying it. Detections are published as structured events to the dispatcher operations dashboard and can trigger automatic incident priority changes through the Incident Management module. The AUDIO_INTEL_ALERT event type appears in the live call panel alongside the AI conversation transcript, giving dispatchers a unified view of spoken content and acoustic signals. Alert thresholds and enabled detection categories are configurable per tenant.
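
One way to picture the parallel track described above is a fan-out that hands every audio frame to both consumers while never letting the analysis side hold up the call. The sketch below is an assumption-laden illustration, not the platform's design: `fan_out`, `demo`, the queue names, the bounded queue size, and the drop-on-full policy are all hypothetical choices made for the example.

```python
import asyncio


async def fan_out(frames, stt_queue: asyncio.Queue, intel_queue: asyncio.Queue) -> int:
    """Copy each frame to both tracks; return how many frames the analysis track dropped."""
    dropped = 0
    for frame in frames:
        await stt_queue.put(frame)         # transcription track (unbounded in this sketch)
        try:
            intel_queue.put_nowait(frame)  # analysis track: never wait on a slow consumer
        except asyncio.QueueFull:
            dropped += 1                   # assumed policy: drop rather than delay the call
    return dropped


def demo() -> tuple[int, int, int]:
    """Push five 160-byte frames through with no consumers attached."""
    async def run():
        stt_queue = asyncio.Queue()             # unbounded
        intel_queue = asyncio.Queue(maxsize=2)  # small bound to force drops in the demo
        frames = [bytes([i]) * 160 for i in range(5)]
        dropped = await fan_out(frames, stt_queue, intel_queue)
        return stt_queue.qsize(), intel_queue.qsize(), dropped

    return asyncio.run(run())
```

With no consumers draining the queues, `demo()` returns `(5, 2, 3)`: the transcription track keeps all five frames, while the bounded analysis queue holds two and drops three. Dropping frames on the analysis side is only one possible trade-off; it keeps the transcription path unaffected at the cost of analysis coverage under load.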

Open Standards

  • NENA i3 (NENA-STA-010.3 / NENA-STA-021): Audio intelligence detections are published as AUDIO_INTEL_ALERT events within the NENA i3 audit and incident framework, and critical detections trigger priority changes on the NENA EIDO-structured incident record.
  • IETF RFC 7865 / RFC 7866 (SIPREC): The audio stream is received from the Session Border Controller via a SIPREC session in which the platform acts as the Session Recording Server (SRS); RFC 7866 defines the recording protocol between the SBC, acting as Session Recording Client, and the SRS, and RFC 7865 defines the recording metadata XML carried on that session.
  • ITU-T G.711 / Opus (RFC 6716): The SIPREC media sink accepts audio/pcmu and audio/pcma (G.711 µ-law and A-law) as well as audio/opus payloads; the audio analysis pipeline consumes 16-bit linear PCM decoded from these codecs.
  • IETF RFC 7852 (Additional Data Related to an Emergency Call): Acoustic detection results and confidence scores are appended as structured additional-data blocks on the in-progress emergency call, supplementing the caller's spoken account with acoustic context.
  • ISO 639-1 (Language Codes): The parallel speech-to-text transcription pipeline, which runs alongside audio intelligence on the same stream, tags transcript segments with ISO 639-1 language identifiers to support multilingual call handling.
  • GraphQL over WebSocket (graphql-ws): Dispatcher dashboard alerts, including AUDIO_INTEL_ALERT and P1 escalation notifications, are delivered in real time through tenant-scoped GraphQL subscriptions transported over the WebSocket protocol.
  • IETF RFC 7852 / NENA-STA-012 (Emergency Call Additional Data): Biometric indicators such as voice stress scores and age-range estimates are structured as NENA-STA-012-compatible additional-data blocks, ensuring they remain attached to the call audit trail for post-incident review.
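
As an illustration of the G.711 item above, the following sketch expands µ-law (audio/pcmu) bytes into 16-bit linear PCM using the standard G.711 µ-law expansion. It is not taken from the platform's code: the function names are hypothetical, and the little-endian output byte order is an assumption. A-law and Opus decoding are not shown.

```python
import struct

BIAS = 0x84  # 132, the bias used by the G.711 mu-law companding scheme


def ulaw_byte_to_pcm16(value: int) -> int:
    """Expand one mu-law byte to a signed linear sample."""
    value = ~value & 0xFF            # mu-law bytes are transmitted bit-inverted
    sign = value & 0x80              # top bit: 1 means negative
    exponent = (value >> 4) & 0x07   # 3-bit segment number
    mantissa = value & 0x0F          # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def decode_pcmu(payload: bytes) -> bytes:
    """Decode a PCMU payload to 16-bit little-endian PCM (byte order is an assumption)."""
    samples = [ulaw_byte_to_pcm16(b) for b in payload]
    return struct.pack(f"<{len(samples)}h", *samples)
```

A 160-byte payload (20 ms of audio at 8 kHz) decodes to 320 bytes of linear PCM. The byte `0xFF` expands to silence (0), and `0x80` and `0x00` expand to the positive and negative extremes, +32124 and -32124.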

Compliance

  • Audio analysis is performed only on active emergency calls within the answering organisation's scope
  • Biometric voice analysis data (stress indicators, age estimation) is used for triage support only and is not stored as a permanent profile attribute
  • Detection events and their confidence scores are recorded in the call audit trail for post-incident review and quality assurance
  • Alert thresholds can be tuned by supervisors to balance sensitivity against false positive rates appropriate for each deployment context
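
Supervisor tuning of thresholds, as mentioned in the list above, might look something like the sketch below. Everything here is hypothetical and shown only to make the idea concrete: the `TenantAudioConfig` class, its field names, the category labels, and the 0.85 default are assumptions, and the platform's actual configuration interface is not documented on this page.

```python
from dataclasses import dataclass, field


@dataclass
class TenantAudioConfig:
    """Hypothetical per-tenant settings; field names are illustrative only."""

    enabled_categories: set[str] = field(
        default_factory=lambda: {"agonal_breathing", "gunshot", "scream"}
    )
    default_threshold: float = 0.85  # assumed default, not a documented value
    thresholds: dict[str, float] = field(default_factory=dict)

    def set_threshold(self, category: str, value: float) -> None:
        """Apply a supervisor's tuning change after basic validation."""
        if category not in self.enabled_categories:
            raise KeyError(f"category not enabled for this tenant: {category}")
        if not 0.0 < value <= 1.0:
            raise ValueError("threshold must be in the range (0, 1]")
        self.thresholds[category] = value

    def threshold_for(self, category: str) -> float:
        """Return the tuned threshold, falling back to the tenant default."""
        return self.thresholds.get(category, self.default_threshold)
```

Lowering a category's threshold makes that detection more sensitive, so more alerts are raised and more of them are false positives; raising it has the opposite effect.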

Last Reviewed: 2026-04-14
Last Updated: 2026-04-14
