Multi-Provider Realtime Voice AI

Overview

Voice is the fastest human interface in high-pressure environments. A dispatcher handling a cardiac arrest call does not reach for a keyboard. A field officer reporting from a scene does not pause to type. A member of the public trying to report an emergency from a phone they cannot speak into needs an interface that works without a voice. The Multi-Provider Realtime Voice AI module covers all three by running a live, provider-agnostic conversational pipeline with a pluggable tool registry that exposes existing platform services (incident lookup, unit ETA, outage check, weather, hazards, utility accounts) as tools the AI can call during a live call.

Providers are abstracted behind a single orchestration layer, so organisations gain provider redundancy, best-of-breed voice quality, and the ability to route each conversation to the most suitable provider based on language, latency requirements, and cost constraints. A parallel audio intelligence stream, a silent and duress caller state machine, real-time translation, location resolution, and a dispatcher AI Partner overlay are layered on top of the core voice pipeline without changing the calling application.

Last Reviewed: 2026-03-02
Last Updated: 2026-03-02

Key Features

Core voice pipeline

Realtime Voice Integration: Full support for a realtime voice API with streaming audio input and output, function calling during voice sessions, and configurable voice personas.
Voice Assistant Provider Integration: Native integration with a multimodal voice AI provider, with optimised greeting injection timing and voice resolution to ensure natural conversation flow.
Provider-Agnostic Interface: A unified orchestration layer abstracts provider-specific protocols, enabling applications to initiate voice sessions without coupling to a specific provider's developer toolkit or session management model.
Automatic Provider Failover: Real-time health monitoring of voice providers with automatic session migration when a provider experiences degraded quality or availability.
Voice Persona Management: Configure and manage voice personas with provider-specific voice selection, speaking rate, pitch adjustment, and personality prompting for consistent brand representation.
Secure API Key Management: Provider service credentials are stored and rotated through the platform's secrets management system, never exposed to client applications, with per-tenant key isolation.
Session Prewarming: Pre-establishes WebSocket connections and completes provider setup handshakes for both voice AI providers while the telephony system plays the initial greeting. This eliminates 500 to 1500 ms of latency from the first AI response, creating a seamless experience for callers. Prewarming uses country-specific emergency number greetings (911, 112, 119) for PSAP lines and customer labels for non-PSAP lines.
Tenant Isolation: Complete separation of voice sessions, service credentials, usage quotas, and conversation history between tenants with no cross-tenant data leakage.
Session Analytics: Tracks voice session duration, provider utilisation, latency metrics, turn-taking patterns, and user satisfaction signals for continuous optimisation.
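
The provider-agnostic interface and failover behaviour described above can be pictured with a minimal sketch. It is illustrative only: `VoiceProvider`, `StubProvider`, and `Orchestrator` are hypothetical names rather than the module's actual API, and the sketch covers failover when a session is opened, not migration of a session already in progress.

```python
from dataclasses import dataclass
from typing import List, Protocol


class VoiceProvider(Protocol):
    """Hypothetical adapter contract that every provider adapter would satisfy."""

    name: str

    def open_session(self, tenant_id: str) -> str: ...


@dataclass
class StubProvider:
    """Stand-in adapter; a real one would speak the provider's own protocol."""

    name: str
    healthy: bool = True

    def open_session(self, tenant_id: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} unavailable")
        return f"{self.name}:{tenant_id}"


class Orchestrator:
    """Tries providers in preference order and falls back on connection errors."""

    def __init__(self, providers: List[VoiceProvider]) -> None:
        self.providers = providers

    def open_session(self, tenant_id: str) -> str:
        errors = []
        for provider in self.providers:
            try:
                return provider.open_session(tenant_id)
            except ConnectionError as exc:
                errors.append(str(exc))
        raise RuntimeError("no provider available: " + "; ".join(errors))


orchestrator = Orchestrator(
    [StubProvider("primary", healthy=False), StubProvider("secondary")]
)
session_id = orchestrator.open_session("tenant-a")
```

The calling application only sees `Orchestrator.open_session`, so swapping or reordering providers would not change application code in this arrangement.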

Pluggable tool registry

Shared Tool Registry: A central tool registry wires existing platform services directly into the AI function-calling surface. New capabilities can be added to a live voice session by registering a tool without modifying the provider adapters.
Argus Service Tools: duplicate incident check, outage lookup, utility account lookup, closest unit lookup, unit ETA lookup, weather context lookup, hazard context lookup, and phrase translation are registered as tools that the AI can call mid-conversation with live platform data.
Caller Enrichment at Call Start: When a call arrives, a parallel enrichment pass fetches prior-call history, caller profile, premise hazards, open incidents at the address, and utility account status under a 450 ms budget. Results are injected into the system prompt as a KNOWN CALLER CONTEXT block before the AI speaks.
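
As a rough illustration of how a shared registry can expose services as callable tools, the sketch below registers one stubbed tool and dispatches a call by name. The `ToolRegistry` class, the `unit_eta_lookup` name, and the returned fields are assumptions made for the example; the platform's real registry interface is not documented here.

```python
from typing import Any, Callable, Dict, List

ToolFn = Callable[..., Dict[str, Any]]


class ToolRegistry:
    """Hypothetical registry mapping tool names to plain Python callables."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolFn] = {}

    def register(self, name: str) -> Callable[[ToolFn], ToolFn]:
        def decorator(fn: ToolFn) -> ToolFn:
            if name in self._tools:
                raise ValueError(f"tool already registered: {name}")
            self._tools[name] = fn
            return fn

        return decorator

    def names(self) -> List[str]:
        return sorted(self._tools)

    def call(self, name: str, **arguments: Any) -> Dict[str, Any]:
        # Unknown tools return an error payload instead of raising, so the
        # voice session can keep running if the model requests a bad name.
        if name not in self._tools:
            return {"ok": False, "error": f"unknown tool: {name}"}
        return {"ok": True, "result": self._tools[name](**arguments)}


registry = ToolRegistry()


@registry.register("unit_eta_lookup")
def unit_eta_lookup(unit_id: str) -> Dict[str, Any]:
    # Stubbed data; a real tool would query the dispatch service.
    return {"unit_id": unit_id, "eta_minutes": 7}


reply = registry.call("unit_eta_lookup", unit_id="M12")
```

Because provider adapters would only need `names()` and `call()`, adding a tool in this sketch is a single decorated function.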

Audio intelligence

Parallel Audio Intelligence Stream: A second model runs on the caller audio stream alongside speech recognition, producing signals the AI can act on without waiting for the caller to speak.
Agonal Breathing Detector: Listens for the abnormal respiratory pattern that often accompanies cardiac arrest and raises a priority signal so the AI begins a CPR instruction pathway sooner.
Acoustic Event Detector: Uses the existing audio scene classifier to flag gunshots, screams, alarms, and other critical sounds. Detections fire a P1 escalation and publish an audio intelligence alert event to the operator dashboard.
Voice Biometrics: Produces stress, age, and intoxication estimates from voice characteristics to support dispatcher triage. Signals are advisory and logged with the session for review.
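
The escalation rule for acoustic events could look something like the following sketch. The label set, the 0.8 confidence threshold, and the event field names are assumptions chosen for illustration and are not taken from the platform.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

CRITICAL_LABELS = {"gunshot", "scream", "alarm"}  # assumed label names
CONFIDENCE_THRESHOLD = 0.8  # assumed cut-off, not a documented value


@dataclass(frozen=True)
class Detection:
    label: str
    confidence: float


def escalations(detections: Iterable[Detection]) -> List[Dict[str, str]]:
    """Turn confident critical detections into P1 alert events."""
    alerts = []
    for detection in detections:
        if (
            detection.label in CRITICAL_LABELS
            and detection.confidence >= CONFIDENCE_THRESHOLD
        ):
            alerts.append(
                {
                    "event": "audio_intelligence_alert",
                    "priority": "P1",
                    "label": detection.label,
                }
            )
    return alerts


alerts = escalations(
    [Detection("gunshot", 0.93), Detection("music", 0.99), Detection("scream", 0.4)]
)
```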

Silent and duress caller handling

Silent Caller State Machine: Detects calls where the caller cannot or will not speak and switches to yes-no prompting that the caller answers via keypad.
Pretend-Conversation Modes: Supports pizza-order and wrong-number conversational covers when the caller is under duress. The AI maintains the cover while a parallel duress-detected flag is raised to the operator.
DTMF Forwarding and Code Words: DTMF keypad tones are forwarded from the telephony transport to the silent-mode handler, and configurable code words trigger duress escalation.
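
The silent and duress handling above is essentially a small state machine. The sketch below is one possible shape for it, under several assumptions: the state names, the six-second silence limit, the keypad mapping (1 for yes, 2 for no), and the example code word are all hypothetical.

```python
import re
from enum import Enum
from typing import FrozenSet, List


class State(Enum):
    LISTENING = "listening"
    SILENT_PROMPTING = "silent_prompting"
    DURESS = "duress"


class SilentCallerMachine:
    def __init__(
        self,
        silence_limit_s: float = 6.0,  # assumed threshold
        code_words: FrozenSet[str] = frozenset({"pepperoni"}),  # example only
    ) -> None:
        self.silence_limit_s = silence_limit_s
        self.code_words = code_words
        self.state = State.LISTENING
        self.answers: List[bool] = []

    def on_silence(self, seconds: float) -> None:
        # Prolonged silence switches the call to keypad yes-no prompting.
        if self.state is State.LISTENING and seconds >= self.silence_limit_s:
            self.state = State.SILENT_PROMPTING

    def on_dtmf(self, digit: str) -> None:
        # Keypad answers only count while in silent prompting.
        if self.state is not State.SILENT_PROMPTING:
            return
        if digit == "1":
            self.answers.append(True)
        elif digit == "2":
            self.answers.append(False)

    def on_transcript(self, text: str) -> None:
        # Any configured code word raises the duress state.
        words = set(re.findall(r"[a-z']+", text.lower()))
        if words & self.code_words:
            self.state = State.DURESS


machine = SilentCallerMachine()
machine.on_silence(8.0)
machine.on_dtmf("1")
```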

Non-voice channels

Text-to-911 Inbound SMS: An inbound SMS webhook manages an SMS session record, allowing a caller with a speech or hearing impairment or who cannot safely make a voice call to reach dispatch via text.
MMS via Emergency Image Analyser: Inbound MMS images are analysed by a vision model for scene context, which is fed to the AI as additional situational awareness.
Live Video Scaffolding: One-time JWT URLs and a frame-poster worker support a future live-video channel from the caller's device.

Real-time translation

Multilingual Voice Sessions: Provider transcripts are passed through a translation layer that detects the caller language and translates to the operator language on the fly.
Bilingual Transcript Segments: The session transcript stores each segment in both languages so audit and case review do not require a separate translation step. An LRU cache deduplicates repeated phrases within the same call.
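
A per-call LRU cache in front of the translation step might be sketched as follows. The `PhraseCache` class, the cache size, the case-insensitive key, and the stub translator are assumptions for illustration; the real translation layer and its cache policy are not specified here.

```python
from collections import OrderedDict
from typing import Callable, Dict, Tuple

Translator = Callable[[str, str, str], str]


class PhraseCache:
    """Small LRU cache keyed by (normalised text, source, target)."""

    def __init__(self, translate: Translator, max_size: int = 256) -> None:
        self._translate = translate
        self._max_size = max_size
        self._entries: "OrderedDict[Tuple[str, str, str], str]" = OrderedDict()
        self.misses = 0

    def get(self, text: str, source: str, target: str) -> str:
        key = (text.strip().lower(), source, target)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        translated = self._translate(text, source, target)
        self._entries[key] = translated
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used
        return translated


def bilingual_segment(
    cache: PhraseCache, text: str, source: str, target: str
) -> Dict[str, str]:
    """Store one transcript segment in both languages."""
    return {source: text, target: cache.get(text, source, target)}


def stub_translate(text: str, source: str, target: str) -> str:
    # Stand-in for a real translation service.
    return {"ayuda": "help"}.get(text.strip().lower(), text)


cache = PhraseCache(stub_translate, max_size=2)
first = bilingual_segment(cache, "Ayuda", "es", "en")
second = bilingual_segment(cache, "ayuda", "es", "en")
```

In this sketch the second segment is served from the cache, so the translator is called once for the repeated phrase.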

Location intelligence

Multi-Format Location Resolver: The location resolution tool accepts street address, what3words phrases, Open Location Code (Plus Codes), named POIs and landmarks, fuzzy business names, and the caller's registered address as fallback. It returns a single resolved location object the AI can act on.
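
The first step of a multi-format resolver is deciding which format the caller supplied. The sketch below shows one way to do that with simple patterns. It is a simplification under stated assumptions: the Plus Code pattern only accepts full, unpadded codes, the what3words pattern only checks the three-word dotted shape, and the category names are hypothetical. Geocoding itself is out of scope.

```python
import re
from typing import Optional

# Full Open Location Codes: 8 characters, "+", then 2 or 3 characters,
# all drawn from the 20-character Open Location Code alphabet.
PLUS_CODE = re.compile(
    r"^[23456789CFGHJMPQRVWX]{8}\+[23456789CFGHJMPQRVWX]{2,3}$", re.IGNORECASE
)
# Three dot-separated words, optionally prefixed with "///".
WHAT3WORDS = re.compile(r"^(?:///)?[^\W\d_]+\.[^\W\d_]+\.[^\W\d_]+$")
# A leading house number (optionally with a letter suffix) and more text.
STREET = re.compile(r"^\d+\w*\s+\S+")


def classify_location(text: str, registered_address: Optional[str] = None) -> str:
    """Return the assumed input format for a caller-supplied location string."""
    value = text.strip()
    if PLUS_CODE.match(value):
        return "plus_code"
    if WHAT3WORDS.match(value):
        return "what3words"
    if STREET.match(value):
        return "street_address"
    if value:
        return "poi_or_business"
    # Nothing usable was said: fall back to the registered address if known.
    return "registered_address" if registered_address else "unresolved"
```

Each category would then be handed to the matching lookup service, with the result normalised into the single resolved location object described above.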

Dispatcher AI Partner

AI Partner Overlay: A second model runs alongside the dispatcher, producing AI Partner hints, CAD form auto-fill suggestions, and red-flag alerts for the operator. The AI Partner does not speak to the caller; it is an assistant for the human operator.
Training and Replay Mode: The same AI Partner supports dispatcher QA practice by replaying archived or synthetic calls against the AI Partner pipeline.

Protocol card runner

Generic Protocol Engine: A card runner executes structured protocol cards for cardiac arrest, structure fire, and overdose pathways. The engine is built on a generic card specification and does not include IAED-copyrighted content, avoiding proprietary protocol licensing.
Card Runner Tools: start card, next question, record answer, and finalise card are exposed as tools so the AI or the dispatcher AI Partner can step through the card inside the same voice session.
CPR Cadence Audio Injection: During medical emergencies requiring chest compressions, the system generates and injects drift-corrected 110 BPM metronome audio directly into the telephony WebSocket as base64-encoded mulaw frames. The cadence engine pauses automatically when the caller speaks, pauses every 60 beats for a check-in, and resumes without manual intervention.
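
Two parts of the cadence engine lend themselves to a short sketch: drift correction and mu-law frame encoding. Drift is avoided here by computing every beat time from the start time instead of accumulating sleeps. The 20 ms frame size, the 1 kHz click tone, and all function names are assumptions for illustration. The mu-law encoder follows the common G.711 reference algorithm.

```python
import base64
import math

BPM = 110
BEATS_PER_CHECK_IN = 60
SAMPLE_RATE = 8000    # G.711 telephony sample rate
FRAME_SAMPLES = 160   # 20 ms per frame (assumed framing)


def beat_deadline(start: float, beat_index: int, bpm: int = BPM) -> float:
    """Absolute time of a beat, derived from the start so error never accumulates."""
    return start + beat_index * 60.0 / bpm


def is_check_in(beat_index: int) -> bool:
    """True on every 60th beat, where the cadence pauses for a check-in."""
    return beat_index > 0 and beat_index % BEATS_PER_CHECK_IN == 0


def linear_to_mulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), 32635) + 0x84  # clip, then add the bias
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def click_frame(freq_hz: float = 1000.0, amplitude: int = 12000) -> str:
    """One base64-encoded mu-law frame containing a short tone."""
    pcm = (
        int(amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
        for n in range(FRAME_SAMPLES)
    )
    return base64.b64encode(bytes(linear_to_mulaw(s) for s in pcm)).decode("ascii")
```

A sender loop would sleep until `beat_deadline(start, n)` for each beat `n`, so a late wake-up on one beat does not push every later beat back.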

Hospital and medical integration

Hospital Pre-Alert: A hospital pre-alert function delivers structured pre-alerts to the receiving hospital via HL7 FHIR R4 where supported, via a configurable integration channel where one is published, and via SMS as the last-resort fallback. Hospital capacity and bed-status data inform trauma routing decisions.
Trauma Routing Service: Computes the correct receiving facility based on chief complaint, patient condition, bed status, and travel time.
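
The delivery order for pre-alerts (FHIR first, then an integration channel, then SMS) amounts to a fallback chain. The sketch below shows only the channel selection. The `Hospital` fields, channel labels, and example values are hypothetical, and building the FHIR resources is not shown.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Hospital:
    """Hypothetical directory entry describing how a hospital can be reached."""

    name: str
    fhir_endpoint: Optional[str] = None
    integration_channel: Optional[str] = None
    sms_number: Optional[str] = None


def choose_prealert_channel(hospital: Hospital) -> Tuple[str, str]:
    """Pick the richest available channel, ending with SMS as the last resort."""
    if hospital.fhir_endpoint:
        return ("fhir_r4", hospital.fhir_endpoint)
    if hospital.integration_channel:
        return ("integration_channel", hospital.integration_channel)
    if hospital.sms_number:
        return ("sms", hospital.sms_number)
    raise LookupError(f"no pre-alert channel configured for {hospital.name}")


general = Hospital("General", fhir_endpoint="https://fhir.example.org/r4")
rural = Hospital("Rural Clinic", sms_number="+15550100")
```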

Utility customer service

Utility Service Tools: Account authentication, usage analytics, service scheduling, mock payment processing, and safety-critical troubleshooting (gas leak, downed line) are registered as tools for the utilities subsystem.
Safety-Critical Escalation: Gas leak and downed-line indicators bypass conversational triage and escalate directly to the appropriate operator queue.
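
The bypass rule for safety-critical indicators can be illustrated with a deliberately simple phrase match. A production system would more likely use a classifier. The phrases and queue names below are assumptions, not platform configuration.

```python
from typing import Dict, Optional

# Assumed phrases and queue names, for illustration only.
SAFETY_CRITICAL_PHRASES = {
    "gas leak": "gas_emergency_queue",
    "smell gas": "gas_emergency_queue",
    "downed line": "electrical_emergency_queue",
    "line is down": "electrical_emergency_queue",
}


def route_utterance(utterance: str) -> Dict[str, Optional[str]]:
    """Escalate immediately on a safety-critical phrase, else continue triage."""
    text = utterance.lower()
    for phrase, queue in SAFETY_CRITICAL_PHRASES.items():
        if phrase in text:
            return {"action": "escalate", "queue": queue}
    return {"action": "continue_triage", "queue": None}
```

The check runs before any conversational handling, which is what lets a gas leak report skip the normal enquiry flow.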

Proactive outbound

Outbound Voice Campaigns: The outbound voice service runs wellness checks, severe weather warnings, vulnerable-person calls, incident follow-ups, AMBER alerts, and auto-callback on abrupt disconnect during P1 calls.
Auto-Callback on Disconnect: A P1 call that ends abruptly triggers an automatic callback from the same number with context from the interrupted session.
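
The auto-callback decision can be sketched as a small function over the ended call. What counts as an abrupt end is an assumption here (anything other than an operator closing the call), and the `EndedCall` fields and reason strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class EndedCall:
    caller_number: str
    priority: str   # for example "P1" or "P3"
    ended_by: str   # assumed values: "caller_hangup", "network_drop", "operator_close"
    summary: str    # context carried over from the interrupted session


def plan_callback(call: EndedCall) -> Optional[Dict[str, str]]:
    """Return a callback plan for abruptly ended P1 calls, otherwise None."""
    if call.priority != "P1":
        return None
    if call.ended_by == "operator_close":
        return None  # the call was closed deliberately, not cut off
    return {"to": call.caller_number, "context": call.summary}


dropped = EndedCall("+15550123", "P1", "network_drop", "Caller reported chest pain")
plan = plan_callback(dropped)
```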

Use Cases

PSAP Calltaker Augmentation: A 911, 112, or 119 calltaker receives an incoming call, and the AI answers with a country-appropriate greeting within 150 ms while the calltaker is still putting the headset on. The AI's tool calls populate the CAD form (incident type, location, caller profile, nearest available unit, ETA, open hazards) before the calltaker has to type anything.
Cardiac Arrest Hands-Free Guidance: A caller reports that a relative has collapsed. The agonal breathing detector raises a high-priority signal, the AI starts the cardiac arrest protocol card, injects CPR cadence audio, and issues hands-free compressions guidance while the calltaker dispatches the ambulance. The protocol card decisions are recorded for later clinical review.
Silent Caller from a Domestic Violence Incident: A caller dials 112 but cannot speak. The silent caller state machine moves to yes-no prompting, the caller taps the keypad to confirm they are in danger, and the AI maintains a pretend pizza-order cover while the calltaker escalates.
Multilingual Emergency Call: A caller speaks a language the calltaker does not. The translation layer transcribes both sides in both languages, so the operator reads the caller's words in their own language and responds naturally. The bilingual transcript is attached to the case record.
Utility Customer Enquiry: A customer phones their utility provider about a suspected outage. The AI looks up the outage record at the caller's address, confirms restoration ETA from the existing outage management system, and handles the enquiry end-to-end. A suspected gas leak short-circuits the conversation and escalates to a dispatcher.
Dispatcher QA Practice: A new dispatcher runs through archived calls in training mode. The AI Partner provides the same hints it would during a live call, and the dispatcher's decisions are reviewed alongside the AI Partner's suggestions.
Proactive Wellness Call: A vulnerable-person register triggers an outbound wellness call before a forecast heatwave. The AI confirms the person is safe, logs the confirmation, and escalates where the call is unanswered or the person sounds unwell.

Integration

This module connects to the AI orchestration layer for prompt management and safety guardrails, the authentication service for session authorisation, the PSAP dispatch system for operational voice workflows, and the existing incident, outage, utility, location, hazard, weather, and translation services through the shared tool registry. Voice session transcripts, audio intelligence signals, AI Partner hints, and protocol card decisions feed into the case management and immutable audit logging systems. Hospital pre-alerts use the ePCR Clinical Workspace pre-alert channel.

Open Standards

HL7 FHIR R4: hospital pre-alerts and patient arrival notifications follow the FHIR R4 resource patterns.
Open Location Code (Plus Codes): supported as an input format to the location resolution tool, enabling grid-based addressing without a street address.
ITU-T G.711: the telephony audio transport uses G.711 mu-law (PCM) encoding for the real-time audio stream between the telephony bridge and the voice pipeline.
RFC 4733 (DTMF-Relay): DTMF tones are forwarded out of band for the silent-mode handler where the telephony transport supports it.
W3C WebRTC: the live-video scaffolding uses WebRTC for future caller-to-operator video.
NENA i3: the PSAP architecture aligns with NENA i3 session, location, and additional-data conventions.
IETF RFC 6443 (Emergency Calling Framework): the emergency voice session model follows the framework for emergency calling using internet multimedia.
NATO STANAG 4774 / 4778: session transcripts containing protected information carry the platform's confidentiality marking and metadata binding.
