Multi-Provider Realtime Voice AI

The Multi-Provider Realtime Voice AI module delivers live conversational AI capabilities through simultaneous support for OpenAI Realtime API and Google Gemini Live, enabling natural voice interactions for dispatch operations, field reporting, and public-facing communication channels.

Module metadata

Source reference: content/modules/ai-realtime-voice.md

Last Updated: Apr 2, 2026

Category: AI & ML

Content checksum: e67abb8fc26ab920

Tags: ai, real-time

Overview

The Multi-Provider Realtime Voice AI module delivers live conversational AI capabilities through simultaneous support for OpenAI Realtime API and Google Gemini Live, enabling natural voice interactions for dispatch operations, field reporting, and public-facing communication channels. The system manages provider selection, voice resolution, greeting injection timing, and session lifecycle across providers while maintaining a unified interface for consuming applications.

By abstracting multiple voice AI providers behind a single orchestration layer, organizations gain provider redundancy, best-of-breed voice quality, and the ability to route conversations to the optimal provider based on language, latency requirements, and cost constraints.
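
As a rough illustration of routing on language, latency, and cost, the sketch below picks a provider from a small candidate table and skips any provider marked unhealthy. The `ProviderProfile` fields, the `choose_provider` function, and every number shown are hypothetical and not taken from the module; a real orchestration layer would presumably draw on live health, latency, and pricing data.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProviderProfile:
    """Hypothetical routing attributes for one voice provider."""
    name: str
    languages: frozenset      # language codes this provider is configured for
    p50_latency_ms: float     # assumed median first-response latency
    cost_per_minute: float    # assumed cost, in arbitrary units
    healthy: bool = True      # would be set by health monitoring


def choose_provider(
    providers: Sequence[ProviderProfile],
    language: str,
    max_latency_ms: float,
    max_cost_per_minute: float,
) -> Optional[ProviderProfile]:
    """Return the lowest-latency healthy provider meeting all constraints, or None."""
    eligible = [
        p for p in providers
        if p.healthy
        and language in p.languages
        and p.p50_latency_ms <= max_latency_ms
        and p.cost_per_minute <= max_cost_per_minute
    ]
    # Prefer lower latency; break ties on cost.
    return min(
        eligible,
        key=lambda p: (p.p50_latency_ms, p.cost_per_minute),
        default=None,
    )


# Illustrative values only; not real measurements, language support, or prices.
PROVIDERS = [
    ProviderProfile("openai-realtime", frozenset({"en", "es"}), 320.0, 0.30),
    ProviderProfile("gemini-live", frozenset({"en", "ja"}), 280.0, 0.20),
]
```

With this table, `choose_provider(PROVIDERS, "en", 500, 1.0)` selects the lower-latency entry, and the same call falls through to the other entry once the first is marked unhealthy, which is the simplest form of the redundancy described above.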

Key Features

  • OpenAI Realtime API Integration -- Full support for OpenAI's generally available (GA) Realtime API with streaming audio input and output, function calling during voice sessions, and configurable voice personas
  • Google Gemini Live Integration -- Native integration with Google Gemini Live for multimodal voice interactions, with optimized greeting injection timing and voice resolution to ensure natural conversation flow
  • Provider-Agnostic Interface -- A unified API abstracts provider-specific protocols, enabling applications to initiate voice sessions without coupling to a specific provider's SDK or session management model
  • Automatic Provider Failover -- Real-time health monitoring of voice providers with automatic session migration when a provider experiences degraded quality or availability
  • Voice Persona Management -- Configure and manage voice personas with provider-specific voice selection, speaking rate, pitch adjustment, and personality prompting for consistent brand representation
  • Secure API Key Management -- Provider API keys are stored and rotated through the platform's secrets management system, never exposed to client applications, with per-tenant key isolation
  • Session Analytics -- Track voice session duration, provider utilization, latency metrics, turn-taking patterns, and user satisfaction signals for continuous optimization
  • Session Prewarming -- Pre-establish WebSocket connections and complete provider setup handshakes for both OpenAI Realtime and Google Gemini Live while the telephony system plays the initial greeting. This removes 500 to 1,500 ms of latency from the first AI response, so callers do not hear a gap between the greeting and the AI's first reply. Prewarming is subsystem-aware: it uses country-specific emergency number greetings (911, 112, 119) for PSAP lines and customer labels for non-PSAP lines.
  • CPR Cadence Audio Injection -- During medical emergencies requiring chest compressions, the system generates and injects drift-corrected 110 BPM metronome audio directly into the Twilio WebSocket as base64-encoded mulaw frames. The cadence engine pauses automatically when the caller speaks, pauses every 60 beats for a check-in, and resumes without manual intervention. Audio generation uses a precomputed PCM16-to-mulaw lookup table for consistent sub-millisecond encoding performance.
  • Medical Protocol Function Tools -- Voice AI sessions expose five medical-specific function tools to the AI model: create SOP suggestion (draft a new procedure), start medical coaching (activate a structured protocol), advance protocol step (progress through the procedure), start CPR cadence (begin chest compression timing), and stop CPR cadence. These tools enable the AI to guide callers through verified medical procedures without inventing steps.
  • Tenant Isolation -- Complete separation of voice sessions, API credentials, usage quotas, and conversation history between tenants with no cross-tenant data leakage
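
The CPR cadence feature above combines three ideas: a precomputed PCM16-to-mulaw lookup table, beat timing that does not drift, and base64-encoded frames sent over the telephony WebSocket. The following is a minimal sketch of those ideas, not the module's implementation. It assumes 8 kHz mono audio and standard G.711 mu-law encoding; the function names, the click tone, and the exact JSON message shape are illustrative assumptions and should be checked against the telephony provider's documentation.

```python
import base64
import json
import math

SAMPLE_RATE = 8000   # assumed: 8 kHz mono, the usual rate for telephony mu-law
BPM = 110            # cadence rate from the feature description
CHECKIN_EVERY = 60   # beats between check-in pauses, from the feature description
_BIAS, _CLIP = 0x84, 32635


def _linear_to_mulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), _CLIP) + _BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


# Built once at import time: index = sample + 32768, value = mu-law byte.
MULAW_TABLE = bytes(_linear_to_mulaw(s) for s in range(-32768, 32768))


def encode_mulaw(samples):
    """Encode PCM16 samples using table lookups only."""
    return bytes(MULAW_TABLE[s + 32768] for s in samples)


def make_click(freq_hz=1000.0, duration_ms=30):
    """Generate a short decaying sine burst as PCM16 (hypothetical tick sound)."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [
        int(20000 * (1 - i / n) * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    ]


def beat_deadline(start, beat_index, bpm=BPM):
    """Absolute time of a beat, derived from the start time, not the previous
    beat, so per-beat scheduling jitter cannot accumulate into drift."""
    return start + beat_index * 60.0 / bpm


def is_checkin_beat(beat_index):
    """True on every CHECKIN_EVERY-th beat after the first."""
    return beat_index > 0 and beat_index % CHECKIN_EVERY == 0


def media_message(stream_sid, mulaw):
    """Wrap mu-law audio in a JSON media message with a base64 payload
    (assumed message shape)."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(mulaw).decode("ascii")},
    })
```

A sender loop would sleep until `beat_deadline(start, n)` for each beat `n`, send `media_message(stream_sid, encode_mulaw(make_click()))`, and pause when `is_checkin_beat(n)` is true or when caller speech is detected. Speech detection is outside the scope of this sketch.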

Use Cases

  • AI-Assisted Dispatch -- Dispatchers interact with AI through natural voice to query case details, update incident status, and receive situation briefings without leaving their communication workflow
  • Field Reporting -- Officers and field agents dictate reports through voice AI that structures the narrative into standardized report formats with entity extraction and automatic case linking
  • Public Communication -- Automated voice interfaces handle routine public inquiries, triage incoming calls, and escalate complex requests to human operators with full conversation context
  • Multilingual Operations -- Voice AI provides real-time interpretation and translation during cross-language interactions, with provider selection optimized for the language pair involved

Integration

This module connects to the AI/LLM orchestration layer for prompt management and safety guardrails, the authentication service for session authorization, and the PSAP dispatch system for operational voice workflows. Voice session transcripts feed into the case management and audit logging systems for record keeping.

Availability

  • Enterprise Plan: Full multi-provider voice AI included
  • Professional Plan: Single-provider voice AI included; multi-provider support and advanced persona management available as an add-on

Last Reviewed: 2026-04-02