AI Model Routing

Overview

An organisation running ten different AI-powered features does not need ten different models. It needs the right model for each task at the moment the request arrives. A routine FAQ answer and a complex multi-jurisdictional risk analysis have very different requirements for quality, cost, and latency. The AI Model Routing system makes that distinction automatically, analysing each incoming request and directing it to the best-fit model without requiring manual configuration per feature.

Automatic routing removes the overhead of choosing a model for each feature by hand and helps organisations get a better cost-performance balance from a multi-model AI strategy across the platform's 153 third-party integrations.

Key Features

Intelligent Model Selection: Analyses incoming requests across multiple dimensions including task complexity, required capabilities, latency tolerance, and cost budget to route each query to the best-suited model in real time.
Capability Taxonomy: Maintains a detailed map of model strengths across dozens of capability categories such as code generation, multilingual translation, mathematical reasoning, document analysis, and content moderation.
Cost-Performance Optimisation: Dynamically selects models to achieve the best quality-to-cost balance, automatically using economical models for routine tasks while reserving premium models for complex queries.
Real-Time Performance Tracking: Monitors per-model metrics continuously, adapting routing decisions based on current latency, error rates, and throughput to maintain quality as conditions change.
Multi-Provider Failover: Configures fallback chains of up to five models per capability category, with automatic failover across providers to maintain high availability.
Auto-Scaling and Load Balancing: Dynamically adjusts capacity across model endpoints based on demand, with predictive scaling that anticipates traffic changes and distributed rate limiting across multiple service credentials.
A/B Testing Framework: Routes traffic between model configurations to collect statistically significant performance data, enabling data-driven decisions about model upgrades and routing changes.
Request Preprocessing: Prepares requests before routing by applying prompt template optimisation, context compression, language detection, and format normalisation.
Response Post-Processing: Passes model outputs through format validation, PII detection and redaction, and content moderation, and caches responses.
Multi-Model Consensus: Routes requests to multiple models and combines outputs through voting or weighted averaging for improved accuracy on critical tasks.
Custom Model Support: Integrates self-hosted and fine-tuned models alongside cloud provider models through standard endpoint interfaces.
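
The selection and failover behaviour described above can be sketched as follows. This is a minimal illustration, not the platform's implementation: the `Model` and `Request` types, the scores and prices in `CATALOGUE`, and the `call` callback are all hypothetical. The sketch filters models by capability, quality floor, and latency tolerance, orders the survivors cheapest first, caps the chain at five entries, and falls through to the next model when a call fails.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Model:
    name: str
    capabilities: frozenset
    quality: float             # hypothetical 0..1 quality score
    cost_per_1k_tokens: float  # hypothetical price unit
    p95_latency_ms: int


@dataclass(frozen=True)
class Request:
    capability: str
    min_quality: float
    max_latency_ms: int


MAX_CHAIN = 5  # the feature list caps fallback chains at five models

# Illustrative catalogue; names and numbers are made up for the example.
CATALOGUE = [
    Model("economy-small", frozenset({"faq", "translation"}), 0.70, 0.2, 300),
    Model("balanced-medium", frozenset({"faq", "code", "translation"}), 0.85, 1.0, 800),
    Model("premium-large", frozenset({"faq", "code", "risk-analysis"}), 0.97, 6.0, 2500),
]


def build_chain(request: Request, models: List[Model]) -> List[Model]:
    """Return eligible models, cheapest first, limited to MAX_CHAIN entries."""
    eligible = [
        m for m in models
        if request.capability in m.capabilities
        and m.quality >= request.min_quality
        and m.p95_latency_ms <= request.max_latency_ms
    ]
    # Cheapest first; among equally priced models prefer higher quality.
    eligible.sort(key=lambda m: (m.cost_per_1k_tokens, -m.quality))
    return eligible[:MAX_CHAIN]


def route(
    request: Request,
    models: List[Model],
    call: Callable[[Model, Request], str],
) -> Tuple[str, str]:
    """Try each model in the chain in order and return (model name, response)."""
    errors = []
    for model in build_chain(request, models):
        try:
            return model.name, call(model, request)
        except Exception as exc:  # any provider error triggers failover
            errors.append(f"{model.name}: {exc}")
    raise RuntimeError("no eligible model succeeded: " + "; ".join(errors))
```

With this catalogue, a routine `Request("faq", 0.6, 1000)` yields the chain `economy-small`, then `balanced-medium`, while a `risk-analysis` request with a 0.9 quality floor is only eligible for `premium-large`. A production router would replace the static scores with the live latency and error-rate metrics mentioned under Real-Time Performance Tracking.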

Use Cases

Enterprise AI Chatbots: Routes customer queries to models based on complexity, language, and cost constraints, using economical models for simple FAQs and premium models for complex technical support queries that require deeper reasoning.
Code Generation Platforms: Matches coding requests to models with the strongest performance in specific programming languages, with quality validation to ensure output reliability before delivery.
Compliance-Sensitive Applications: Enforces data residency through geographic routing, ensuring customer data is processed only by models hosted in compliant regions. Financial crime units and healthcare fraud investigators working under GDPR or sector-specific data localisation rules benefit directly from this control.
Content Generation Services: Balances creativity requirements with cost constraints by routing marketing content to creative models, technical documentation to precision models, and batch processing to economy models.
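
As a rough illustration of the geographic routing mentioned under Compliance-Sensitive Applications, the sketch below filters candidate endpoints by a residency policy before any model selection takes place. The `Endpoint` type, the region names, and the `ALLOWED_REGIONS` table are assumptions made for the example; the platform's actual policy model is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Endpoint:
    model: str
    region: str


# Hypothetical policy table: policy name -> regions allowed to process data.
# None means the policy places no restriction on region.
ALLOWED_REGIONS = {
    "gdpr": frozenset({"eu-west", "eu-central"}),
    "unrestricted": None,
}


def compliant_endpoints(endpoints: List[Endpoint], policy: str) -> List[Endpoint]:
    """Keep only endpoints hosted in a region the policy permits."""
    if policy not in ALLOWED_REGIONS:
        # Fail closed: an unknown policy must never widen where data can go.
        raise ValueError(f"unknown residency policy: {policy!r}")
    allowed = ALLOWED_REGIONS[policy]
    if allowed is None:
        return list(endpoints)
    return [e for e in endpoints if e.region in allowed]
```

Applying the residency filter first means cost and quality optimisation can only ever choose among endpoints that are already compliant.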

Integration

The platform integrates with existing AI infrastructure through a flexible API layer with complete distributed tracing, metrics collection, and structured logging. It supports gradual migration from single-model architectures through canary deployments and shadow traffic analysis.
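
One common way to implement a canary split is deterministic hashing of a request or session identifier, sketched below. This is an assumption about technique, not a description of the platform's internals, and the function names are hypothetical. Hashing keeps a given identifier on the same side of the split across retries, and raising `canary_fraction` gradually only moves identifiers from stable to canary, never back.

```python
import hashlib


def canary_bucket(request_id: str) -> float:
    """Map an identifier to a stable value in [0, 1)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the result is always below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def pick_route(request_id: str, canary_fraction: float) -> str:
    """Return "canary" for roughly canary_fraction of identifiers, else "stable"."""
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    return "canary" if canary_bucket(request_id) < canary_fraction else "stable"
```

Shadow traffic analysis differs in that the request is answered by the stable route while a copy is sent to the candidate route and its response is only recorded for comparison.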

Open Standards

OAuth 2.0 and JWT Bearer Token: Token-based authentication protects typed, auditable read and write workflows across the platform.
OpenAI Chat Completions API (REST/HTTP): Provider adapters send messages in the OpenAI chat format (system/user/assistant roles, tool call objects) over HTTPS with Bearer token authentication, making the routing layer interoperable with any endpoint that implements this de facto standard.
OAuth 2.0 Bearer Token (RFC 6750): All outbound calls to model provider APIs authenticate using Bearer tokens carried in the HTTP Authorization header, as defined by RFC 6750.
W3C Trace Context (Level 1, 2021): Every routed request propagates traceparent and tracestate headers, enabling correlated distributed traces across the multi-provider call chain.
OpenTelemetry (OTLP, 2023): The platform exports spans and metrics via the OpenTelemetry Protocol (OTLP-HTTP) to any compatible backend, allowing per-model latency and error-rate data to feed routing decisions.
EU AI Act (Regulation (EU) 2024/1689, Article 12 / Annex III): Every LLM call is logged to an audit table with input hash, model identifier, token usage, and user context to satisfy the record-keeping obligations that apply to high-risk AI system outputs used in investigations.
ISO 8601: All timestamps in routing responses, usage records, and cost summaries are serialised as ISO 8601 date-time strings with UTC offset.
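
To make several of these standards concrete, the sketch below builds outbound headers with an RFC 6750 Bearer token and a W3C `traceparent` value, and assembles an audit record with an input hash and an ISO 8601 timestamp carrying a UTC offset. The function names and the record's field names are hypothetical; only the header formats and the timestamp format follow the cited standards.

```python
import hashlib
import secrets
from datetime import datetime, timezone


def new_traceparent(sampled: bool = True) -> str:
    """Build a W3C Trace Context traceparent: version-traceid-parentid-flags."""
    trace_id = secrets.token_hex(16)   # 32 lowercase hex characters
    parent_id = secrets.token_hex(8)   # 16 lowercase hex characters
    # The spec forbids all-zero IDs; random generation makes that negligible.
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


def outbound_headers(token: str, traceparent: str) -> dict:
    """Headers for a provider call: Bearer auth plus trace propagation."""
    return {
        "Authorization": f"Bearer {token}",
        "traceparent": traceparent,
        "Content-Type": "application/json",
    }


def audit_record(prompt: str, model_id: str, prompt_tokens: int,
                 completion_tokens: int, user_id: str) -> dict:
    """Assemble one audit row; the field names are illustrative only."""
    return {
        # Store a hash rather than the raw prompt text.
        "input_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "model_id": model_id,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "user_id": user_id,
        # ISO 8601 date-time with an explicit +00:00 UTC offset.
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
```

Hashing the input lets an auditor confirm that a given prompt matches a logged call without the audit table holding the prompt itself.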

Last Reviewed: 2026-02-23
Last Updated: 2026-04-14