Overview#
AI inference costs are not always intuitive. A feature that looks modest in development can become the largest cost driver in production once hundreds of analysts use it daily. The Token Usage Management module gives administrators real-time visibility into exactly what is being consumed, by whom, and at what cost, so that model selection and quota decisions can be made with current data rather than estimates.
Every AI request is tracked at the per-request level. Billable tokens apply a 1.5x multiplier to raw token counts. When provider APIs do not return exact counts, the system estimates from response length as a documented fallback.
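The billable-unit calculation and the estimation fallback can be sketched as below. Only the 1.5x multiplier comes from this document; the function names and the 4-characters-per-token heuristic are illustrative assumptions, not the product's actual implementation.

```python
def billable_units(raw_input_tokens: int, raw_output_tokens: int,
                   multiplier: float = 1.5) -> float:
    """Billable units apply the documented 1.5x multiplier to raw counts."""
    return (raw_input_tokens + raw_output_tokens) * multiplier

def estimate_tokens_from_chars(char_count: int, chars_per_token: int = 4) -> int:
    """Hypothetical fallback when a provider API omits exact token counts:
    estimate from response character length. The ~4 chars/token divisor is
    a common heuristic, assumed here for illustration."""
    return max(1, char_count // chars_per_token)

print(billable_units(1000, 500))         # 2250.0
print(estimate_tokens_from_chars(2000))  # 500
```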
```mermaid
flowchart TD
    A[AI Request] --> B[Token Metering]
    B --> C[Raw Count: Input + Output]
    C --> D[Billable Units: Raw × 1.5]
    D --> E[Cost Calculation by Model Tier]
    E --> F{Quota Limit?}
    F -->|Warning Zone| G[Alert Sent]
    F -->|Hard Limit| H[Request Blocked]
    F -->|Within Limit| I[Request Proceeds]
    E --> J[Usage Dashboard]
    E --> K[Cost Attribution by Dept / Feature]
```
Key Features#
- Real-Time Token Metrics: Monitor token consumption as it happens with live dashboards showing current-day usage, hourly rates, month-to-date totals, projected month-end consumption, and quota utilisation. Break down usage by AI model, feature, user, department, or investigation for granular visibility.
- Raw and Billable Token Tracking: Every AI request reports both raw token counts (input tokens, output tokens, total tokens) and billable token units that apply a 1.5x multiplier to raw counts. Per-request telemetry includes the AI provider, model identifier, model tier, latency in milliseconds, and a complete usage breakdown. When provider APIs do not return exact token counts, the system estimates usage from response character length as a documented fallback.
- Cost Analysis: Detailed cost breakdowns by model, feature, user, and time period. Understand cost per request, cost per investigation, and cost per user session to make informed decisions about AI resource allocation and model selection.
- Usage Analytics: Analyse consumption patterns with daily, weekly, and monthly trends, seasonal pattern detection, and anomaly identification. Identify top consumers, compare department usage, and track feature-level efficiency.
- Quota Management: Set and enforce token usage quotas at the organisation, department, and individual user level. Configure monthly and daily limits with progressive alert thresholds. Choose between soft limits (warnings only) and hard limits (usage blocked) based on your governance requirements.
- Optimisation Recommendations: Actionable suggestions for reducing token costs including model selection optimisation, prompt efficiency improvements, caching opportunities, and batch processing strategies. Each recommendation includes estimated savings and implementation complexity.
- Predictive Forecasting: Forecast future token usage and costs based on historical patterns, seasonal trends, and growth trajectories. Assess budget risk and plan capacity to avoid unexpected cost overruns.
- Cost Allocation: Attribute AI costs to business units, departments, projects, and investigations for chargeback and internal accounting. Track return on investment at the feature level to prioritise AI capabilities that deliver measurable value.
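The quota behaviour described above (progressive alert thresholds, soft limits that warn, hard limits that block) can be sketched as follows. This is a minimal illustration, not the product's API; the `Quota` type, threshold values, and field names are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class Quota:
    monthly_limit: int                            # limit in billable units
    alert_thresholds: tuple = (0.5, 0.8, 0.95)    # progressive warning points
    hard: bool = True                             # hard limit blocks; soft only warns

def check_quota(used: int, requested: int, quota: Quota) -> str:
    """Return 'proceed', 'alert', or 'blocked' for a pending request."""
    projected = used + requested
    if projected > quota.monthly_limit:
        # Over the limit: a hard limit blocks the request, a soft limit warns.
        return "blocked" if quota.hard else "alert"
    ratio = projected / quota.monthly_limit
    if any(ratio >= t for t in quota.alert_thresholds):
        return "alert"  # warning zone: within quota but past a threshold
    return "proceed"
```

For example, a department at 75% of its monthly limit lands in the warning zone (`"alert"`), while one that would exceed a hard limit sees the request blocked outright.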
Use Cases#
- Budget management with real-time cost visibility, quota enforcement, and projected spend forecasting that prevent unexpected AI cost overruns at period end.
- Cost optimisation through model selection guidance, prompt efficiency analysis, and caching recommendations that reduce consumption without sacrificing output quality.
- Usage governance with configurable quotas at organisation, department, and user levels to ensure fair resource allocation and prevent individual overconsumption.
- Business intelligence through cost attribution that connects AI spending to business outcomes, enabling informed decisions about which AI features to expand or reduce.
- Anomaly detection that identifies unusual consumption patterns, failed request surges, and unexpected cost spikes for rapid investigation.
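One simple way to flag the unusual consumption patterns mentioned above is a deviation test against historical daily usage. The z-score approach below is an illustrative sketch, not the module's actual detection algorithm, and the cutoff of 3 standard deviations is an assumed default.

```python
from statistics import mean, stdev

def is_usage_anomaly(daily_history: list, today: float, z_cutoff: float = 3.0) -> bool:
    """Flag today's consumption if it deviates more than z_cutoff standard
    deviations from the historical daily mean."""
    if len(daily_history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(daily_history), stdev(daily_history)
    if sigma == 0:
        return today != mu  # flat history: any change is unusual
    return abs(today - mu) / sigma > z_cutoff

history = [100, 110, 90, 105, 95, 100, 108, 92, 101, 99]
print(is_usage_anomaly(history, 200))  # a cost spike stands out
print(is_usage_anomaly(history, 105))  # normal day-to-day variation
```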
Getting Started#
- Establish Baseline: Monitor usage for 30 days to understand normal consumption patterns before setting quotas.
- Configure Quotas: Set organisation and department-level token limits based on your baseline data and budget.
- Set Up Alerts: Configure progressive alert thresholds to receive early warning as usage approaches limits.
- Review Recommendations: Act on optimisation suggestions starting with high-impact, low-effort improvements.
- Schedule Reports: Set up regular usage and cost reports for stakeholders and budget owners.
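The projected month-end consumption referenced in the dashboards can be approximated with a simple run-rate projection from month-to-date usage, sketched below. A naive linear projection is assumed here for illustration; the product's forecasting also accounts for seasonal trends and growth trajectories, which this sketch does not.

```python
def project_month_end(month_to_date: float, days_elapsed: int,
                      days_in_month: int) -> float:
    """Naive linear run-rate projection of month-end consumption."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return month_to_date / days_elapsed * days_in_month

# 450,000 billable units consumed in the first 15 days of a 30-day month
print(project_month_end(450_000, 15, 30))  # 900000.0
```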
Availability#
- Enterprise Plan: Included (all analytics, predictive forecasting, optimisation recommendations, cost allocation)
- Professional Plan: Core usage monitoring and basic quotas included; advanced analytics and optimisation available as add-on
Last Reviewed: 2026-04-02
Last Updated: 2026-04-14