Mistral Voxtral (Mar 26)
4B-parameter open-weight model, 90ms TTFA, runs on smartphones with 3 GB RAM. $16/1M chars.
Real benchmarks. Transparent pricing. Guided recommendations.
20+ TTS & STT providers, updated for April 2026.
Independent Benchmarks
Quality, speed, features, and price efficiency across all active providers, sorted by composite value score. A higher total means better overall value.
| Provider | Quality (out of 5) | Speed (out of 5) | Features (out of 5) | Price Score (out of 5) | Total (out of 20) |
|---|---|---|---|---|---|
| Deepgram | 4.5 | 5 | 4 | 4 | 17.5 |
| AssemblyAI | 4.5 | 3 | 5 | 5 | 17.5 |
| Chatterbox | 4.5 | 4 | 4 | 5 | 17.5 |
| Qwen3-TTS | 4.5 | 4 | 4 | 5 | 17.5 |
| Inworld AI | 5 | 5 | 5 | 2 | 17 |
| Azure AI Speech | 4 | 4 | 5 | 4 | 17 |
| Cartesia | 4.5 | 5 | 4 | 3 | 16.5 |
| Hume AI | 4.5 | 4 | 4 | 4 | 16.5 |
| Fish Audio S2 Pro | 4.5 | 4 | 3 | 5 | 16.5 |
| Kokoro | 4.2 | 4 | 3 | 5 | 16.2 |
| ElevenLabs | 5 | 4 | 5 | 2 | 16 |
| Microsoft MAI | 5 | 4 | 4 | 3 | 16 |
Scores are composite ratings (1–5 per dimension) compiled from Artificial Analysis, HuggingFace TTS Arena, and official benchmarks. Price Score: 5 = cheapest; open-source models are priced at $0 when self-hosted.
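The Total column works out as a plain sum of the four dimension scores. A minimal Python sketch of that ranking follows, using three rows copied from the table above. The `Scores` class and its `total` method are illustrative names, and the equal weighting is inferred from the table's arithmetic rather than stated by the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    """One row of the benchmark table (hypothetical container)."""
    provider: str
    quality: float
    speed: float
    features: float
    price: float  # 5 = cheapest

    def total(self) -> float:
        # Assumption: unweighted sum of the four dimensions, out of 20.
        return self.quality + self.speed + self.features + self.price


# Rows copied from the table, deliberately out of order.
rows = [
    Scores("ElevenLabs", 5, 4, 5, 2),
    Scores("Kokoro", 4.2, 4, 3, 5),
    Scores("Deepgram", 4.5, 5, 4, 4),
]

# Highest composite score first.
ranked = sorted(rows, key=lambda s: s.total(), reverse=True)

for s in ranked:
    print(f"{s.provider}: {s.total():.1f} / 20")
```

Because Python's sort is stable, providers with equal totals (four of them tie at 17.5 in the table) keep their input order, so any tie-breaking rule would need to be added explicitly.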
Market Updates — April 2026
Three new entrants, open-source models beating commercial leaders, and one major shutdown.
Mistral Voxtral: 4B-parameter open-weight model, 90ms TTFA, runs on smartphones with 3 GB RAM. $16/1M chars.
MAI-Transcribe-1 achieves 3.8% WER — beats Whisper-large-v3 on 22 of 25 languages.
OpenAI Realtime API-compatible format enables drop-in migration from existing stacks.
$500M Series D led by Sequoia. On-premise/on-device now available. $330M+ ARR.
Chatterbox: MIT-licensed model preferred by 63.75% of evaluators in blind tests. Free, 23 languages.
PlayHT: Shut down Dec 31, 2025 after Meta acquisition. Migrate to ElevenLabs or Chatterbox.
Provider Comparison
Benchmarks, pricing, ELO scores, WER, features, and compliance across every active TTS & STT provider.
| Provider | Type | Tags | Pricing | Benchmarks | Languages | Compliance | Best For |
|---|---|---|---|---|---|---|---|
| Inworld AI | TTS | New | TTS-1.5 Max: $30/1M chars (enterprise); TTS-1.5 Mini: $15/1M chars (low latency) | ELO 1,236; 130ms TTFA | 30+ | SOC2, HIPAA | voice agent, content creation, enterprise |
| ElevenLabs | TTS + STT | - | Starter: $5/mo for 30k chars; Creator: $22/mo for 100k chars; Scale API: ~$165/1M chars | ELO 1,197; 75ms TTFA | 74+ | SOC2, HIPAA, GDPR | content creation, narration, voice agent, enterprise |
| Deepgram | TTS + STT | - | Nova-3 STT: $0.0043–$0.0077/min (per-second billing); Voice Agent API: ~$0.075/min (STT+LLM+TTS) | 5.3% WER | 45+ | HIPAA, SOC2 | voice agent, transcription, analytics, real time |
| AssemblyAI | STT | - | Universal-2: $0.0025/min (99 languages); Universal-3 Pro: $0.0035/min (prompt-based customization) | 14.5% WER | 99+ | HIPAA, SOC2 Type 2, ISO 27001:2022, PCI DSS v4.0, GDPR | analytics, transcription, understanding, enterprise |
| Cartesia | TTS | - | Pay-as-you-go: $5/100k credits (1 credit/char) | 40ms TTFA | 20+ | - | voice agent, real time |
| Mistral Voxtral | TTS | New | API: $16/1M chars | 90ms TTFA | 9+ | - | voice agent, budget, offline |
| Microsoft MAI | TTS + STT | New | MAI-Transcribe-1: ~$0.017/min (Azure pricing); MAI-Voice-1: $16/1M chars | 3.8% WER | 25+ | Azure Compliance, GDPR, HIPAA | enterprise, transcription, accessibility |
| xAI Grok TTS | TTS | New | API: ~$15/1M chars (estimated) | - | 13+ | - | voice agent, prototyping |
| OpenAI | TTS + STT | - | TTS Standard (tts-1): $15/1M chars; TTS HD (tts-1-hd): $30/1M chars; GPT-4o Transcribe: $0.006/min (free diarization); Mini Transcribe: $0.003/min (budget option) | 5% WER | 99+ | SOC2, GDPR | simple app, prototyping, transcription |
| Azure AI Speech | TTS + STT | - | Neural TTS: $15–16/1M chars; STT Standard: $0.017/min (140+ languages) | - | 140+ | SOC2, HIPAA, ISO 27001, GDPR, FedRAMP | enterprise, accessibility, global |
| Google Cloud Speech | TTS + STT | - | WaveNet: $4/1M chars (standard); Chirp 3 HD: $30/1M chars (HD); STT Standard: $0.024/min (60 min/mo free) | - | 75+ | SOC2, HIPAA, ISO 27001, GDPR | enterprise, analytics, accessibility |
| Hume AI | TTS | New | Octave 2: $7.60/1M chars | - | 11+ | - | voice agent, content creation |
| LeanVox | TTS | New | Standard: $5/1M chars | - | 23+ | - | budget, simple app |
| Kokoro v1.0 | TTS | Open source | Self-hosted: Free (compute only); Hosted (DeepInfra): ~$0.65/1M chars | ELO 1,056 | 9+ | - | budget, accessibility, offline |
| Chatterbox | TTS | New, open source | Self-hosted: Free (MIT license) | - | 23+ | - | budget, content creation, voice agent, offline |
| Qwen3-TTS | TTS | New, open source | Self-hosted: Free (Apache 2.0) | 1.24% WER | 10+ | - | budget, content creation, accessibility |
| Fish Audio S2 Pro | TTS | New, open source | API: ~$10/1M chars; Self-hosted: Free (open weights) | ELO 1,128 | 15+ | - | content creation, budget, voice agent |
| Moonshine (Useful Sensors) | STT | New, open source | Self-hosted: Free (MIT license) | - | 1+ | - | accessibility, offline, budget |
| PlayHT | TTS | Discontinued | Service discontinued | - | N/A | - | - |
Data sourced from Artificial Analysis Speech Arena, HuggingFace Open ASR Leaderboard, and official provider documentation. All prices approximate as of April 2026. Benchmark scores may vary by use case.
EU AI Act Article 50 applies from August 2, 2026. All voice AI systems must disclose AI origin, mark synthetic audio in a machine-readable format, and comply with emotion recognition restrictions. California CAITA aligns to the same date. Penalties for Article 50 violations reach up to €15M or 3% of global annual turnover; prohibited practices carry up to €35M or 7%.
Pricing Calculator
Compare estimated monthly costs across all active providers at a given usage volume. Estimates use base pricing tiers and assume roughly 15,000 characters per hour for TTS and 60 minutes of audio per hour for STT. Open-source models are shown at $0 (self-hosted).
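The calculator's arithmetic can be approximated with the stated conversion rates (about 15,000 characters per hour for TTS, 60 minutes per hour for STT). The sketch below is an assumption about how such an estimate might be computed, not the site's actual implementation. Function and constant names are hypothetical, and the prices are base-tier figures copied from the Provider Comparison table.

```python
# Conversion rates stated by the calculator.
TTS_CHARS_PER_HOUR = 15_000
STT_MINUTES_PER_HOUR = 60

# Approximate April 2026 list prices taken from the comparison table.
TTS_PRICE_PER_1M_CHARS = {
    "Mistral Voxtral": 16.00,
    "Hume AI (Octave 2)": 7.60,
    "Kokoro (self-hosted)": 0.00,  # compute costs not included
}
STT_PRICE_PER_MINUTE = {
    "AssemblyAI (Universal-2)": 0.0025,
    "OpenAI (GPT-4o Transcribe)": 0.006,
}


def monthly_tts_cost(hours: float, price_per_1m_chars: float) -> float:
    """Estimated TTS cost in USD for `hours` of generated speech per month."""
    chars = hours * TTS_CHARS_PER_HOUR
    return round(chars / 1_000_000 * price_per_1m_chars, 2)


def monthly_stt_cost(hours: float, price_per_minute: float) -> float:
    """Estimated STT cost in USD for `hours` of transcribed audio per month."""
    minutes = hours * STT_MINUTES_PER_HOUR
    return round(minutes * price_per_minute, 2)


hours = 10  # example volume: 10 hours per month

for name, price in TTS_PRICE_PER_1M_CHARS.items():
    print(f"TTS {name}: ${monthly_tts_cost(hours, price):.2f}/month")

for name, price in STT_PRICE_PER_MINUTE.items():
    print(f"STT {name}: ${monthly_stt_cost(hours, price):.2f}/month")
```

At this volume the per-provider differences amount to a few dollars. Subscription plans, free tiers, and self-hosting compute are left out, so the figures are a lower-bound comparison rather than a quote.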
Community Picks
Ranked from real production deployments, blind tests, and Artificial Analysis benchmarks.