Provider Comparison Matrix
Benchmarks, pricing models, ELO scores, word error rates, feature sets, and compliance certifications across every active TTS & STT provider. Updated April 2026.
Abbreviations: WER (word error rate), ELO (Elo rating), TTFA (time to first audio), MOS (mean opinion score).
| Provider | Inworld AI (new, TTS) | ElevenLabs (TTS + STT) | Deepgram (TTS + STT) | AssemblyAI (STT) | Cartesia (TTS) | Mistral Voxtral (new, TTS) | Microsoft MAI (new, TTS + STT) | xAI Grok TTS (new, TTS) | OpenAI (TTS + STT) | Azure AI Speech (TTS + STT) | Google Cloud Speech (TTS + STT) | Hume AI (new, TTS) | LeanVox (new, TTS) | Kokoro v1.0 (open source, TTS) | Chatterbox (new, open source, TTS) | Qwen3-TTS (new, open source, TTS) | Fish Audio S2 Pro (new, open source, TTS) | Moonshine (Useful Sensors) (new, open source, STT) | PlayHT (discontinued, TTS) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Pricing | TTS-1.5 Max: $30/1M chars (enterprise); TTS-1.5 Mini: $15/1M chars (low latency) | Starter: $5/mo for 30k chars; Creator: $22/mo for 100k chars; Scale API: ~$165/1M chars | Nova-3 STT: $0.0043–$0.0077/min (per-second billing); Voice Agent API: ~$0.075/min (STT+LLM+TTS) | Universal-2: $0.0025/min (99 languages); Universal-3 Pro: $0.0035/min (prompt-based customization) | Pay-as-you-go: $5/100k credits (1 credit/char) | API: $16/1M chars | MAI-Transcribe-1: ~$0.017/min (Azure pricing); MAI-Voice-1: $16/1M chars | API: ~$15/1M chars (estimated) | TTS Standard: $15/1M chars (tts-1); TTS HD: $30/1M chars (tts-1-hd); GPT-4o Transcribe: $0.006/min (free diarization); Mini Transcribe: $0.003/min (budget option) | Neural TTS: $15–16/1M chars; STT Standard: $0.017/min (140+ languages) | WaveNet: $4/1M chars (standard); Chirp 3 HD: $30/1M chars (HD); STT Standard: $0.024/min (60 min/mo free) | Octave 2: $7.60/1M chars | Standard: $5/1M chars | Self-hosted: Free (compute only); Hosted (DeepInfra): ~$0.65/1M chars | Self-hosted: Free (MIT license) | Self-hosted: Free (Apache 2.0) | API: ~$10/1M chars; Self-hosted: Free (open weights) | Self-hosted: Free (MIT license) | Service discontinued |
| Quality & ELO | ELO 1,236; 130 ms TTFA | ELO 1,197; 75 ms TTFA | 5.3% WER | 14.5% WER | 40 ms TTFA | 90 ms TTFA | 3.8% WER | | 5% WER | | | | | ELO 1,056 | | 1.24% WER | ELO 1,128 | | |
| Key Features | | | | | | | | | | | | | | | | | | | |
| Languages | 30+ | 74+ | 45+ | 99+ 🌍 | 20+ | 9+ | 25+ | 13+ | 99+ 🌍 | 140+ 🌍 | 75+ | 11+ | 23+ | 9+ | 23+ | 10+ | 15+ | 1+ | N/A |
| Compliance | SOC2, HIPAA | SOC2, HIPAA, GDPR | HIPAA, SOC2 | HIPAA, SOC2 Type 2, ISO 27001:2022, PCI DSS v4.0, GDPR | — | — | Azure Compliance, GDPR, HIPAA | — | SOC2, GDPR | SOC2, HIPAA, ISO 27001, GDPR, FedRAMP | SOC2, HIPAA, ISO 27001, GDPR | — | — | — | — | — | — | — | — |
| Best For | voice agent, content creation, enterprise | content creation, narration, voice agent, enterprise | voice agent, transcription, analytics, real time | analytics, transcription, understanding, enterprise | voice agent, real time | voice agent, budget, offline | enterprise, transcription, accessibility | voice agent, prototyping | simple app, prototyping, transcription | enterprise, accessibility, global | enterprise, analytics, accessibility | voice agent, content creation | budget, simple app | budget, accessibility, offline | budget, content creation, voice agent, offline | budget, content creation, accessibility | content creation, budget, voice agent | accessibility, offline, budget | — |
Data sourced from the Artificial Analysis Speech Arena, the Hugging Face Open ASR Leaderboard, and official provider documentation. All prices are approximate as of April 2026. Benchmark scores may vary by use case.
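
The Pricing row mixes units (per 1M characters, per 100k credits, monthly plans with a character quota, per minute of audio), which makes direct comparison awkward. The sketch below shows one way to put a few of the listed figures on a common scale. The helper names `per_million_chars` and `per_hour` are hypothetical, and treating a monthly plan as a pure per-character price is a simplifying assumption that ignores overage rates and unused quota.

```python
def per_million_chars(price_usd: float, chars: int) -> float:
    """Scale a price quoted for `chars` characters to USD per 1M characters."""
    return price_usd * 1_000_000 / chars


def per_hour(price_per_minute_usd: float) -> float:
    """Convert a per-minute STT price to USD per hour of audio."""
    return price_per_minute_usd * 60


# (price in USD, characters covered), copied from the Pricing row.
# Assumption: a monthly plan is treated as if its whole quota were used.
tts_quotes = {
    "ElevenLabs Starter": (5.00, 30_000),
    "ElevenLabs Creator": (22.00, 100_000),
    "Cartesia pay-as-you-go": (5.00, 100_000),  # 1 credit per character
    "OpenAI tts-1": (15.00, 1_000_000),
}

# USD per minute, copied from the Pricing row (lower bound for Nova-3).
stt_quotes = {
    "Deepgram Nova-3": 0.0043,
    "AssemblyAI Universal-2": 0.0025,
    "OpenAI GPT-4o Transcribe": 0.006,
}

normalized = {
    name: round(per_million_chars(price, chars), 2)
    for name, (price, chars) in tts_quotes.items()
}
hourly = {name: round(per_hour(price), 4) for name, price in stt_quotes.items()}

for name, usd in sorted(normalized.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${usd:.2f} per 1M chars")
for name, usd in sorted(hourly.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${usd:.3f} per audio hour")
```

Normalizing this way only compares list prices; it says nothing about quality, latency, or volume discounts, so it is best read alongside the Quality & ELO row rather than on its own.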