Provider Comparison Matrix

Benchmarks, pricing models, ELO scores, word error rates, feature sets, and compliance certifications for 19 commercial and open-source TTS and STT providers, including one discontinued service. Updated April 2026.

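Metric definitions
  • WER: word error rate, the share of reference words that are substituted, deleted, or inserted in a transcript (lower is better)
  • ELO: relative rating derived from blind pairwise listening comparisons (higher is better)
  • TTFA: time to first audio, the delay before the first audio chunk arrives (lower is better)
  • MOS: mean opinion score, the average listener rating on a 1–5 scale (higher is better)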
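WER is the number of word substitutions (S), deletions (D), and insertions (I) needed to turn a system's transcript into the reference transcript, divided by the number of reference words (N). The minimal Python sketch below computes it with a word-level edit distance. It only splits on whitespace; published leaderboards typically normalize text first (casing, punctuation, number formatting), so numbers from this sketch are not directly comparable to the WER figures in this matrix. The function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed as the word-level Levenshtein
    distance between the two transcripts divided by the reference length N.
    No text normalization is applied (an assumption of this sketch)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words (rolling one-row DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    print(word_error_rate(ref, "the cat sat on mat"))  # one deletion out of six words
```

Because insertions count as errors, WER can exceed 100% when a system hallucinates many extra words. A 5.3% WER means roughly 5 word errors per 100 reference words.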

Provider
  • Inworld AI (new): TTS
  • ElevenLabs: TTS + STT
  • Deepgram: TTS + STT
  • AssemblyAI: STT
  • Cartesia: TTS
  • Mistral Voxtral (new): TTS
  • Microsoft MAI (new): TTS + STT
  • xAI Grok TTS (new): TTS
  • OpenAI: TTS + STT
  • Azure AI Speech: TTS + STT
  • Google Cloud Speech: TTS + STT
  • Hume AI (new): TTS
  • LeanVox (new): TTS
  • Kokoro v1.0 (open source): TTS
  • Chatterbox (new, open source): TTS
  • Qwen3-TTS (new, open source): TTS
  • Fish Audio S2 Pro (new, open source): TTS
  • Moonshine by Useful Sensors (new, open source): STT
  • PlayHT (discontinued): TTS

Pricing
  • Inworld AI: TTS-1.5 Max $30/1M chars (enterprise); TTS-1.5 Mini $15/1M chars (low latency)
  • ElevenLabs: Starter $5/mo for 30k chars; Creator $22/mo for 100k chars; Scale API ~$165/1M chars
  • Deepgram: Nova-3 STT $0.0043–$0.0077/min (per-second billing); Voice Agent API ~$0.075/min (STT + LLM + TTS)
  • AssemblyAI: Universal-2 $0.0025/min (99 languages); Universal-3 Pro $0.0035/min (prompt-based customization)
  • Cartesia: pay-as-you-go $5 per 100k credits (1 credit/char)
  • Mistral Voxtral: API $16/1M chars
  • Microsoft MAI: MAI-Transcribe-1 ~$0.017/min (Azure pricing); MAI-Voice-1 $16/1M chars
  • xAI Grok TTS: API ~$15/1M chars (estimated)
  • OpenAI: TTS Standard $15/1M chars (tts-1); TTS HD $30/1M chars (tts-1-hd); GPT-4o Transcribe $0.006/min (free diarization); Mini Transcribe $0.003/min (budget option)
  • Azure AI Speech: Neural TTS $15–16/1M chars; STT Standard $0.017/min (140+ languages)
  • Google Cloud Speech: WaveNet $4/1M chars (standard); Chirp 3 HD $30/1M chars; STT Standard $0.024/min (60 min/mo free)
  • Hume AI: Octave 2 $7.60/1M chars
  • LeanVox: Standard $5/1M chars
  • Kokoro v1.0: self-hosted free (compute only); hosted on DeepInfra ~$0.65/1M chars
  • Chatterbox: self-hosted free (MIT license)
  • Qwen3-TTS: self-hosted free (Apache 2.0)
  • Fish Audio S2 Pro: API ~$10/1M chars; self-hosted free (open weights)
  • Moonshine: self-hosted free (MIT license)
  • PlayHT: service discontinued
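The pricing above mixes three billing units: dollars per 1M characters, monthly character bundles or credit packs, and dollars per audio minute. A minimal Python sketch for normalizing them follows, assuming list prices with no volume discounts, taxes, or overage fees; the helper names are hypothetical. The whole-minute rounding option is only a comparison point, since the matrix states billing granularity (per second) for Deepgram alone.

```python
import math

CHARS_PER_MILLION = 1_000_000


def usd_per_million_chars(price_usd: float, chars_included: int) -> float:
    """Effective $/1M chars for a subscription bundle or credit pack
    (assumes the whole bundle is used)."""
    return price_usd * CHARS_PER_MILLION / chars_included


def tts_cost_usd(chars: int, usd_per_million: float) -> float:
    """Cost of synthesizing `chars` characters at a $/1M-char list price."""
    return chars * usd_per_million / CHARS_PER_MILLION


def stt_cost_usd(audio_seconds: float, usd_per_minute: float,
                 per_second_billing: bool = True) -> float:
    """Cost of transcribing `audio_seconds` of audio at a $/min list price.

    per_second_billing=False is a hypothetical alternative that rounds
    each file up to whole minutes, for comparison only.
    """
    if per_second_billing:
        billable_minutes = audio_seconds / 60
    else:
        billable_minutes = math.ceil(audio_seconds / 60)
    return billable_minutes * usd_per_minute


if __name__ == "__main__":
    # Bundles and credit packs from the Pricing section (approximate, April 2026).
    bundles = {
        "ElevenLabs Starter ($5 / 30k chars)": (5.0, 30_000),
        "ElevenLabs Creator ($22 / 100k chars)": (22.0, 100_000),
        "Cartesia ($5 / 100k credits, 1 credit/char)": (5.0, 100_000),
    }
    for name, (price, chars) in bundles.items():
        print(f"{name}: ${usd_per_million_chars(price, chars):.2f}/1M chars")

    # One million characters at two per-1M list prices.
    print(f"OpenAI tts-1: ${tts_cost_usd(1_000_000, 15.0):.2f}")
    print(f"Hume AI Octave 2: ${tts_cost_usd(1_000_000, 7.60):.2f}")

    # A 90-second call at the low end of Deepgram Nova-3's range.
    print(f"Nova-3, per-second billing: ${stt_cost_usd(90, 0.0043):.5f}")
    print(f"Nova-3, rounded up to minutes: "
          f"${stt_cost_usd(90, 0.0043, per_second_billing=False):.5f}")
```

By this arithmetic, ElevenLabs Starter works out to about $167 per 1M characters, Creator to $220, and the Cartesia credit pack to $50, which makes them directly comparable with the per-1M rates quoted for the other providers.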
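
Quality & ELO
  • Inworld AI: ELO 1,236; 130ms TTFA
  • ElevenLabs: ELO 1,197; 75ms TTFA
  • Deepgram: 5.3% WER
  • AssemblyAI: 14.5% WER
  • Cartesia: 40ms TTFA
  • Mistral Voxtral: 90ms TTFA
  • Microsoft MAI: 3.8% WER
  • OpenAI: 5% WER
  • Kokoro v1.0: ELO 1,056
  • Qwen3-TTS: 1.24% WER
  • Fish Audio S2 Pro: ELO 1,128
No headline quality or latency figure is listed for xAI Grok TTS, Azure AI Speech, Google Cloud Speech, Hume AI, LeanVox, Chatterbox, Moonshine, or PlayHT. The STT WER figures measure transcription accuracy, while the Qwen3-TTS figure is the word error rate when an ASR model transcribes its synthesized speech, so the two kinds of WER are not directly comparable.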
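ELO scores are relative ratings from blind pairwise listening votes, so the gap between two models says more than either number alone. Assuming the standard Elo convention (base-10 logistic curve, 400-point scale), a rating difference converts into an expected preference rate as in the Python sketch below. Arena leaderboards may instead fit ratings to all votes at once (for example with a Bradley–Terry model), so `elo_update` and its K-factor are an illustration of the Elo idea, not how Artificial Analysis computes its scores.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model
    (assumed base-10 logistic curve with a 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> Tuple[float, float]:
    """Apply one sequential Elo update after a single comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was, 0.5 for a tie.
    k=32 is an illustrative step size, not a value any arena is known to use.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Ratings taken from the Quality & ELO section.
    print(f"Inworld AI vs ElevenLabs: {expected_score(1236, 1197):.1%}")
    print(f"Inworld AI vs Kokoro v1.0: {expected_score(1236, 1056):.1%}")
    print(f"Fish Audio S2 Pro vs Kokoro v1.0: {expected_score(1128, 1056):.1%}")
    print(elo_update(1500, 1500, 1.0))
```

Under that assumption, the 39-point gap between Inworld AI (1,236) and ElevenLabs (1,197) corresponds to roughly a 56% expected preference rate, and the 180-point gap to Kokoro v1.0 (1,056) to roughly 74%.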

Key Features
  • Inworld AI: #1 TTS Arena ELO; Zero-shot Voice Cloning (5–15s); Sub-250ms P90 Latency; Domain-specific Pronunciation; Healthcare/Finance/Legal
  • ElevenLabs: Eleven v3 (GA Feb 2); On-Premise / On-Device (Apr 9); Voice Cloning (10,000+ voices); Scribe v2 STT; Dubbing & Translation; 74 Languages; ElevenAgents; IBM watsonx Integration
  • Deepgram: Nova-3 (5.3% WER); Sub-300ms Streaming; Flux Turn Detection; Diarization; Smart Formatting; TTS Speed Controls (0.7–1.5×); Self-hosted Deployment; 45+ Languages; Per-second Billing
  • AssemblyAI: Universal-2 (99 languages); Universal-3 Pro Streaming; Prompt-based Domain Customization; Medical Mode (en/es/de/fr); Sentiment Analysis; PII Redaction; LLM Integration; Audio Intelligence
  • Cartesia: Sonic 3 (SageMaker); 40ms TTFA (Sonic Turbo); 3-second Voice Cloning; Emotion Control; Sonic Flash 75ms; SageMaker JumpStart
  • Mistral Voxtral: 4B Parameters; 90ms TTFA; Smartphone Deployment (3GB RAM); Voice Cloning; EU Data Sovereignty; CC BY-NC 4.0 (open weights)
  • Microsoft MAI: MAI-Voice-1 (cloning); MAI-Transcribe-1 (3.8% WER); Beats Whisper-large-v3 on 22 of 25 languages; Beats ElevenLabs Scribe v2 on 15 of 25 languages; 25-language STT; Half the GPU Usage of Competitors
  • xAI Grok TTS: OpenAI Realtime API Compatible; Drop-in Migration Path; xAI Infrastructure
  • OpenAI: GPT-4o Transcribe (5% WER); GPT-4o Mini Transcribe; Free Diarization; tts-1 / tts-1-hd; 99+ Languages (STT); Simple API
  • Azure AI Speech: 140+ Languages (TTS); 500+ Neural Voices; Custom Neural Voice; Speech Translation; Real-time Captions; Avatar Video Synthesis
  • Google Cloud Speech: Chirp 3 HD; WaveNet & Studio Voices; Gemini Integration; 380+ Voices; 75+ Languages; 60 min/mo Free STT; Google Translate Integration
  • Hume AI: Octave 2 Voice Model; TADA Architecture (1B/3B); Zero Content Hallucinations; 10× Context Efficiency; Emotional Intelligence; 11 Languages
  • LeanVox: 23+ Languages; Standard Neural Voices; REST API
  • Kokoro v1.0: 82M Parameters; MOS 4.2 (highest open-source); CPU / Raspberry Pi Capable; 210× Real-time on GPU; Apache 2.0 License; 9 Languages
  • Chatterbox: MIT License; 63.75% Preference over ElevenLabs; Chatterbox Turbo (sub-200ms); Chatterbox Multilingual (23 languages); PerTh Neural Watermarking; Paralinguistic Tags ([laugh], [cough]); 11K+ GitHub Stars
  • Qwen3-TTS: Emotion Control; Apache 2.0 License; 0.77% Chinese WER; 1.24% English WER; 0.6B & 1.7B Variants; 49+ Voice Presets; 12Hz Proprietary Tokenizer; Natural-language Voice Design; 10 Languages
  • Fish Audio S2 Pro: ELO 1,128 (best open-weights); Voice Cloning; Open Weights; Commercial API Available
  • Moonshine: 245M Parameters (MIT); Matches Whisper Large-v3; 1/6 the Size of Whisper; Mobile & Embedded Ready; CPU Capable
  • PlayHT: Service Discontinued; Acquired by Meta (Dec 31, 2025); Migrate to ElevenLabs, Chatterbox, or Kokoro
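Latency figures such as TTFA (for example Cartesia's 40ms with Sonic Turbo) and tail metrics such as Inworld AI's sub-250ms P90 depend on where the clock starts and stops and on network distance to the provider. One rough way to collect comparable numbers is to time many streaming requests from the same client and report percentiles. In this Python sketch, `start_stream` is a hypothetical callable you would write around whatever SDK or HTTP client you use, and the percentile uses the nearest-rank method; NumPy's `percentile` interpolates by default and can report slightly different values.

```python
import math
import time
from typing import Callable, Iterable, List


def time_to_first_audio(start_stream: Callable[[], Iterable[bytes]]) -> float:
    """Seconds from opening a streaming request to the first non-empty chunk.

    `start_stream` is a hypothetical wrapper: calling it opens the stream and
    yields raw audio chunks as bytes. The rest of the stream is not consumed;
    a real benchmark should also close the underlying connection.
    """
    t0 = time.perf_counter()
    for chunk in start_stream():
        if chunk:  # skip empty keep-alive frames
            return time.perf_counter() - t0
    raise RuntimeError("stream ended without any audio")


def percentile_nearest_rank(samples: List[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p%
    of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    def fake_stream():
        time.sleep(0.05)   # simulated server-side and network latency
        yield b""          # an empty keep-alive frame
        yield b"\x00\x01"  # first real audio bytes

    samples_ms = [time_to_first_audio(fake_stream) * 1000 for _ in range(10)]
    print(f"P50 TTFA: {percentile_nearest_rank(samples_ms, 50):.1f} ms")
    print(f"P90 TTFA: {percentile_nearest_rank(samples_ms, 90):.1f} ms")
```

Reporting P90 alongside the median matters for voice agents, because occasional slow responses are what users notice in a live conversation.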
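
Languages
  • Inworld AI: 30+
  • ElevenLabs: 74+
  • Deepgram: 45+
  • AssemblyAI: 99+
  • Cartesia: 20+
  • Mistral Voxtral: 9+
  • Microsoft MAI: 25+
  • xAI Grok TTS: 13+
  • OpenAI: 99+
  • Azure AI Speech: 140+
  • Google Cloud Speech: 75+
  • Hume AI: 11+
  • LeanVox: 23+
  • Kokoro v1.0: 9+
  • Chatterbox: 23+
  • Qwen3-TTS: 10+
  • Fish Audio S2 Pro: 15+
  • Moonshine: 1+
  • PlayHT: N/A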
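
Compliance
  • Inworld AI: SOC 2, HIPAA
  • ElevenLabs: SOC 2, HIPAA, GDPR
  • Deepgram: HIPAA, SOC 2
  • AssemblyAI: HIPAA, SOC 2 Type 2, ISO 27001:2022, PCI DSS v4.0, GDPR
  • Microsoft MAI: Azure compliance, GDPR, HIPAA
  • OpenAI: SOC 2, GDPR
  • Azure AI Speech: SOC 2, HIPAA, ISO 27001, GDPR, FedRAMP
  • Google Cloud Speech: SOC 2, HIPAA, ISO 27001, GDPR
No certifications are listed for the remaining providers.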
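
Best For
  • Inworld AI: voice agent, content creation, enterprise
  • ElevenLabs: content creation, narration, voice agent, enterprise
  • Deepgram: voice agent, transcription, analytics, real time
  • AssemblyAI: analytics, transcription, understanding, enterprise
  • Cartesia: voice agent, real time
  • Mistral Voxtral: voice agent, budget, offline
  • Microsoft MAI: enterprise, transcription, accessibility
  • xAI Grok TTS: voice agent, prototyping
  • OpenAI: simple app, prototyping, transcription
  • Azure AI Speech: enterprise, accessibility, global
  • Google Cloud Speech: enterprise, analytics, accessibility
  • Hume AI: voice agent, content creation
  • LeanVox: budget, simple app
  • Kokoro v1.0: budget, accessibility, offline
  • Chatterbox: budget, content creation, voice agent, offline
  • Qwen3-TTS: budget, content creation, accessibility
  • Fish Audio S2 Pro: content creation, budget, voice agent
  • Moonshine: accessibility, offline, budget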

Data sourced from the Artificial Analysis Speech Arena, the HuggingFace Open ASR Leaderboard, and official provider documentation. All prices are approximate as of April 2026, and benchmark scores may vary by use case.