Voice Provider Costs
Telephony, speech-to-text, and text-to-speech pricing comparison for the Empfio voice channel.
Overview
Running an AI voice agent involves three cost layers: telephony (the phone number and call minutes), speech-to-text (transcribing the caller), and text-to-speech (speaking the AI's reply). Empfio selects providers at each layer for the best cost-to-quality ratio for European SMEs.
Telephony — Phone Numbers
Empfio uses Telnyx as its primary telephony provider, replacing Twilio.
Monthly number costs
| Provider | German local | German mobile | US local | Notes |
|---|---|---|---|---|
| Telnyx (default) | €0.86/mo | €0.65/mo | $1.15/mo | Primary provider |
| Twilio | ~€15/mo | ~€15/mo | $1.15/mo | Legacy / fallback |
| Vonage | ~€3/mo | ~€5/mo | ~$1/mo | |
| Sinch | ~€3/mo | — | ~$1/mo | |
Telnyx German local numbers are about 17× cheaper than Twilio (€0.86/mo vs. ~€15/mo) for the same capability (voice + SMS).
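The ratio can be checked directly from the table prices. The following is a minimal sketch: the function names are illustrative, and the Twilio figure is the approximate price listed above, so the results are approximate too.

```python
# Monthly prices for a German local number, taken from the table above (EUR).
TELNYX_DE_LOCAL = 0.86
TWILIO_DE_LOCAL = 15.00  # approximate, per the table


def price_ratio(expensive: float, cheap: float) -> float:
    """Return how many times more the expensive option costs."""
    return expensive / cheap


def annual_saving(expensive: float, cheap: float, numbers: int = 1) -> float:
    """Yearly saving in EUR when `numbers` phone numbers use the cheaper provider."""
    return (expensive - cheap) * 12 * numbers


ratio = price_ratio(TWILIO_DE_LOCAL, TELNYX_DE_LOCAL)
saving = annual_saving(TWILIO_DE_LOCAL, TELNYX_DE_LOCAL)
print(f"ratio: {ratio:.1f}x, annual saving per number: EUR {saving:.2f}")
```

With these inputs the saving comes to roughly €170 per number per year, which matters mostly for deployments that hold many numbers.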
Voice and SMS usage rates (Telnyx)
| Type | Rate |
|---|---|
| Inbound voice | $0.002/min |
| Outbound voice (US) | $0.01/min |
| Inbound SMS | $0.004/message |
| Outbound SMS | $0.006/message |
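To turn these rates into a monthly usage estimate, multiply each rate by its expected volume and sum the results. A minimal sketch follows; the dictionary keys and the example volumes are hypothetical, chosen only for illustration.

```python
# Telnyx usage rates from the table above (USD).
RATES = {
    "inbound_voice_min": 0.002,
    "outbound_voice_us_min": 0.01,
    "inbound_sms": 0.004,
    "outbound_sms": 0.006,
}


def monthly_usage_cost(usage: dict) -> float:
    """Sum rate * quantity per usage type; an unknown usage type raises KeyError."""
    return sum(RATES[kind] * quantity for kind, quantity in usage.items())


# Hypothetical month for one small business (volumes are assumptions).
example_usage = {
    "inbound_voice_min": 2000,
    "outbound_voice_us_min": 300,
    "inbound_sms": 100,
    "outbound_sms": 250,
}

cost = monthly_usage_cost(example_usage)
print(f"estimated usage cost: ${cost:.2f}/month")
```

The estimate covers usage only; the monthly number rental is billed separately.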
Speech-to-Text (STT)
Converts the caller's speech to text in real time.
| Provider | Price/min | Streaming | Notes |
|---|---|---|---|
| Deepgram Nova-3 (default) | $0.0077/min ($0.46/hr) | Yes | Accepts mulaw/8000 natively, no audio conversion needed |
| AssemblyAI Universal-Streaming | $0.0025/min ($0.15/hr) | Yes | Cheapest real-time option, 3× cheaper than Deepgram |
| OpenAI Whisper API | $0.006/min | No | Cheaper than Deepgram but batch-only, so not suitable for real-time calls |
| Google Cloud STT (standard) | $0.024/min | Yes | 15-second billing rounding |
| Azure Speech-to-Text | $0.017/min | Yes | |
| Telnyx STT (in-house) | $0.025/min | Yes | 3× more expensive than Deepgram |
| Telnyx STT (Google engine) | $0.050/min | Yes | Most expensive option |
Current choice: Deepgram Nova-3 — best balance of accuracy, real-time streaming latency (~100ms), and cost. Accepts mulaw/8000 audio natively so no conversion step is needed.
Best cost alternative: AssemblyAI Universal-Streaming at $0.15/hr is about 3× cheaper than Deepgram ($0.46/hr), with comparable real-time performance. Setting STT_PROVIDER=assemblyai in the voice service selects it, but only once the AssemblyAI provider implementation has been added.
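Because a live call needs streaming transcription, the comparison only makes sense after filtering out batch-only providers. The sketch below ranks the streaming options from the table by price; the `SttOption` class and function name are illustrative and not part of the voice service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SttOption:
    name: str
    usd_per_min: float
    streaming: bool


# Prices and streaming support as listed in the STT table above.
STT_OPTIONS = [
    SttOption("Deepgram Nova-3", 0.0077, True),
    SttOption("AssemblyAI Universal-Streaming", 0.0025, True),
    SttOption("OpenAI Whisper API", 0.006, False),
    SttOption("Google Cloud STT (standard)", 0.024, True),
    SttOption("Azure Speech-to-Text", 0.017, True),
    SttOption("Telnyx STT (in-house)", 0.025, True),
    SttOption("Telnyx STT (Google engine)", 0.050, True),
]


def streaming_by_price(options):
    """Keep only streaming-capable providers, cheapest first."""
    return sorted((o for o in options if o.streaming), key=lambda o: o.usd_per_min)


ranked = streaming_by_price(STT_OPTIONS)
for option in ranked:
    # Convert the per-minute price to a per-hour figure for easier comparison.
    print(f"{option.name}: ${option.usd_per_min * 60:.2f}/hr")
```

Price is only one axis here; accuracy and latency are not modeled, so the ranking is a starting point rather than a recommendation.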
Text-to-Speech (TTS)
Converts the AI agent's text reply to audio and streams it to the caller.
| Provider | Price/1K chars | Latency | Voice quality |
|---|---|---|---|
| ElevenLabs Turbo v2.5 (default) | ~$0.10/1K chars | ~200ms | Best |
| Cartesia Sonic 3 | $0.030/1K chars | Very low | Excellent — purpose-built for voice agents |
| Google Cloud Neural2 | $0.016/1K chars | ~300ms | Good |
| Azure Neural HD | $0.015/1K chars | ~300ms | Good |
| Telnyx TTS (Azure HD bundled) | $0.045/1K chars | — | Same as Azure at higher cost |
| OpenAI TTS-1 | $0.015/1K chars | ~300ms | Average |
Current choice: ElevenLabs Turbo v2.5 — most natural voice quality, purpose-built for low-latency streaming. Cost is higher but quality difference is clearly audible to callers, which matters for SME trust.
Best cost alternative: Cartesia Sonic 3 at $0.030/1K chars — 3× cheaper than ElevenLabs, built specifically for real-time voice agents with very low latency. Can be enabled via TTS_PROVIDER=cartesia.
Cost per call — example
Assumptions: 5-minute call, 200 words spoken by caller (~1,200 chars), 150 words spoken by AI (~900 chars).
| Component | Current stack | Optimized stack |
|---|---|---|
| Telnyx inbound (5 min) | $0.01 | $0.01 |
| STT — Deepgram (5 min) | $0.039 | — |
| STT — AssemblyAI (5 min) | — | $0.013 |
| TTS — ElevenLabs (900 chars) | $0.090 | — |
| TTS — Cartesia (900 chars) | — | $0.027 |
| Total per call | ~$0.14 | ~$0.05 |
The optimized stack (AssemblyAI + Cartesia) is roughly 2.8× cheaper per call (~$0.05 vs. ~$0.14) while maintaining good quality. ElevenLabs voices are noticeably more natural, which may be worth the premium for customer-facing businesses.
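The per-call figures in the table can be reproduced with a small cost model. This is a sketch under the stated assumptions (5-minute inbound call, 900 characters spoken by the AI); the function and dictionary names are illustrative. STT is billed per minute of audio, so the caller's character count does not enter the calculation.

```python
# Rates from the tables above (USD).
TELNYX_INBOUND_PER_MIN = 0.002
STT_PER_MIN = {"deepgram": 0.0077, "assemblyai": 0.0025}
TTS_PER_1K_CHARS = {"elevenlabs": 0.10, "cartesia": 0.030}


def cost_per_call(minutes: float, ai_chars: int, stt: str, tts: str) -> float:
    """Telephony + STT (billed per minute) + TTS (billed per 1K characters)."""
    telephony = minutes * TELNYX_INBOUND_PER_MIN
    stt_cost = minutes * STT_PER_MIN[stt]
    tts_cost = ai_chars / 1000 * TTS_PER_1K_CHARS[tts]
    return telephony + stt_cost + tts_cost


current = cost_per_call(5, 900, stt="deepgram", tts="elevenlabs")
optimized = cost_per_call(5, 900, stt="assemblyai", tts="cartesia")
print(f"current: ${current:.4f}  optimized: ${optimized:.4f}  "
      f"ratio: {current / optimized:.1f}x")
```

In the current stack TTS accounts for about two thirds of the per-call cost, so the TTS provider choice moves the total far more than the STT choice does.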
Switching providers
STT and TTS providers are swappable via environment variables in the voice service. Providers that are already implemented need no code changes.
# .env (voice service)
STT_PROVIDER=deepgram # or: assemblyai, whisper, google
TTS_PROVIDER=elevenlabs # or: cartesia, google, piper
To add a new provider:
- Create voice/app/stt/your_provider.py implementing BaseSTTEngine
- Add an elif branch in voice/app/stt/factory.py
- Add settings in voice/app/core/config.py
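The three steps above can be sketched in a single file. This is not the actual Empfio code: the `BaseSTTEngine` interface shown here (a single `transcribe_chunk` method), the engine classes, and the `create_stt_engine` function are all assumptions made for illustration, and the real base class and factory may look different.

```python
import os
from abc import ABC, abstractmethod
from typing import Optional


class BaseSTTEngine(ABC):
    """Hypothetical interface; the real BaseSTTEngine may define other methods."""

    @abstractmethod
    def transcribe_chunk(self, audio: bytes) -> str:
        """Return the transcript for one chunk of audio."""


class DeepgramEngine(BaseSTTEngine):
    def transcribe_chunk(self, audio: bytes) -> str:
        return ""  # placeholder: a real engine would call the provider here


class YourProviderEngine(BaseSTTEngine):
    """Step 1: the new provider implements the base interface."""

    def transcribe_chunk(self, audio: bytes) -> str:
        return ""  # placeholder


def create_stt_engine(provider: Optional[str] = None) -> BaseSTTEngine:
    """Step 2: the factory gains an elif branch for the new provider.

    Falls back to the STT_PROVIDER environment variable, then to "deepgram".
    """
    name = (provider or os.environ.get("STT_PROVIDER", "deepgram")).lower()
    if name == "deepgram":
        return DeepgramEngine()
    elif name == "your_provider":
        return YourProviderEngine()
    raise ValueError(f"Unsupported STT_PROVIDER: {name!r}")


engine = create_stt_engine("your_provider")
print(type(engine).__name__)
```

Raising on an unknown name, rather than silently falling back to the default, makes a typo in the .env file fail at startup instead of during a call. Step 3 (provider-specific settings such as API keys) is omitted here because it depends on how voice/app/core/config.py is structured.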