Voice Provider Costs

Telephony, speech-to-text, and text-to-speech pricing comparison for the Empfio voice channel.

Overview

Running an AI voice agent involves three cost layers: telephony (the phone number and call minutes), speech-to-text (transcribing the caller), and text-to-speech (speaking the AI's reply). Empfio selects providers at each layer for the best cost-to-quality ratio for European SMEs.

Telephony — Phone Numbers

Empfio uses Telnyx as its primary telephony provider, replacing Twilio.

Number monthly costs

Provider	German local	German mobile	US local	Notes
Telnyx (default)	€0.86/mo	€0.65/mo	$1.15/mo	Primary provider
Twilio	~€15/mo	~€15/mo	$1.15/mo	Legacy / fallback
Vonage	~€3/mo	~€5/mo	~$1/mo
Sinch	~€3/mo	—	~$1/mo

Telnyx German local numbers are about 17× cheaper than Twilio (€0.86/mo vs ~€15/mo) for the same capability (voice + SMS).

Usage rates: voice per minute, SMS per message (Telnyx)

Type	Rate
Inbound voice	$0.002/min
Outbound voice (US)	$0.01/min
Inbound SMS	$0.004/message
Outbound SMS	$0.006/message

Speech-to-Text (STT)

Converts the caller's speech to text in real time.

Provider	Price/min	Streaming	Notes
Deepgram Nova-3 (default)	$0.0077/min ($0.46/hr)	Yes	Accepts mulaw/8000 natively, no audio conversion needed
AssemblyAI Universal-Streaming	$0.0025/min ($0.15/hr)	Yes	Cheapest real-time option, 3× cheaper than Deepgram
OpenAI Whisper API	$0.006/min	No	Batch-only, so not suitable for real-time calls; cheaper than Deepgram but not than AssemblyAI
Google Cloud STT (standard)	$0.024/min	Yes	15-second billing rounding
Azure Speech-to-Text	$0.017/min	Yes
Telnyx STT (in-house)	$0.025/min	Yes	3× more expensive than Deepgram
Telnyx STT (Google engine)	$0.050/min	Yes	Most expensive option

Current choice: Deepgram Nova-3 — best balance of accuracy, real-time streaming latency (~100ms), and cost. Accepts mulaw/8000 audio natively so no conversion step is needed.

Best cost alternative: AssemblyAI Universal-Streaming at $0.15/hr is about 3× cheaper than Deepgram ($0.46/hr) with comparable real-time performance. Once its provider implementation has been added to the voice service, it can be enabled via STT_PROVIDER=assemblyai.

Text-to-Speech (TTS)

Converts the AI agent's text reply to audio and streams it to the caller.

Provider	Price/1K chars	Latency	Voice quality
ElevenLabs Turbo v2.5 (default)	~$0.10/1K chars	~200ms	Best
Cartesia Sonic 3	$0.030/1K chars	Very low	Excellent — purpose-built for voice agents
Google Cloud Neural2	$0.016/1K chars	~300ms	Good
Azure Neural HD	$0.015/1K chars	~300ms	Good
Telnyx TTS (Azure HD bundled)	$0.045/1K chars	—	Same Azure HD voices at 3× the direct Azure price
OpenAI TTS-1	$0.015/1K chars	~300ms	Average

Current choice: ElevenLabs Turbo v2.5 — most natural voice quality, purpose-built for low-latency streaming. Cost is higher but quality difference is clearly audible to callers, which matters for SME trust.

Best cost alternative: Cartesia Sonic 3 at $0.030/1K chars — 3× cheaper than ElevenLabs, built specifically for real-time voice agents with very low latency. Can be enabled via TTS_PROVIDER=cartesia.

Cost per call — example

Assumptions: 5-minute call, 200 words spoken by caller (~1,200 chars), 150 words spoken by AI (~900 chars).

Component	Current stack	Optimized stack
Telnyx inbound (5 min)	$0.01	$0.01
STT — Deepgram (5 min)	$0.039	—
STT — AssemblyAI (5 min)	—	$0.013
TTS — ElevenLabs (900 chars)	$0.090	—
TTS — Cartesia (900 chars)	—	$0.027
Total per call	~$0.14	~$0.05

The optimized stack (AssemblyAI + Cartesia) is ~3× cheaper per call while maintaining good quality. ElevenLabs voices are noticeably more natural, which may be worth the premium for customer-facing businesses.
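The arithmetic behind the table can be reproduced with a short script. This is a minimal sketch that uses only the rates quoted on this page; the function and constant names are illustrative and are not part of the voice service.

```python
# Per-call cost sketch using the USD rates quoted in the tables on this page.
TELNYX_INBOUND_PER_MIN = 0.002
STT_PER_MIN = {"deepgram": 0.0077, "assemblyai": 0.0025}
TTS_PER_1K_CHARS = {"elevenlabs": 0.10, "cartesia": 0.030}


def cost_per_call(minutes: float, ai_chars: int, stt: str, tts: str) -> float:
    """Return the estimated cost in USD of one inbound call."""
    telephony = minutes * TELNYX_INBOUND_PER_MIN
    transcription = minutes * STT_PER_MIN[stt]  # STT is billed on audio duration
    synthesis = ai_chars / 1000 * TTS_PER_1K_CHARS[tts]  # TTS is billed on characters
    return telephony + transcription + synthesis


current = cost_per_call(5, 900, stt="deepgram", tts="elevenlabs")
optimized = cost_per_call(5, 900, stt="assemblyai", tts="cartesia")
print(f"current:   ${current:.4f}")
print(f"optimized: ${optimized:.4f}")
print(f"ratio:     {current / optimized:.1f}x")
```

Because STT is billed per minute of audio, the caller's character count does not enter the calculation. Only call duration and the length of the AI's reply drive the total.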

Switching providers

All STT and TTS providers are swappable via environment variables in the voice service. No code changes required for supported providers.

# .env (voice service)
STT_PROVIDER=deepgram       # or: assemblyai, whisper, google
TTS_PROVIDER=elevenlabs     # or: cartesia, google, piper
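One way the voice service might read and validate these variables at startup is sketched below. The helper name read_provider and the two allow-lists are assumptions for illustration (the lists are copied from the comments in the example above); the actual settings code in voice/app/core/config.py may look different.

```python
import os
from collections.abc import Mapping

# Hypothetical allow-lists, copied from the comments in the .env example.
SUPPORTED_STT = {"deepgram", "assemblyai", "whisper", "google"}
SUPPORTED_TTS = {"elevenlabs", "cartesia", "google", "piper"}


def read_provider(
    var: str,
    default: str,
    supported: set[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Read a provider name from the environment and reject unknown values."""
    value = env.get(var, default).strip().lower()
    if value not in supported:
        raise ValueError(f"{var}={value!r} is not one of {sorted(supported)}")
    return value


stt_provider = read_provider("STT_PROVIDER", "deepgram", SUPPORTED_STT)
tts_provider = read_provider("TTS_PROVIDER", "elevenlabs", SUPPORTED_TTS)
```

Failing fast on an unknown value surfaces a typo in the .env file at startup rather than on the first incoming call.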

To add a new STT provider:

1. Create voice/app/stt/your_provider.py implementing BaseSTTEngine
2. Add an elif branch in voice/app/stt/factory.py
3. Add settings in voice/app/core/config.py
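The steps above can be sketched as follows. The real BaseSTTEngine interface is not shown on this page, so the stand-in base class, the transcribe method, the create_stt_engine function, and the api_key parameter are all assumptions made for illustration. A production engine would most likely be asynchronous and stream audio rather than process a single chunk.

```python
from abc import ABC, abstractmethod


class BaseSTTEngine(ABC):
    """Stand-in for the real base class; the actual interface may differ."""

    @abstractmethod
    def transcribe(self, audio: bytes) -> str:
        """Return the transcript for a chunk of audio (hypothetical method)."""


class DeepgramEngine(BaseSTTEngine):
    """Placeholder for an existing engine; makes no network call."""

    def transcribe(self, audio: bytes) -> str:
        return "<deepgram transcript>"


class YourProviderEngine(BaseSTTEngine):
    """Step 1: this class would live in voice/app/stt/your_provider.py."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key  # step 3: would come from the config settings

    def transcribe(self, audio: bytes) -> str:
        # A real engine would send `audio` to the provider's API here.
        return f"<your_provider transcript of {len(audio)} bytes>"


def create_stt_engine(provider: str, api_key: str = "") -> BaseSTTEngine:
    """Step 2: one branch per provider, as in voice/app/stt/factory.py."""
    if provider == "deepgram":
        return DeepgramEngine()
    elif provider == "your_provider":  # the newly added elif branch
        return YourProviderEngine(api_key)
    raise ValueError(f"Unknown STT provider: {provider!r}")


engine = create_stt_engine("your_provider", api_key="example-key")
print(engine.transcribe(b"\x00" * 160))
```

Keeping provider selection in a single factory means the rest of the service depends only on the base class, which is what lets STT_PROVIDER switch engines without other code changes.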
