Voice

Overview

Empfio's voice channel answers inbound calls using a real-time speech pipeline. Callers speak naturally; Empfio transcribes their speech, passes the transcript to the AI agent through the same booking flow used by the text channels, and speaks the reply back, all in real time.

The voice service runs as a separate Docker container (voice/) alongside the main backend.

Telephony provider: Telnyx (primary) — German local numbers from €0.86/month, 17× cheaper than Twilio.

How it works

1. Incoming call: the customer calls the Empfio AI phone number (provisioned via Telnyx)
2. Telnyx webhook: Telnyx sends the call event to the voice service
3. Speech-to-text: the caller's speech is transcribed in real time via Deepgram Nova-3
4. Agent processing: the transcript is sent to the LangGraph AI agent (same agent as text channels)
5. Text-to-speech: the agent's text reply is converted to speech via ElevenLabs and streamed back to the caller
6. Booking confirmation: when a booking is made, a confirmation SMS is sent to the caller's phone number
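
The steps above can be sketched as one conversational turn. This is a minimal illustration under stated assumptions, not Empfio's implementation: `handle_turn`, `AgentReply`, and the four callables are hypothetical names standing in for the Deepgram, LangGraph, ElevenLabs, and SMS integrations. The real pipeline streams audio continuously, whereas this sketch processes one complete utterance at a time to keep the flow readable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentReply:
    """Hypothetical shape of an agent response."""
    text: str
    booking_made: bool = False


def handle_turn(
    caller_number: str,
    caller_audio: bytes,
    transcribe: Callable[[bytes], str],       # stand-in for speech-to-text
    run_agent: Callable[[str], AgentReply],   # stand-in for the AI agent
    synthesize: Callable[[str], bytes],       # stand-in for text-to-speech
    send_sms: Callable[[str, str], None],     # stand-in for the SMS sender
) -> bytes:
    """Run one caller utterance through STT, the agent, and TTS."""
    transcript = transcribe(caller_audio)
    reply = run_agent(transcript)
    reply_audio = synthesize(reply.text)
    # A confirmation SMS goes out only when the agent reports a booking.
    if reply.booking_made:
        send_sms(caller_number, reply.text)
    return reply_audio
```

Passing the stages in as callables keeps the sketch independent of any provider SDK, so each stage can be replaced by a stub when trying the flow locally.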

Setup

Voice is provisioned through Empfio's AI Number feature, which provides a single phone number that handles voice, SMS, and WhatsApp:

1. Go to Settings → AI Number in the Empfio dashboard
2. Provision a new AI phone number
3. The number is automatically configured for voice calls, SMS, and WhatsApp
4. Test by calling the number

Speech pipeline

Stage	Provider	Notes
Telephony	Telnyx	TeXML + Media Stream WebSocket, PCMU/8000
Speech-to-text	Deepgram Nova-3	~100ms latency, accepts mulaw natively
AI agent	LangGraph (GPT-4o)	Same agent as WhatsApp/Telegram
Text-to-speech	ElevenLabs Turbo v2.5	~200ms first-byte latency

Streaming for low latency

Voice conversations require low latency to feel natural. The agent uses streaming mode (/chat/stream) so the first words of the reply are spoken before the full response is generated. The perceived wait then depends on the time to the first token rather than the time needed to generate the whole reply.
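
One way to benefit from a streamed reply is to cut the token stream into sentence-sized chunks and hand each chunk to text-to-speech as soon as it is complete. This page does not describe how Empfio chunks the stream, so treat the following as an assumption-laden sketch; `sentence_chunks` is a hypothetical helper and the token list is made up.

```python
from typing import Iterable, Iterator

SENTENCE_END = (".", "!", "?")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as the streamed tokens finish one.

    Because this is a generator, the first sentence is available to the
    caller before later tokens have been read from the stream.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    # Flush whatever is left when the stream ends mid-sentence.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    stream = ["Sure", ", I can", " help.", " What", " time", " works", "?"]
    for sentence in sentence_chunks(stream):
        print(sentence)
```

The splitting rule is deliberately naive: it would also break on a token that ends in a decimal point or an abbreviation, so a production chunker would need a more careful boundary check.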

Typical latency breakdown:

Stage	Time
Speech-to-text (STT)	~100ms
LLM first token	~500ms
Text-to-speech (TTS)	~200ms
Total perceived delay	~800ms
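
The total in the table is the sum of the three stages, since each one has to produce its first output before the next can start. The short sketch below reproduces that arithmetic and shows each stage's share of the budget. The figures are the approximate values from the table; the dictionary keys and function names are illustrative only.

```python
from typing import Dict

# Approximate per-stage latencies from the table above, in milliseconds.
LATENCY_MS: Dict[str, int] = {
    "speech_to_text": 100,
    "llm_first_token": 500,
    "text_to_speech": 200,
}


def perceived_delay_ms(stages: Dict[str, int]) -> int:
    """Total delay before the caller hears the first audio of a reply."""
    return sum(stages.values())


def stage_shares(stages: Dict[str, int]) -> Dict[str, float]:
    """Each stage's share of the total delay, as a percentage."""
    total = perceived_delay_ms(stages)
    return {name: 100 * ms / total for name, ms in stages.items()}
```

With these numbers the LLM's first token accounts for well over half of the delay, which is consistent with the troubleshooting advice to look at the agent service first when replies are slow.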

Barge-in

When a caller speaks while the AI is talking, Empfio immediately stops playback and listens. This makes conversations feel natural rather than robotic. Barge-in is enabled by default and can be controlled via ENABLE_BARGE_IN in the voice service configuration.
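
A barge-in mechanism of this kind can be sketched with a cancellable playback task. This is an illustration, not Empfio's code: `play`, `speak_with_barge_in`, and `barge_in_enabled` are hypothetical names, the `sent` list stands in for the outgoing media stream, and the accepted values of `ENABLE_BARGE_IN` are an assumption (the page only says the setting exists and that barge-in is on by default).

```python
import asyncio
import os
from typing import List, Mapping


def barge_in_enabled(env: Mapping[str, str] = os.environ) -> bool:
    # Assumption: the flag is a boolean-like string; unset means enabled.
    value = env.get("ENABLE_BARGE_IN", "true").strip().lower()
    return value not in ("0", "false", "no", "off")


async def play(chunks: List[bytes], sent: List[bytes], chunk_seconds: float = 0.02) -> None:
    """Send audio chunks one by one, pacing them like real-time playback."""
    for chunk in chunks:
        sent.append(chunk)  # stand-in for writing to the media stream
        await asyncio.sleep(chunk_seconds)


async def speak_with_barge_in(
    chunks: List[bytes],
    speech_detected: asyncio.Event,
    sent: List[bytes],
    enabled: bool = True,
) -> bool:
    """Play a reply; return True if caller speech interrupted it."""
    playback = asyncio.create_task(play(chunks, sent))
    if not enabled:
        await playback
        return False
    detector = asyncio.create_task(speech_detected.wait())
    done, _ = await asyncio.wait(
        {playback, detector}, return_when=asyncio.FIRST_COMPLETED
    )
    if detector in done and not playback.done():
        # The caller started talking: stop playback immediately.
        playback.cancel()
        await asyncio.gather(playback, return_exceptions=True)
        return True
    # Playback finished first: clean up the waiting detector task.
    detector.cancel()
    await asyncio.gather(detector, return_exceptions=True)
    return False
```

In this sketch whichever component detects caller speech sets the `speech_detected` event; cancelling the playback task stops further chunks from being sent, so the caller is not talked over.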

Limitations

Voice requires a Telnyx account and a provisioned AI Number
Conference calls and multi-party calls are not supported
The agent cannot process DTMF (keypad/tone input) — speech only
Speech recognition works best in English and German

Troubleshooting

Problem	Fix
No audio when calling	Check that the Telnyx TeXML Application voice_url points to the voice service
Agent not responding	Verify the voice service is running (`GET /health`) and `TELNYX_API_KEY` is set
Wrong language in replies	Check your organization's language setting in Settings → General
Long pauses before replies	High LLM latency — check the agent service logs
Call drops after a few seconds	Verify the Telnyx number is active and the TeXML Application is correctly assigned
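
The "Agent not responding" checks can be scripted. The sketch below uses only the Python standard library; the `/health` path and the `TELNYX_API_KEY` name come from the table, while the function names and the base URL in the usage example are assumptions to be replaced with the address your voice service actually listens on.

```python
import os
import urllib.request
from typing import List, Mapping, Sequence


def health_url(base_url: str) -> str:
    """Build the health-check URL for a voice service base URL."""
    return base_url.rstrip("/") + "/health"


def service_healthy(base_url: str, timeout: float = 3.0) -> bool:
    """True if GET <base_url>/health answers with a 2xx status."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses.
        return False


def missing_settings(
    env: Mapping[str, str],
    required: Sequence[str] = ("TELNYX_API_KEY",),
) -> List[str]:
    """Names of required settings that are unset or empty."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    base = "http://localhost:8080"  # hypothetical address, adjust as needed
    print("healthy:", service_healthy(base))
    print("missing:", missing_settings(os.environ))
```

Run the environment check inside the voice container, since the variable needs to be set where the voice service runs rather than on the machine you are debugging from.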
