Gemini Live Phone Bridge
Real-time voice AI over phone calls using Google Gemini's native audio capabilities.
Architecture
```
Phone ↔ Twilio ↔ WebSocket (μ-law 8kHz) ↔ Bridge (PCM transcoding) ↔ Gemini Live API (24kHz PCM)
```
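The Twilio-facing leg of this pipeline can be illustrated with a small decoder. Twilio Media Streams carry audio as base64-encoded 8 kHz μ-law, which has to be expanded to linear PCM before it can be resampled for Gemini. The sketch below is a plain G.711 μ-law expansion in pure Python. The names `ulaw_to_linear` and `decode_twilio_payload` are illustrative and are not taken from `scripts/bridge.py`, and resampling is deliberately left out.

```python
import base64
import struct


def ulaw_to_linear(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit PCM sample."""
    byte = ~byte & 0xFF              # mu-law bytes are stored bit-inverted
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_twilio_payload(payload_b64: str) -> bytes:
    """Decode a base64 mu-law payload into little-endian 16-bit PCM bytes.

    Hypothetical helper: the real bridge may structure this differently.
    """
    ulaw = base64.b64decode(payload_b64)
    samples = [ulaw_to_linear(b) for b in ulaw]
    return struct.pack(f"<{len(samples)}h", *samples)
```

Each μ-law byte becomes two PCM bytes, so a 20 ms frame at 8 kHz (160 bytes) decodes to 320 bytes.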
Quick Start
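```bash
# Set required environment variables
export GOOGLE_API_KEY=your-key
export TWILIO_AUTH_TOKEN=your-token

# Run the bridge
python scripts/bridge.py --port 3335
```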
Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/gemini-live/status` | GET | Health check and active calls |
| `/gemini-live/incoming` | POST | TwiML for inbound calls (Twilio webhook) |
| `/gemini-live/stream` | WS | Twilio Media Stream WebSocket |
| `/gemini-live/call` | POST | Initiate outbound call |
| `/gemini-live/twiml` | POST | TwiML for outbound calls |
| `/gemini-live/call-status` | POST | Twilio call status webhook |
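For orientation, a TwiML webhook of this kind typically answers with a document that connects the call to a bidirectional Media Stream. The sketch below builds such a response with the standard library. `<Connect>` and `<Stream>` are standard Twilio TwiML elements, while the helper name `build_stream_twiml` and the host `your-domain` are placeholders, and the bridge's actual response may differ.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(host: str, path: str = "/gemini-live/stream") -> str:
    """Return TwiML that connects the call to a Media Stream WebSocket.

    Hypothetical helper: shows the general shape only.
    """
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # Twilio opens a secure WebSocket (wss://) to this URL for call audio.
    ET.SubElement(connect, "Stream", url=f"wss://{host}{path}")
    return ET.tostring(response, encoding="unicode")


print(build_stream_twiml("your-domain"))
```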
Outbound Call API
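```bash
curl -X POST https://your-domain/gemini-live/call \
  -H "Content-Type: application/json" \
  -d '{"to": "+1234567890", "greeting": "Hello! This is Marcia."}'
```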
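The outbound call request can also be issued from Python. This sketch assumes the endpoint accepts a JSON body with `to` and `greeting` fields and only builds the request. The function name `build_call_request` is illustrative, and the response format is not documented here, so nothing is parsed.

```python
import json
import urllib.request


def build_call_request(base_url: str, to: str, greeting: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /gemini-live/call."""
    body = json.dumps({"to": to, "greeting": greeting}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/gemini-live/call",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_call_request("https://your-domain", "+1234567890", "Hello!")
# To place the call: urllib.request.urlopen(req)
```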
Configuration
All settings can be provided as CLI arguments or environment variables:
Core
- `--model`: Gemini model (default: `gemini-2.5-flash-native-audio-latest`)
- `--voice`: Gemini voice: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr (default: Kore)
- `--from-number`: Twilio outbound number (default: env `TWILIO_FROM`)
- `--system-prompt`: AI persona system prompt
- `--max-duration`: Maximum call length in seconds (default: 300)
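The core options above map naturally onto `argparse`. The following is a minimal sketch of how such a parser could look, not the parser in `scripts/bridge.py`. The only environment fallback shown is `TWILIO_FROM`, since it is the only variable name documented for these flags, and `build_parser` is a hypothetical name.

```python
import argparse
import os

VOICES = ["Puck", "Charon", "Kore", "Fenrir", "Aoede", "Leda", "Orus", "Zephyr"]


def build_parser(env=os.environ) -> argparse.ArgumentParser:
    """Build a parser for the documented core flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Gemini Live phone bridge (sketch)")
    parser.add_argument("--model", default="gemini-2.5-flash-native-audio-latest")
    parser.add_argument("--voice", default="Kore", choices=VOICES)
    # Falls back to the TWILIO_FROM environment variable when the flag is absent.
    parser.add_argument("--from-number", default=env.get("TWILIO_FROM"))
    parser.add_argument("--system-prompt", default=None)
    parser.add_argument("--max-duration", type=int, default=300)
    return parser


args = build_parser().parse_args(["--voice", "Puck", "--max-duration", "120"])
```

An explicit flag always wins over the environment here because the environment value is only used as the argument's default.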
VAD (Voice Activity Detection)
- `--vad-enabled` / `--no-vad`: Toggle server-side VAD (default: on)
- `--vad-silence-ms`: Silence duration in milliseconds that triggers `activityEnd` (default: 500)
- `--vad-energy-threshold`: RMS energy threshold (default: 0.01)
- `--vad-speech-min-ms`: Minimum speech duration in milliseconds before `activityStart` (default: 100)
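How these three parameters interact is easiest to see in code. The sketch below is one plausible energy-based VAD state machine, assuming samples normalized to the range -1.0 to 1.0 and fixed-length frames. The class name `EnergyVad` and its internals are assumptions for illustration, so the bridge's real logic may differ.

```python
import math
from dataclasses import dataclass


def rms(samples):
    """Root-mean-square energy of samples normalized to [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


@dataclass
class EnergyVad:
    energy_threshold: float = 0.01   # --vad-energy-threshold
    silence_ms: int = 500            # --vad-silence-ms
    speech_min_ms: int = 100         # --vad-speech-min-ms
    speaking: bool = False
    _speech_ms: int = 0
    _silence_ms: int = 0

    def process(self, samples, frame_ms):
        """Feed one frame; return 'activityStart', 'activityEnd', or None."""
        loud = rms(samples) >= self.energy_threshold
        if not self.speaking:
            # Require continuous speech; any quiet frame resets the counter.
            self._speech_ms = self._speech_ms + frame_ms if loud else 0
            if self._speech_ms >= self.speech_min_ms:
                self.speaking, self._silence_ms = True, 0
                return "activityStart"
        else:
            # Require continuous silence; any loud frame resets the counter.
            self._silence_ms = 0 if loud else self._silence_ms + frame_ms
            if self._silence_ms >= self.silence_ms:
                self.speaking, self._speech_ms = False, 0
                return "activityEnd"
        return None
```

With the defaults and 20 ms frames, the fifth consecutive loud frame produces `activityStart`, and the twenty-fifth consecutive quiet frame after that produces `activityEnd`.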
Echo Suppression
- `--echo-multiplier`: VAD threshold multiplier applied while the agent is speaking (default: 3.0)
- `--echo-decay-ms`: Decay time in milliseconds after the agent stops speaking (default: 300)
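The two echo options can be read as raising the VAD threshold while the agent speaks and relaxing it afterwards. The sketch below assumes a linear decay back to the base threshold, which is a guess because the shape of the decay is not specified here. The function name `effective_threshold` is hypothetical.

```python
def effective_threshold(base, multiplier, agent_speaking, ms_since_agent_stopped, decay_ms):
    """Return the VAD energy threshold to use for the current frame.

    Assumption: the boost decays linearly to zero over decay_ms.
    """
    if agent_speaking:
        return base * multiplier
    if decay_ms <= 0 or ms_since_agent_stopped >= decay_ms:
        return base
    remaining = 1.0 - ms_since_agent_stopped / decay_ms
    return base * (1.0 + (multiplier - 1.0) * remaining)
```

With the defaults (base 0.01, multiplier 3.0, decay 300 ms), the threshold is about 0.03 during agent speech, about 0.02 at 150 ms after the agent stops, and back to 0.01 from 300 ms onward.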
Twilio Setup
1. Buy a phone number on Twilio.
2. Set the Voice webhook: `https://your-domain/gemini-live/incoming` (HTTP POST).
3. Set the Call status URL: `https://your-domain/gemini-live/call-status` (HTTP POST).
4. Ensure geo-permissions are enabled for the target countries.
Network Requirements
The bridge must be accessible from the internet (Twilio connects to it).
Recommended: Caddy reverse proxy with WebSocket support.
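```
# Caddy config example
handle /gemini-live/* {
    reverse_proxy localhost:3335 {
        flush_interval -1
        transport http {
            read_timeout 0
            write_timeout 0
        }
    }
}
```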
Performance
Latency benchmarks (Gemini 2.5 Flash Native Audio):
| Config | Median | Min | Max |
|---|---|---|---|
| No VAD, 200ms buffer | 3,660ms | 2,360ms | 5,180ms |
| Server VAD, 50ms buffer | 2,500ms | 2,080ms | 6,980ms |
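Server-side VAD with a 50ms buffer reduces median latency by ~32% (3,660ms to 2,500ms), although the maximum observed latency rose from 5,180ms to 6,980ms.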
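The ~32% figure follows directly from the two median values in the table. As a quick check, with `percent_reduction` being an illustrative helper name:

```python
def percent_reduction(before_ms: float, after_ms: float) -> float:
    """Relative reduction from before_ms to after_ms, in percent."""
    return (before_ms - after_ms) / before_ms * 100.0


median_gain = percent_reduction(3660, 2500)
print(f"{median_gain:.1f}%")  # prints "31.7%"
```

The same function returns a negative value for the maximum latency (5,180ms to 6,980ms), reflecting that the worst case got slower in this benchmark.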
Gemini Live Phone Bridge
Real-time voice AI over phone calls using Google Gemini's native audio capabilities.
Architecture
```
Phone ↔ Twilio ↔ WebSocket (μ-law 8kHz) ↔ Bridge (PCM transcoding) ↔ Gemini Live API (24kHz PCM)
```
Quick Start
```bash
# Set required environment variables
export GOOGLE_API_KEY=your-key
export TWILIO_AUTH_TOKEN=your-token

# Run the bridge
python scripts/bridge.py --port 3335
```
Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/gemini-live/status` | GET | Health check and active calls |
| `/gemini-live/incoming` | POST | TwiML for inbound calls (Twilio webhook) |
| `/gemini-live/stream` | WS | Twilio Media Stream WebSocket |
| `/gemini-live/call` | POST | Initiate outbound call |
| `/gemini-live/twiml` | POST | TwiML for outbound calls |
| `/gemini-live/call-status` | POST | Twilio call status webhook |
Outbound Call API
```bash
curl -X POST https://your-domain/gemini-live/call \
  -H "Content-Type: application/json" \
  -d '{"to": "+1234567890", "greeting": "Hello! This is Marcia."}'
```
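Configuration
All settings can be provided as CLI arguments or environment variables:
Core
- `--model`: Gemini model (default: `gemini-2.5-flash-native-audio-latest`)
- `--voice`: Gemini voice: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr (default: Kore)
- `--from-number`: Twilio outbound number (default: env `TWILIO_FROM`)
- `--system-prompt`: AI persona system prompt
- `--max-duration`: Maximum call length in seconds (default: 300)
VAD (Voice Activity Detection)
- `--vad-enabled` / `--no-vad`: Toggle server-side VAD (default: on)
- `--vad-silence-ms`: Silence duration in milliseconds that triggers `activityEnd` (default: 500)
- `--vad-energy-threshold`: RMS energy threshold (default: 0.01)
- `--vad-speech-min-ms`: Minimum speech duration in milliseconds before `activityStart` (default: 100)
Echo Suppression
- `--echo-multiplier`: VAD threshold multiplier applied while the agent is speaking (default: 3.0)
- `--echo-decay-ms`: Decay time in milliseconds after the agent stops speaking (default: 300)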
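Twilio Setup
1. Buy a phone number on Twilio.
2. Set the Voice webhook: `https://your-domain/gemini-live/incoming` (HTTP POST).
3. Set the Call status URL: `https://your-domain/gemini-live/call-status` (HTTP POST).
4. Ensure geo-permissions are enabled for the target countries.
Network Requirements
The bridge must be accessible from the internet (Twilio connects to it).
Recommended: Caddy reverse proxy with WebSocket support.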
```
# Caddy config example
handle /gemini-live/* {
    reverse_proxy localhost:3335 {
        flush_interval -1
        transport http {
            read_timeout 0
            write_timeout 0
        }
    }
}
```
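Performance
Latency benchmarks (Gemini 2.5 Flash Native Audio):
| Config | Median | Min | Max |
|---|---|---|---|
| No VAD, 200ms buffer | 3,660ms | 2,360ms | 5,180ms |
| Server VAD, 50ms buffer | 2,500ms | 2,080ms | 6,980ms |
Server-side VAD with a 50ms buffer reduces median latency by ~32% (3,660ms to 2,500ms), although the maximum observed latency rose from 5,180ms to 6,980ms.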