tts-router — Local TTS Router for Apple Silicon
A CLI that manages and serves multiple TTS models locally on Apple Silicon (MLX).
Models are downloaded from HuggingFace Hub and served through OpenAI- and DashScope-compatible APIs.
Prerequisites
- macOS with Apple Silicon (M1/M2/M3/M4)
- `uv` installed: see https://docs.astral.sh/uv/getting-started/installation/ (e.g. `brew install uv` or the official installer)
- ffmpeg installed (`brew install ffmpeg`)
Install
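```bash
# Run from PyPI (--prerelease=allow is needed because of the upstream mlx-audio dependency)
uvx --prerelease=allow tts-router list
# Or install with pip
pip install tts-router
```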
Commands
tts-router list — Show available models
```bash
tts-router list
```
tts-router pull <model> — Download model weights
```bash
tts-router pull qwen3-tts
tts-router pull kokoro
```
Models are cached in `~/.cache/huggingface/hub/`, so each model only needs to be downloaded once.
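To check what is already in that cache, a small script can list the cached repositories. This is a minimal sketch, not part of tts-router: the helper name `cached_repos` is hypothetical, and it relies on the HuggingFace Hub convention of naming cache folders `models--<org>--<name>`. It does not map tts-router short names (such as `kokoro`) to Hub repository ids, since that mapping is not documented here.

```python
from pathlib import Path


def cached_repos(cache_dir=None):
    """Return repo ids found in a HuggingFace Hub cache directory.

    Hypothetical helper: defaults to ~/.cache/huggingface/hub/ and assumes
    the Hub layout where model folders are named models--<org>--<name>.
    """
    root = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "huggingface" / "hub"
    if not root.is_dir():
        return []
    prefix = "models--"
    repos = []
    for entry in sorted(root.iterdir()):
        if entry.is_dir() and entry.name.startswith(prefix):
            # Turn "models--org--name" into "org/name"; only the first
            # separator is replaced so names containing "--" stay intact.
            repos.append(entry.name[len(prefix):].replace("--", "/", 1))
    return repos
```

Calling `cached_repos()` with no argument inspects the default cache location; an empty list means nothing has been pulled yet (or the cache lives elsewhere).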
tts-router serve — Start the TTS API server
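```bash
# Default: qwen3-tts on port 8091
tts-router serve
# Custom model and port
tts-router serve --model kokoro --port 9000
```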
The server requires models to be pulled first.
tts-router say — Synthesize speech from CLI
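```bash
tts-router say "Hello world" -o hello.wav
tts-router say "Hello" --voice Vivian --model kokoro -o out.wav
```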
Available Models
| Short Name | Features |
|---|---|
| `qwen3-tts` | multi-speaker, emotion, instruct (default) |
| `qwen3-tts-design` | free-form voice description |
| `qwen3-tts-clone` | voice cloning with reference audio |
| `kokoro` | fast, lightweight, multilingual |
| `dia` | multi-speaker dialogue, laughter/emotion sounds |
| `chatterbox` | 23 languages, emotion control, voice cloning |
| `orpheus` | emotive TTS with emotion tags |
Quick Start for Agent
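```bash
# 1. Pull the default model
tts-router pull qwen3-tts
# 2. Start the server
tts-router serve
# 3. Generate speech (OpenAI format)
curl -X POST http://localhost:8091/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello world", "voice": "Vivian"}' \
  --output output.wav
```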
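For agents that work in Python rather than a shell, a request to `POST /v1/audio/speech` might look like the sketch below. It assumes the server is running on localhost at the default port 8091, that the JSON body carries `input` and `voice` fields, and that the response body is the raw audio. The helper names `build_speech_request` and `synthesize` are hypothetical, and `Vivian` is only an example voice.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8091"  # assumed: default port of `tts-router serve`


def build_speech_request(text, voice="Vivian", base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = json.dumps({"input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="output.wav", voice="Vivian", base_url=BASE_URL):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, voice, base_url)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as out_file:
        out_file.write(audio)
    return out_path
```

With a model pulled and the server running, `synthesize("Hello world")` should leave the audio in `output.wav`. Building the request separately from sending it keeps the payload easy to inspect without a live server.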
API Endpoints (when serving)
| Endpoint | Method | Description |
|---|---|---|
| `GET /` | GET | Playground UI |
| `POST /v1/audio/speech` | POST | OpenAI-compatible TTS |
| `GET /v1/audio/voices` | GET | List available voices |
| `GET /health` | GET | Health check |
| `POST /v1/audio/clone` | POST | Voice clone generation |
| `POST /v1/audio/references/upload` | POST | Upload reference audio |
| `POST /v1/audio/references/from-url` | POST | Fetch reference audio by URL |
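Because the server takes some time to come up, a client may want to poll `GET /health` before sending speech requests. The following is a minimal sketch under stated assumptions: the server listens on localhost port 8091, and a healthy server answers `/health` with HTTP 200 (the response body is not inspected, since its format is not documented here). The helper names are hypothetical.

```python
import time
import urllib.error
import urllib.request

DEFAULT_BASE_URL = "http://localhost:8091"  # assumed default serve address


def endpoint_url(path, base_url=DEFAULT_BASE_URL):
    """Join the base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def is_healthy(base_url=DEFAULT_BASE_URL, timeout=2.0):
    """Return True if GET /health answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(endpoint_url("/health", base_url), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as unhealthy.
        return False


def wait_until_healthy(base_url=DEFAULT_BASE_URL, attempts=30, delay=1.0):
    """Poll /health up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if is_healthy(base_url):
            return True
        time.sleep(delay)
    return False
```

A script could call `wait_until_healthy()` right after launching `tts-router serve` in the background and only proceed once it returns `True`.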
Advanced Use Cases
For more complex workflows, read the relevant reference file:
- Clone a voice from any URL (YouTube, Bilibili, podcast, direct audio link) → read `references/voice-cloning.md`
- Use tts-router as a TTS provider in OpenClaw → read `references/openclaw.md`
tts-router: Local TTS Router for Apple Silicon
A CLI tool that manages and serves multiple TTS models locally on Apple Silicon (MLX).
Models are downloaded from HuggingFace Hub and served through OpenAI- and DashScope-compatible APIs.
Prerequisites
- macOS with Apple Silicon (M1/M2/M3/M4)
- `uv` installed: see https://docs.astral.sh/uv/getting-started/installation/ (e.g. `brew install uv` or the official installer)
- ffmpeg installed (`brew install ffmpeg`)
Install
```bash
# Run from PyPI (--prerelease=allow is needed because of the upstream mlx-audio dependency)
uvx --prerelease=allow tts-router list
# Or install with pip
pip install tts-router
```
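Commands
tts-router list: Show available models
```bash
tts-router list
```
tts-router pull <model>: Download model weights
```bash
tts-router pull qwen3-tts
tts-router pull kokoro
```
Models are cached in `~/.cache/huggingface/hub/`, so each model only needs to be downloaded once.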
tts-router serve: Start the TTS API server
```bash
# Default: qwen3-tts on port 8091
tts-router serve
# Custom model and port
tts-router serve --model kokoro --port 9000
```
The server requires models to be pulled first.
tts-router say: Synthesize speech from the CLI
```bash
tts-router say "Hello world" -o hello.wav
tts-router say "Hello" --voice Vivian --model kokoro -o out.wav
```
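Available Models
| Short Name | Features |
|---|---|
| `qwen3-tts` | multi-speaker, emotion, instruct (default) |
| `qwen3-tts-design` | free-form voice description |
| `qwen3-tts-clone` | voice cloning with reference audio |
| `kokoro` | fast, lightweight, multilingual |
| `dia` | multi-speaker dialogue, laughter/emotion sounds |
| `chatterbox` | 23 languages, emotion control, voice cloning |
| `orpheus` | emotive TTS with emotion tags |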
Quick Start for Agent
```bash
# 1. Pull the default model
tts-router pull qwen3-tts
# 2. Start the server
tts-router serve
# 3. Generate speech (OpenAI format)
curl -X POST http://localhost:8091/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello world", "voice": "Vivian"}' \
  --output output.wav
```
API Endpoints (when serving)
| Endpoint | Method | Description |
|---|---|---|
| `GET /` | GET | Playground UI |
| `POST /v1/audio/speech` | POST | OpenAI-compatible TTS |
| `GET /v1/audio/voices` | GET | List available voices |
| `GET /health` | GET | Health check |
| `POST /v1/audio/clone` | POST | Voice clone generation |
| `POST /v1/audio/references/upload` | POST | Upload reference audio |
| `POST /v1/audio/references/from-url` | POST | Fetch reference audio by URL |
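Advanced Use Cases
For more complex workflows, read the relevant reference file:
- Clone a voice from any URL (YouTube, Bilibili, podcast, direct audio link) → read `references/voice-cloning.md`
- Use tts-router as a TTS provider in OpenClaw → read `references/openclaw.md`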