asr-claw
Speech recognition CLI for AI agent automation. Transcribe audio streams from stdin, files, or URLs with multiple ASR engines — local and cloud.
Triggers
- User wants to transcribe audio, speech, or voice to text
- User needs speech recognition or ASR
- User wants to convert audio/voice recordings to text
- User wants to monitor live audio / livestream speech
- User asks about 语音识别 (speech recognition), 语音转文字 (speech-to-text), 转写 (transcription), or 直播语音 (livestream audio)
- adb-claw audio capture output needs to be transcribed
- User wants subtitles (SRT/VTT) generated from audio
Binary
The asr-claw binary is located at ${CLAUDE_PLUGIN_ROOT}/bin/asr-claw.
If it does not exist, the SessionStart hook will build or download it automatically.
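A minimal Python sketch of how a wrapper script might locate the binary before calling it. The helper names `expected_binary_path` and `binary_is_ready` are hypothetical; only the `CLAUDE_PLUGIN_ROOT` variable and the `bin/asr-claw` location come from the text above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def expected_binary_path(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the documented asr-claw location, or None if the root is unset."""
    root = env.get("CLAUDE_PLUGIN_ROOT")
    return Path(root) / "bin" / "asr-claw" if root else None


def binary_is_ready(env: Mapping[str, str] = os.environ) -> bool:
    """True only when the binary already exists on disk.

    A False result may simply mean the SessionStart hook has not
    built or downloaded the binary yet.
    """
    path = expected_binary_path(env)
    return path is not None and path.is_file()
```

Passing the environment as a mapping keeps the lookup easy to exercise without touching the real process environment.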
Setup
Quick Start (Mac)
CODEBLOCK0
OpenClaw Setup
After installing the skill via ClawHub, configure settings:
CODEBLOCK1
Settings are stored in ~/.asr-claw/config.yaml:
CODEBLOCK2
Cloud Engines (no local model needed)
CODEBLOCK3
Commands
transcribe — Core: audio to text
CODEBLOCK4
Flags:
| Flag | Default | Description |
|---|---|---|
| --file <path> | stdin | Input audio file |
| --stream | false | Streaming mode (real-time) |
| --lang <code> | zh | Language code |
| --engine <name> | auto | ASR engine |
| --format <fmt> | json | Output format: json, text, srt, vtt |
| --chunk <sec> | 0 | Fixed-time chunking (disables VAD) |
| --rate <hz> | 16000 | Sample rate for raw PCM input |
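For agents that assemble the command programmatically, the flags above could be mapped to an argument list roughly as follows. This is a sketch, not part of asr-claw: the function `build_transcribe_argv` and its parameter names are hypothetical, and it only emits flags listed in the table.

```python
from typing import List, Optional

FORMATS = ("json", "text", "srt", "vtt")


def build_transcribe_argv(
    file: Optional[str] = None,
    stream: bool = False,
    lang: str = "zh",
    engine: str = "auto",
    fmt: str = "json",
    chunk_sec: int = 0,
    rate_hz: int = 16000,
) -> List[str]:
    """Build an argv list for `asr-claw transcribe` from the documented flags."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    argv = ["asr-claw", "transcribe", "--lang", lang, "--engine", engine, "--format", fmt]
    if file is not None:
        argv += ["--file", file]  # when omitted, audio is read from stdin
    if stream:
        argv.append("--stream")
    if chunk_sec > 0:
        argv += ["--chunk", str(chunk_sec)]  # fixed-time chunking disables VAD
    if rate_hz != 16000:
        argv += ["--rate", str(rate_hz)]  # only relevant for raw PCM input
    return argv
```

The resulting list can be handed to `subprocess.run` directly, which avoids shell quoting issues with file names.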
engines — Manage ASR engines
CODEBLOCK5
doctor — Environment check
CODEBLOCK6
Engine Matrix
| Engine | Type | Mac | GPU | Streaming | Install |
|---|---|---|---|---|---|
| qwen-asr | Local CLI | Yes | No (Accelerate) | VAD | engines install qwen-asr |
| qwen3-asr | vLLM Service | No | Yes (CUDA) | Native | engines start qwen3-asr |
| whisper | Local CLI | Yes | No | VAD | Manual |
| doubao | Cloud API | Yes | n/a | No | Set DOUBAO_API_KEY |
| openai | Cloud API | Yes | n/a | No | Set OPENAI_API_KEY |
| deepgram | Cloud API | Yes | n/a | Native | Set DEEPGRAM_API_KEY |
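The matrix can be encoded as data so an agent can shortlist engines on a Mac. The sketch below is illustrative only: the `Engine` class and `mac_candidates` function are hypothetical, and treating "VAD" as a usable streaming mode is my reading of the table rather than a documented guarantee.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Engine:
    name: str
    kind: str        # "local-cli", "vllm-service", or "cloud-api"
    runs_on_mac: bool
    streaming: str   # "native", "vad", or "none"


# Values transcribed from the engine matrix.
ENGINES = [
    Engine("qwen-asr", "local-cli", True, "vad"),
    Engine("qwen3-asr", "vllm-service", False, "native"),
    Engine("whisper", "local-cli", True, "vad"),
    Engine("doubao", "cloud-api", True, "none"),
    Engine("openai", "cloud-api", True, "none"),
    Engine("deepgram", "cloud-api", True, "native"),
]


def mac_candidates(need_streaming: bool, allow_cloud: bool = True) -> List[str]:
    """Names of engines usable on a Mac under the given constraints."""
    names = []
    for engine in ENGINES:
        if not engine.runs_on_mac:
            continue
        if need_streaming and engine.streaming == "none":
            continue
        if not allow_cloud and engine.kind == "cloud-api":
            continue
        names.append(engine.name)
    return names
```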
Output Format
All commands output a JSON envelope:
CODEBLOCK7
Use -o text for plain text, or -o quiet to suppress output.
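If only the JSON output is at hand, subtitle cues can also be derived from its segments. The following is a rough sketch, assuming each segment carries `start` and `end` in seconds plus a `text` field; the helper names are hypothetical, and this is not how asr-claw itself produces `--format srt` output.

```python
from typing import Dict, Iterable, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Dict]) -> str:
    """Render segments as SRT cues, numbered from 1 and separated by blank lines."""
    blocks: List[str] = []
    for number, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{number}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(blocks)
```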
With adb-claw
CODEBLOCK8
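asr-claw
Speech recognition CLI for AI agent automation. Transcribes audio streams from stdin, files, or URLs, and works with multiple local and cloud ASR engines.
Triggers
- User wants to transcribe audio, speech, or voice to text
- User needs speech recognition or ASR
- User wants to convert audio/voice recordings to text
- User wants to monitor live audio or livestream speech
- User asks about speech recognition, speech-to-text, transcription, or livestream audio
- adb-claw audio capture output needs to be transcribed
- User wants subtitles (SRT/VTT) generated from audio
Binary
The asr-claw binary is located at ${CLAUDE_PLUGIN_ROOT}/bin/asr-claw.
If the file does not exist, the SessionStart hook builds or downloads it automatically.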
Setup
Quick Start (Mac)
```bash
# Install the qwen-asr engine (compiles the C binary and downloads the 0.6B model, about 1.9 GB)
asr-claw engines install qwen-asr
# Verify
asr-claw engines list
asr-claw doctor
```
OpenClaw Setup
After installing the skill via ClawHub, configure its settings:
```bash
# Set the default language (default: zh)
claw config set asr-claw.default_lang en
# Use a larger model
claw config set asr-claw.model Qwen/Qwen3-ASR-1.7B
# Users in China: set a HuggingFace mirror
claw config set asr-claw.hf_mirror https://hf-mirror.com
# Custom model path (e.g. a shared NAS)
claw config set asr-claw.model_path /mnt/models/Qwen3-ASR-0.6B
# Re-run the install after changing model settings
asr-claw engines install qwen-asr
```
Settings are stored in ~/.asr-claw/config.yaml:
```yaml
default:
  engine: qwen-asr
  lang: zh
  format: json
engines:
  qwen-asr:
    binary: ~/.asr-claw/bin/qwen-asr
    model_path: ~/.asr-claw/models/Qwen3-ASR-0.6B
```
Cloud Engines (no local model needed)
```bash
# OpenAI Whisper API
export OPENAI_API_KEY=sk-...
asr-claw transcribe --file audio.wav --engine openai
# Volcengine Doubao
export DOUBAO_API_KEY=...
asr-claw transcribe --file audio.wav --engine doubao
# Deepgram (native streaming support)
export DEEPGRAM_API_KEY=...
asr-claw transcribe --file audio.wav --engine deepgram
```
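Before choosing a cloud engine, a wrapper might check which API keys are actually present. This sketch assumes the variable names are `OPENAI_API_KEY`, `DOUBAO_API_KEY`, and `DEEPGRAM_API_KEY`, as read from the export lines above; adjust them if your build expects different names. The function name is hypothetical.

```python
import os
from typing import List, Mapping

# Assumed variable names, taken from the export lines in this section.
CLOUD_ENGINE_KEYS = {
    "openai": "OPENAI_API_KEY",
    "doubao": "DOUBAO_API_KEY",
    "deepgram": "DEEPGRAM_API_KEY",
}


def configured_cloud_engines(env: Mapping[str, str] = os.environ) -> List[str]:
    """Cloud engines whose API key variable is set to a non-blank value."""
    return [
        engine
        for engine, variable in CLOUD_ENGINE_KEYS.items()
        if env.get(variable, "").strip()
    ]
```

An empty result suggests falling back to a local engine such as qwen-asr.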
Commands
transcribe: Core command, audio to text
```bash
# Transcribe a file
asr-claw transcribe --file meeting.wav --lang zh
# Pipe audio in from stdin
cat audio.wav | asr-claw transcribe --lang zh
# Streaming transcription (real-time, from adb-claw or ffmpeg)
adb-claw audio capture --stream --duration 60000 | asr-claw transcribe --stream --lang zh
# Subtitle output
asr-claw transcribe --file lecture.wav --format srt > lecture.srt
asr-claw transcribe --file lecture.wav --format vtt > lecture.vtt
# Select a specific engine
asr-claw transcribe --file audio.wav --engine whisper --lang en
```
Flags:
| Flag | Default | Description |
|---|---|---|
| --file <path> | stdin | Input audio file |
| --stream | false | Streaming mode (real-time) |
| --lang <code> | zh | Language code |
| --engine <name> | auto | ASR engine |
| --format <fmt> | json | Output format: json, text, srt, vtt |
| --chunk <sec> | 0 | Fixed-time chunking (disables VAD) |
| --rate <hz> | 16000 | Sample rate for raw PCM input |
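engines: Manage ASR engines
```bash
asr-claw engines list              # List all engines and their status
asr-claw engines install qwen-asr  # Install the local engine (Mac)
asr-claw engines info qwen-asr     # Show engine details
asr-claw engines start qwen3-asr   # Start the vLLM service engine
asr-claw engines stop qwen3-asr    # Stop the service engine
asr-claw engines status            # Show running engines
```
doctor: Environment check
```bash
asr-claw doctor                    # Check platform, engines, and dependencies
```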
Engine Matrix
| Engine | Type | Mac | GPU | Streaming | Install |
|---|---|---|---|---|---|
| qwen-asr | Local CLI | Yes | No (Accelerate) | VAD | engines install qwen-asr |
| qwen3-asr | vLLM Service | No | Yes (CUDA) | Native | engines start qwen3-asr |
| whisper | Local CLI | Yes | No | VAD | Manual |
| doubao | Cloud API | Yes | n/a | No | Set DOUBAO_API_KEY |
| openai | Cloud API | Yes | n/a | No | Set OPENAI_API_KEY |
| deepgram | Cloud API | Yes | n/a | Native | Set DEEPGRAM_API_KEY |
Output Format
All commands output a JSON envelope:
```json
{
  "ok": true,
  "command": "transcribe",
  "data": {
    "segments": [{"index": 0, "start": 0.0, "end": 2.5, "text": "..."}],
    "full_text": "...",
    "engine": "qwen-asr",
    "audio_duration_sec": 5.5
  },
  "duration_ms": 1230,
  "timestamp": "2026-03-13T10:00:00Z"
}
```
Use -o text for plain text, or -o quiet to suppress output.
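A caller consuming the envelope would typically check `ok` before touching `data`. The sketch below assumes only the fields shown in the sample (`ok`, `command`, `data`, `full_text`); the shape of a failed envelope is not documented here, so the code inspects nothing beyond the `ok` flag. `AsrClawError`, `parse_envelope`, and `full_text` are hypothetical names.

```python
import json
from typing import Any, Dict


class AsrClawError(RuntimeError):
    """Raised when the envelope reports ok=false."""


def parse_envelope(raw: str) -> Dict[str, Any]:
    """Decode the JSON envelope and return its data payload."""
    envelope = json.loads(raw)
    if not envelope.get("ok"):
        command = envelope.get("command")
        raise AsrClawError(f"asr-claw reported failure for command {command!r}")
    return envelope["data"]


def full_text(raw: str) -> str:
    """Convenience accessor for the complete transcript."""
    return parse_envelope(raw)["full_text"]
```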
With adb-claw
```bash
# Real-time transcription from an Android device
adb-claw audio capture --stream --duration 60000 | asr-claw transcribe --stream --lang zh
# Record first, then transcribe
adb-claw audio capture --duration 30000 --file recording.wav
asr-claw transcribe --file recording.wav --lang zh
# Save the audio and transcribe at the same time
adb-claw audio capture --stream --duration 0 | tee backup.wav | asr-claw transcribe --stream
```
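The live capture pipe can also be driven from Python without a shell. This is a sketch under the assumption that both binaries are on PATH; the function names are hypothetical, and the arguments mirror the first shell pipeline above.

```python
import subprocess
from typing import List, Tuple


def live_pipeline_argv(duration_ms: int = 60000, lang: str = "zh") -> Tuple[List[str], List[str]]:
    """Argument lists for the capture and transcribe stages of the pipe."""
    capture = ["adb-claw", "audio", "capture", "--stream", "--duration", str(duration_ms)]
    transcribe = ["asr-claw", "transcribe", "--stream", "--lang", lang]
    return capture, transcribe


def run_live_pipeline(duration_ms: int = 60000, lang: str = "zh") -> int:
    """Connect adb-claw stdout to asr-claw stdin and wait for both to finish."""
    capture_argv, transcribe_argv = live_pipeline_argv(duration_ms, lang)
    capture = subprocess.Popen(capture_argv, stdout=subprocess.PIPE)
    transcribe = subprocess.Popen(transcribe_argv, stdin=capture.stdout)
    # Close our copy of the pipe so adb-claw gets SIGPIPE if asr-claw exits first.
    capture.stdout.close()
    transcribe.wait()
    capture.wait()
    return transcribe.returncode
```

Keeping the argument construction separate from process launching makes the command lines easy to inspect or log before anything runs.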