WebChat Voice GUI
Voice input GUI for OpenClaw WebChat Control UI:
- Mic button with idle/recording/processing states
- Real-time VU meter: button shadow/scale reacts to voice level
- Push-to-Talk: hold mic button to record, release to send (default mode)
- Toggle mode: click to start, click to stop (switch via double-click on mic button)
- Keyboard shortcuts: Ctrl+Space Push-to-Talk, Ctrl+Shift+M start/stop continuous recording, Ctrl+Shift+B live transcription [beta]
- Localized UI: auto-detects browser language (English, German, Chinese built-in), customizable
- Gateway startup hook re-injects the script after `openclaw update`
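The VU-meter behaviour listed above can be pictured as a pure mapping from audio samples to button styling. The following is a minimal sketch, not the code shipped in voice-input.js: the function names `rmsLevel` and `levelToStyle`, the gain of 4, the maximum scale of 1.25, and the shadow colour are all assumptions chosen for illustration.

```javascript
// Hypothetical helpers; names and constants are illustrative assumptions.

// Root-mean-square level of a block of samples in the range [-1, 1].
function rmsLevel(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  return Math.sqrt(sumSquares / samples.length);
}

// Map a level to CSS values for the mic button.
// `gain` boosts quiet speech; the result is clamped to [0, 1].
function levelToStyle(level, gain = 4) {
  const v = Math.min(1, Math.max(0, level * gain));
  return {
    transform: `scale(${(1 + 0.25 * v).toFixed(3)})`,
    boxShadow: `0 0 ${Math.round(24 * v)}px rgba(220, 38, 38, ${(0.2 + 0.6 * v).toFixed(2)})`,
  };
}
```

In a browser, the samples would typically come from a Web Audio `AnalyserNode` via `getFloatTimeDomainData`, and the returned values would be assigned to the button's `style` on each animation frame.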
Prerequisites
1. webchat-https-proxy: HTTPS/WSS reverse proxy; must be deployed and running.
2. faster-whisper-local-service: local STT backend on port 18790.
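Verify:

    systemctl --user is-active openclaw-voice-https.service
    systemctl --user is-active openclaw-transcribe.service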
Deploy

    bash scripts/deploy.sh
With language override:

    VOICE_LANG=de bash scripts/deploy.sh
When run interactively without VOICE_LANG, the script will ask you to choose a UI language.
This script is idempotent.
Quick verify

    bash scripts/status.sh
Security Notes
Client-side JS (voice-input.js)
- No dynamic code execution: no `eval()`, `new Function()`, or `innerHTML` with user data.
- HTTPS-first: transcription requests use same-origin `/transcribe` when served over HTTPS. Only falls back to `http://127.0.0.1:18790` in local dev.
- No external servers: audio is never sent outside the local machine.
- No token scraping: client JS does not read gateway auth from browser storage. `/transcribe` is accepted via same-origin browser requests; Bearer auth remains an optional fallback at the proxy.
- Uses `textContent` for all toast messages (no XSS vector).
- Bounded memory: continuous recording mode enforces a 120-chunk limit (~2 minutes), preventing unbounded memory growth.
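Two of the points above, HTTPS-first endpoint selection and the 120-chunk cap, are easy to picture as small pure functions. This is a sketch under stated assumptions rather than the actual voice-input.js code: the names `transcribeEndpoint` and `pushChunk` are hypothetical, the `/transcribe` path on the local fallback is assumed, and stopping at the cap (rather than, say, dropping old chunks) is also an assumption.

```javascript
// Illustrative sketch; function and constant names are assumptions.

// Assumed: the local backend exposes the same /transcribe path.
const LOCAL_DEV_ENDPOINT = 'http://127.0.0.1:18790/transcribe';
const MAX_CHUNKS = 120; // the documented cap for continuous recording

// HTTPS-first: use the same-origin path when the page is served over HTTPS,
// and only fall back to the local backend otherwise.
function transcribeEndpoint(protocol) {
  return protocol === 'https:' ? '/transcribe' : LOCAL_DEV_ENDPOINT;
}

// Append a chunk unless the cap is reached. Returns false when the caller
// should stop recording (stopping at the cap is an assumption).
function pushChunk(chunks, chunk) {
  if (chunks.length >= MAX_CHUNKS) return false;
  chunks.push(chunk);
  return true;
}
```

In the browser the first function would be called as `transcribeEndpoint(window.location.protocol)`. A cap of 120 chunks over roughly two minutes suggests chunks of about one second each.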
Deployment scripts
- Language input validated: `VOICE_LANG` must match `^([a-zA-Z]{2,5}(-[a-zA-Z]{2,5})?|auto)$`, which prevents injection via sed.
- Robust path detection: all scripts validate that the Control UI directory exists before modifying files.
- Gateway hook: uses `execFileSync` with array args, so there is no shell interpolation. The script path is derived from `__dirname`, not user input.
- Idempotent: all scripts are safe to run repeatedly.
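The validation and hook-invocation points above might look roughly like this in Node.js. The regular expression is the one documented for `VOICE_LANG`; everything else is an assumption for illustration, including the function names and the script name `inject.sh`, which is hypothetical and not taken from the skill.

```javascript
const { execFileSync } = require('node:child_process');
const path = require('node:path');

// The pattern documented for VOICE_LANG.
const VOICE_LANG_PATTERN = /^([a-zA-Z]{2,5}(-[a-zA-Z]{2,5})?|auto)$/;

function isValidVoiceLang(value) {
  return typeof value === 'string' && VOICE_LANG_PATTERN.test(value);
}

// Run a script that sits next to this file. Arguments are passed as an
// array, so no shell ever parses them. `inject.sh` is a hypothetical name.
function runInjectScript(lang) {
  if (!isValidVoiceLang(lang)) {
    throw new Error(`Invalid VOICE_LANG: ${JSON.stringify(lang)}`);
  }
  const script = path.join(__dirname, 'inject.sh');
  return execFileSync('bash', [script], {
    env: { ...process.env, VOICE_LANG: lang },
    encoding: 'utf8',
  });
}
```

Because the value is rejected before it reaches any script, characters that would be meaningful to sed or a shell (`/`, `;`, spaces) never get the chance to be interpreted.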
No data exfiltration
- No outbound network calls from JS or scripts.
- No telemetry, analytics, or tracking.
What this skill modifies
| What | Path | Action |
|---|---|---|
| Control UI HTML | `<npm-global>/openclaw/dist/control-ui/index.html` | Adds `<script>` tag for voice-input.js |
| Control UI asset | `<npm-global>/openclaw/dist/control-ui/assets/voice-input.js` | Copies mic button JS |
| Gateway hook | `~/.openclaw/hooks/voice-input-inject/` | Installs startup hook that re-injects JS after updates |
| Workspace files | `~/.openclaw/workspace/voice-input/` | Copies voice-input.js, i18n.json |
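The index.html change in the table is the step that has to be idempotent, since the hook repeats it after every update. The skill's own scripts are shell scripts; the sketch below only illustrates the idea in JavaScript. The exact tag text, the insertion point before `</body>`, and the function name are assumptions.

```javascript
// Illustrative sketch; the tag text and function name are assumptions.
const SCRIPT_TAG = '<script src="./assets/voice-input.js"></script>';

// Insert the tag before </body> unless a reference to the script is
// already present, so repeated runs leave the file unchanged.
function injectScriptTag(html) {
  if (html.includes('assets/voice-input.js')) return html; // already injected
  const idx = html.lastIndexOf('</body>');
  if (idx === -1) return html + '\n' + SCRIPT_TAG + '\n';
  return html.slice(0, idx) + SCRIPT_TAG + '\n' + html.slice(idx);
}
```

Calling the function twice on the same document yields the same result as calling it once, which is the property that makes re-running the deployment safe.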
Mic Button Controls
| Action | Effect |
|---|---|
| Hold (PTT mode) | Record while held, transcribe on release |
| Click (Toggle mode) | Start recording / stop and transcribe |
| Double-click | Switch between PTT and Toggle mode |
| Right-click | Toggle beep sound on/off |
| Ctrl+Space (hold) | Push-to-Talk via keyboard |
| Ctrl+Shift+M | Start/stop recording |
| Ctrl+Shift+B | Start/stop live transcription [beta] |
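The pointer rows of the table can be read as a small state machine. The reducer below is a hypothetical sketch of that logic, not the handler code in voice-input.js: the state shape, the event names, and the choice to stop any recording when the mode changes are all assumptions.

```javascript
// Hypothetical reducer; state shape and event names are assumptions.
// state: { mode: 'ptt' | 'toggle', recording: boolean, beep: boolean }
function micReducer(state, event) {
  switch (event) {
    case 'pointerdown': // hold starts recording in PTT mode
      return state.mode === 'ptt' ? { ...state, recording: true } : state;
    case 'pointerup': // release stops (and would trigger transcription)
      return state.mode === 'ptt' ? { ...state, recording: false } : state;
    case 'click': // click flips recording in Toggle mode
      return state.mode === 'toggle' ? { ...state, recording: !state.recording } : state;
    case 'dblclick': // switch modes; assumed to stop any running recording
      return { ...state, mode: state.mode === 'ptt' ? 'toggle' : 'ptt', recording: false };
    case 'contextmenu': // right-click toggles the beep
      return { ...state, beep: !state.beep };
    default:
      return state;
  }
}
```

One caveat: browsers fire two `click` events before a `dblclick`, so a real handler needs some way to tell a double-click apart from two toggle clicks. The sketch leaves that out.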
Language / i18n
Auto-detects browser language. Built-in: English (en), German (de), Chinese (zh).
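Override in browser console:

    localStorage.setItem('oc-voice-lang', 'de'); // force German
    localStorage.removeItem('oc-voice-lang');    // restore auto-detection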
See assets/i18n.json for all translation keys.
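The language resolution described in this section (a stored override first, then the browser language) could be sketched as follows. This is an illustration under assumptions: the function name `resolveLang` is hypothetical, matching on the primary subtag is assumed, and falling back to English for unsupported languages is also an assumption. The storage key `oc-voice-lang` is the one used by the console override.

```javascript
// Illustrative sketch; function name and the English fallback are assumptions.
const BUILT_IN = ['en', 'de', 'zh'];

// `stored` is the value of localStorage 'oc-voice-lang' (or null);
// `browserLang` is something like navigator.language, e.g. 'de-AT'.
function resolveLang(stored, browserLang, supported = BUILT_IN) {
  // Reduce a tag such as 'zh-CN' to its primary subtag 'zh'.
  const normalize = (tag) => String(tag || '').toLowerCase().split('-')[0];
  const override = normalize(stored);
  if (supported.includes(override)) return override;
  const detected = normalize(browserLang);
  return supported.includes(detected) ? detected : 'en';
}
```

In the browser this would be called as `resolveLang(localStorage.getItem('oc-voice-lang'), navigator.language)`.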
Uninstall

    bash scripts/uninstall.sh

This removes the UI injection, hook, and workspace files. It does not touch the HTTPS proxy or the faster-whisper backend; uninstall those separately.