Feishu Voice Loop
Provide a reusable three-step voice loop for OpenClaw:
1. accept text or voice input
2. generate speech with OpenAI TTS
3. return the audio to Feishu or a web player
When the input is voice, transcribe it to text first, then continue through the same output pipeline.
Quick start
Prerequisites:
- OPENAI_API_KEY is set for TTS
- Feishu app credentials exist in ~/.openclaw/openclaw.json under channels.feishu.appId/appSecret, or are passed explicitly
- ffmpeg and ffprobe are installed and available
- local audio transcription is configured in ~/.openclaw/openclaw.json under tools.media.audio.models
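
The prerequisites above can be checked up front with a short script. This is a minimal sketch, not part of the skill: the helper names `get_nested`, `check_prerequisites`, and `load_config` are hypothetical, and it assumes openclaw.json parses as strict JSON.

```python
import json
import os
import shutil
from pathlib import Path


def get_nested(config, dotted_key):
    """Walk a dotted key such as 'channels.feishu.appId'; return None if absent."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def check_prerequisites(config, env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    for key in ("channels.feishu.appId", "channels.feishu.appSecret"):
        if not get_nested(config, key):
            problems.append(f"missing {key} in openclaw.json")
    for tool in ("ffmpeg", "ffprobe"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if not get_nested(config, "tools.media.audio.models"):
        problems.append("missing tools.media.audio.models in openclaw.json")
    return problems


def load_config(path=Path.home() / ".openclaw" / "openclaw.json"):
    """Load the OpenClaw config (assumed strict JSON), or {} if the file is absent."""
    path = Path(path)
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
```

Calling `check_prerequisites(load_config())` before the first TTS request surfaces every missing item at once instead of failing one step at a time.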
Main scripts:
- scripts/openaittsfeishu.py
- scripts/transcribe_audio.py
Tasks
1. Transcribe voice input
Use this when you have a local .ogg, .opus, .wav, or similar file and want text.
    python3 scripts/transcribe_audio.py /path/to/input.ogg
This script reuses the existing Whisper CLI configuration from ~/.openclaw/openclaw.json.
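
To illustrate what reusing the Whisper CLI configuration could look like, here is a hedged sketch that turns a config entry into a command line. The entry shape (a `command` string, an `args` list, and a `{{MediaPath}}` placeholder) is an assumption about how tools.media.audio.models is laid out, and `build_transcribe_command` is a hypothetical helper, not the script's actual code.

```python
def build_transcribe_command(config, media_path):
    """Build an argv list from the first CLI entry under tools.media.audio.models.

    Assumed entry shape (not confirmed by this document):
        {"type": "cli", "command": "whisper", "args": ["--model", "base", "{{MediaPath}}"]}
    """
    models = (
        config.get("tools", {}).get("media", {}).get("audio", {}).get("models") or []
    )
    for entry in models:
        if isinstance(entry, dict) and entry.get("command"):
            # Substitute the assumed placeholder with the real input path.
            args = [
                str(arg).replace("{{MediaPath}}", media_path)
                for arg in entry.get("args", [])
            ]
            return [entry["command"], *args]
    raise LookupError(
        "no CLI transcription model configured under tools.media.audio.models"
    )
```

The resulting list can be passed to `subprocess.run(..., capture_output=True, text=True)` to collect the transcript from standard output, if the configured CLI prints it there.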
2. Generate and send voice output
Use this when you already have text and want to send a Feishu voice message.
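    python3 scripts/openaittsfeishu.py \
      --to <open_id> \
      --text "This is a voice test." \
      --voice alloy \
      --model gpt-4o-mini-tts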
The script will:
1. call OpenAI audio/speech
2. save WAV audio temporarily
3. convert to Feishu-friendly Opus via ffmpeg
4. upload the file to Feishu
5. send an audio message to the target open_id
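
The steps above can be sketched with the standard library alone. This is an illustration under assumptions, not the script's real implementation: the function names are hypothetical, the 16 kHz mono encoding settings are a guess at what Feishu accepts comfortably, and the multipart upload that yields a `file_key` is left out for brevity.

```python
import json
import subprocess
import urllib.request

OPENAI_SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(api_key, text, model="gpt-4o-mini-tts", voice="alloy",
                         instructions=None):
    """Step 1: prepare the POST to OpenAI audio/speech, asking for WAV output."""
    body = {"model": model, "voice": voice, "input": text, "response_format": "wav"}
    if instructions:
        body["instructions"] = instructions
    return urllib.request.Request(
        OPENAI_SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def ffmpeg_opus_command(wav_path, opus_path):
    """Step 3: WAV to mono 16 kHz Opus (encoding settings are an assumption)."""
    return ["ffmpeg", "-y", "-i", wav_path,
            "-c:a", "libopus", "-ac", "1", "-ar", "16000", opus_path]


def ffprobe_duration_command(path):
    """Ask ffprobe for the duration in seconds, printed as a bare number."""
    return ["ffprobe", "-v", "error", "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1", path]


def build_audio_message(open_id, file_key):
    """Step 5: body of a Feishu audio message; file_key comes from the upload step."""
    return {"receive_id": open_id, "msg_type": "audio",
            "content": json.dumps({"file_key": file_key})}


def synthesize_to_opus(api_key, text, wav_path, opus_path):
    """Steps 1 to 3 end to end; returns the clip duration in milliseconds."""
    with urllib.request.urlopen(build_speech_request(api_key, text)) as resp:
        audio = resp.read()
    with open(wav_path, "wb") as handle:  # step 2: temporary WAV file
        handle.write(audio)
    subprocess.run(ffmpeg_opus_command(wav_path, opus_path), check=True)
    probe = subprocess.run(ffprobe_duration_command(opus_path), check=True,
                           capture_output=True, text=True)
    return int(float(probe.stdout.strip()) * 1000)
```

Keeping the request and command builders free of side effects makes them easy to inspect or unit test without calling OpenAI, ffmpeg, or Feishu.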
3. Run the full voice loop
Use this skill when the goal is a reusable voice interaction pipeline:
1. transcribe input audio to text
2. decide or generate the reply text
3. synthesize reply audio with OpenAI TTS
4. send the reply back to Feishu
Read references/input-output-workflow.md when building or explaining the end-to-end loop.
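
One way to picture the loop is as a small orchestrator that receives each stage as a callable. The sketch below is hypothetical (`run_voice_loop` and `LoopResult` are not names from the skill); in practice the callables would wrap scripts/transcribe_audio.py, the reply logic, and the TTS and delivery script.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class LoopResult:
    input_text: str   # text that entered the loop (typed or transcribed)
    reply_text: str   # text that was spoken back
    audio: Any        # whatever the synthesize step returned, e.g. a file path


def run_voice_loop(
    *,
    transcribe: Callable[[str], str],
    reply: Callable[[str], str],
    synthesize: Callable[[str], Any],
    send: Callable[[Any], None],
    text: Optional[str] = None,
    audio_path: Optional[str] = None,
) -> LoopResult:
    """Run one turn: optional transcription, reply, synthesis, delivery."""
    if text is None:
        if audio_path is None:
            raise ValueError("provide either text or audio_path")
        # Voice input joins the same pipeline once it has been transcribed.
        text = transcribe(audio_path)
    reply_text = reply(text)
    audio = synthesize(reply_text)
    send(audio)
    return LoopResult(input_text=text, reply_text=reply_text, audio=audio)
```

Because text input skips only the first stage, text and voice share the same output path, which matches the behavior described at the top of this document.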
Default output style
Default preset is stored in references/presets.md.
Unless the user asks otherwise, use:
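- model: gpt-4o-mini-tts
- voice: alloy
- default style: 年轻日系男声感、温柔里带一点撩、贴耳边私聊感、自然、不播音腔 (roughly: a young Japanese-style male voice, gentle with a slightly flirtatious edge, intimate as if speaking close to the ear, natural, not announcer-like)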
When the user asks for a different flavor, either:
- pass a custom --instructions
- or adapt one of the presets in references/presets.md
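
The precedence described above (custom instructions first, then a named preset, then the default) could be expressed as follows. This is a sketch only: `resolve_tts_settings` is a hypothetical helper, and the `calm-narrator` preset is invented for illustration; the real presets live in references/presets.md.

```python
DEFAULT_MODEL = "gpt-4o-mini-tts"
DEFAULT_VOICE = "alloy"
DEFAULT_STYLE = "年轻日系男声感、温柔里带一点撩、贴耳边私聊感、自然、不播音腔"

# Hypothetical preset table; only "default" is taken from this document.
PRESETS = {
    "default": DEFAULT_STYLE,
    "calm-narrator": "calm, slow, neutral narration",  # illustrative only
}


def resolve_tts_settings(instructions=None, preset="default",
                         model=DEFAULT_MODEL, voice=DEFAULT_VOICE):
    """Pick the style text: explicit instructions win over any named preset."""
    if instructions:
        style = instructions
    elif preset in PRESETS:
        style = PRESETS[preset]
    else:
        raise KeyError(f"unknown preset: {preset}")
    return {"model": model, "voice": voice, "instructions": style}
```

With no arguments the function returns the default model, voice, and style listed above.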
Handle failures
Common failure cases:
- missing OPENAI_API_KEY → ask for API key / env setup
- HTTP 429 from OpenAI → billing or quota issue
- missing Feishu app credentials → configure channels.feishu.appId/appSecret
- missing ffmpeg or ffprobe → install locally before retrying
- missing transcription model config → configure tools.media.audio.models
When OpenAI billing is not enabled, say so directly instead of pretending the voice was generated.
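
A small mapping from failure to user-facing message keeps these cases honest and consistent. The sketch below assumes the HTTP calls go through urllib; `MissingPrerequisite`, `REMEDIES`, and `explain_failure` are hypothetical names introduced for illustration.

```python
import urllib.error


class MissingPrerequisite(Exception):
    """Hypothetical error raised when a required setting or tool is absent."""

    def __init__(self, name):
        super().__init__(name)
        self.name = name


REMEDIES = {
    "OPENAI_API_KEY": "Set OPENAI_API_KEY in the environment before retrying.",
    "feishu_credentials": "Configure channels.feishu.appId/appSecret in ~/.openclaw/openclaw.json.",
    "ffmpeg": "Install ffmpeg locally before retrying.",
    "ffprobe": "Install ffprobe locally before retrying.",
    "transcription_model": "Configure tools.media.audio.models in ~/.openclaw/openclaw.json.",
}


def explain_failure(exc):
    """Translate a failure into a direct message; never claim audio was produced."""
    if isinstance(exc, urllib.error.HTTPError) and exc.code == 429:
        return "OpenAI returned HTTP 429 (billing or quota issue); no audio was generated."
    if isinstance(exc, MissingPrerequisite):
        return REMEDIES.get(exc.name, f"Missing prerequisite: {exc.name}")
    return f"Unexpected failure: {exc}"
```

Wrapping the TTS call in `try`/`except` and passing the exception to `explain_failure` gives the user the remedy instead of a stack trace.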
Packaging and sharing
Package with:
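    python3 /Users/zoepeng/.openclaw/lib/node_modules/openclaw/skills/skill-creator/scripts/package_skill.py \
      /Users/zoepeng/.openclaw/workspace/skills/openai-feishu-voice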
The resulting .skill file can be shared or uploaded wherever the user distributes skills.
Resources
scripts/openaittsfeishu.py
Use for deterministic TTS generation and Feishu delivery.
scripts/transcribe_audio.py
Use for deterministic local audio transcription via the configured Whisper CLI.
references/presets.md
Read when the user asks for a different voice direction or wants named presets.
references/input-output-workflow.md
Read when packaging or explaining the complete voice-in / voice-out solution.