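Article TTS Skill

Notes

- Tesseract ships with the English language pack; Chinese requires installing the `chi_sim` pack.
- edge-tts runs through `uvx`, so no separate installation is needed.
- Image quality directly affects OCR accuracy; keep the image well lit and squarely framed.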

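Default Configuration

| Parameter | Default | Description |
| --- | --- | --- |
| `lang` | `en` | Language: `en` or `zh` |
| `skipConfirmation` | `false` | Skip the text-confirmation step before generating audio |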

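Supported Languages

| Language | OCR language pack | TTS voice |
| --- | --- | --- |
| `en` | `eng` (preinstalled) | `en-US-EmmaNeural` |
| `zh` | `chi_sim` (must be installed) | `zh-CN-XiaoxiaoNeural` |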

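Installing the Chinese OCR language pack:

- Linux (WSL/Debian/Ubuntu): `apt-get install tesseract-ocr-chi-sim`
- macOS: `brew install tesseract-lang` (includes Chinese)
- Windows: download `chi_sim.traineddata` and place it in the `tessdata` folder of the Tesseract installation directory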
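Before running OCR with `lang=zh`, it can help to confirm that the language pack is actually present. The following is a minimal sketch, assuming the `tesseract` binary is on `PATH`; the helper names `parse_langs` and `installed_langs` are hypothetical and not part of this skill.

```python
import shutil
import subprocess


def parse_langs(output: str) -> set[str]:
    """Turn the text printed by `tesseract --list-langs` into a set of codes."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header ("List of available languages ...").
        if not line or line.lower().startswith("list of"):
            continue
        langs.add(line)
    return langs


def installed_langs() -> set[str]:
    """Return installed language packs, or an empty set if tesseract is missing."""
    if shutil.which("tesseract") is None:
        return set()
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some Tesseract versions print the list to stderr, so read both streams.
    return parse_langs(result.stdout + "\n" + result.stderr)
```

A caller could test `"chi_sim" in installed_langs()` and show the install instructions when it is false.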

Workflow

Input Types

- Image: extract the text with OCR (set `lang` to choose the language)
- Plain text: goes straight to TTS; no OCR needed

Standard Flow (default, requires confirmation)

    Image → OCR text extraction → show text to the user → user confirms → generate TTS → send
    Text  → generate TTS directly → send

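Skip-Confirmation Flow ⚠️

When the user says "no confirmation needed" or "generate directly", skip the confirmation step.

⚠️ Security note: `skipConfirmation` bypasses the text-confirmation step, so OCR-extracted text (which may contain sensitive information) is converted to audio and sent without review. Use it only for trusted sources and low-sensitivity content. Keeping it off by default (`skipConfirmation: false`) is recommended.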
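The difference between the standard flow and the skip-confirmation flow can be sketched as a small planning function. This is only an illustration: the step names and the `plan_steps` helper are hypothetical, and the real agent may order or name its steps differently.

```python
def plan_steps(input_type: str, skip_confirmation: bool = False) -> list[str]:
    """Return the ordered steps for one request.

    input_type is "image" or "text"; the step names are illustrative labels.
    """
    if input_type == "text":
        # Plain text never goes through OCR, so there is nothing to confirm.
        return ["tts", "send"]
    if input_type != "image":
        raise ValueError(f"unsupported input type: {input_type!r}")

    steps = ["ocr"]
    if not skip_confirmation:
        # Default path: the user reviews the OCR text before any audio is made.
        steps += ["show_text", "await_confirmation"]
    steps += ["tts", "send"]
    return steps
```

Keeping `skip_confirmation` as an explicit keyword with a `False` default mirrors the recommendation that confirmation stays on unless the user opts out.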

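OCR Step

Image preprocessing (Python):

    from PIL import Image, ImageOps

    # image_path: path to the input image, supplied by the caller
    img = Image.open(image_path)
    # Convert to grayscale, then stretch contrast (clip 10% at each histogram end)
    img = ImageOps.autocontrast(img.convert("L"), cutoff=10)
    w, h = img.size
    # Upscale 4x before OCR
    img = img.resize((w * 4, h * 4), Image.LANCZOS)
    img.save("/tmp/ocr_input.jpg", quality=99)

Text extraction (bash):

    # English
    tesseract /tmp/ocr_input.jpg stdout -l eng --psm 4

    # Chinese
    tesseract /tmp/ocr_input.jpg stdout -l chi_sim --psm 4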
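The two Tesseract invocations differ only in the language pack, so they can be driven from the `lang` setting. A minimal sketch, assuming the `en`/`zh` to `eng`/`chi_sim` mapping from the language table; `build_ocr_command` and `run_ocr` are hypothetical helper names.

```python
import subprocess

# lang setting -> Tesseract language pack (mapping taken from this document).
OCR_PACKS = {"en": "eng", "zh": "chi_sim"}


def build_ocr_command(image_path: str, lang: str = "en") -> list[str]:
    """Build the tesseract command line for one image."""
    if lang not in OCR_PACKS:
        raise ValueError(f"unsupported lang: {lang!r}")
    # --psm 4 is the page segmentation mode used in the commands above.
    return ["tesseract", image_path, "stdout", "-l", OCR_PACKS[lang], "--psm", "4"]


def run_ocr(image_path: str, lang: str = "en") -> str:
    """Run tesseract and return the recognized text (raises if tesseract fails)."""
    result = subprocess.run(
        build_ocr_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Rejecting unknown languages up front avoids a less obvious Tesseract error later in the flow.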

TTS Step

Full-article audio (bash):

    # English
    uvx edge-tts \
      -t "Full article text" \
      -v en-US-EmmaNeural \
      --rate=-10% \
      --write-media OUTPUT_DIR/full_article.mp3

    # Chinese
    uvx edge-tts \
      -t "中文文字内容" \
      -v zh-CN-XiaoxiaoNeural \
      --rate=-10% \
      --write-media OUTPUT_DIR/full_article.mp3
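Article text often contains quotes and line breaks that are awkward to escape in a shell. One way around that is to call edge-tts from Python with an argument list, which bypasses shell quoting. This is a sketch under the assumption that `uvx` is on `PATH`; `build_tts_command` and `synthesize` are hypothetical names.

```python
import subprocess
from pathlib import Path

# Default voice per language, as listed in this document.
DEFAULT_VOICES = {"en": "en-US-EmmaNeural", "zh": "zh-CN-XiaoxiaoNeural"}


def build_tts_command(text: str, lang: str, out_file: str, rate: str = "-10%") -> list[str]:
    """Build the edge-tts command as an argument list (no shell quoting needed)."""
    voice = DEFAULT_VOICES[lang]  # KeyError for an unsupported language
    return [
        "uvx", "edge-tts",
        "-t", text,
        "-v", voice,
        f"--rate={rate}",
        "--write-media", str(out_file),
    ]


def synthesize(text: str, lang: str, out_file: str) -> Path:
    """Create the output folder if needed, run edge-tts, and return the file path."""
    path = Path(out_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_tts_command(text, lang, str(path)), check=True)
    return path
```

Passing `--rate=-10%` as a single token keeps the leading minus sign from being read as a separate option.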

Per-sentence split (only when splitSentences=true), in Python:

    import re
    import subprocess

    def split_sentences(text, lang="en"):
        if lang == "zh":
            # Chinese: split after 。！？ (whitespace optional)
            sentences = re.split(r"(?<=[。！？])\s*", text)
        else:
            # English: split on whitespace that follows . ! ?
            sentences = re.split(r"(?<=[.!?])\s+", text)
        return [s.strip() for s in sentences if s.strip()]

    # text, lang, and OUTPUT_DIR are supplied by the surrounding workflow
    sentences = split_sentences(text, lang=lang)
    for i, sentence in enumerate(sentences, 1):
        num = str(i).zfill(2)
        voice = "zh-CN-XiaoxiaoNeural" if lang == "zh" else "en-US-EmmaNeural"
        subprocess.run([
            "uvx", "edge-tts",
            "-t", sentence,
            "-v", voice,
            "--rate=-10%",
            "--write-media", f"{OUTPUT_DIR}/sentence_{num}.mp3",
        ])
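To see what the splitting rules produce, here is a small worked example that repeats the same two regular expressions in a self-contained form. The `sentence_filename` helper is a hypothetical name for the zero-padded naming used in the loop; splitting on an empty match, as the Chinese pattern does, needs Python 3.7 or later.

```python
import re


def split_sentences(text: str, lang: str = "en") -> list[str]:
    """Split text into sentences with the same patterns as the skill's snippet."""
    if lang == "zh":
        parts = re.split(r"(?<=[。！？])\s*", text)
    else:
        parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def sentence_filename(index: int) -> str:
    """Zero-padded file name for the index-th sentence (1-based)."""
    return f"sentence_{str(index).zfill(2)}.mp3"


print(split_sentences("Hello world. How are you? Fine!"))
# ['Hello world.', 'How are you?', 'Fine!']
print(split_sentences("你好。今天天气怎么样？很好！", lang="zh"))
# ['你好。', '今天天气怎么样？', '很好！']
print(split_sentences("Dr. Smith arrived."))
# ['Dr.', 'Smith arrived.']  (abbreviations are split too)
print(sentence_filename(3))
# sentence_03.mp3
```

The third call shows a limitation of punctuation-based splitting: abbreviations such as "Dr." end up as their own audio clip.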

Output Directory

    /mnt/d/wslspace/workspace/articles/YYYY-MM-DD-article-slug/
    ├── original_text.md
    ├── full_article.mp3
    └── sentence_01.mp3 ...
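The `YYYY-MM-DD-article-slug` folder name can be derived from the date and the article title. The document does not say how the slug is produced, so the `slugify` rule below (lowercase ASCII letters and digits joined by hyphens, with `article` as a fallback) is purely an assumption, as are the helper names.

```python
import re
from datetime import date
from pathlib import Path


def slugify(title: str) -> str:
    """Assumed slug rule: lowercase, non-alphanumeric runs become one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # Titles with no ASCII letters or digits (e.g. Chinese) fall back to a fixed name.
    return slug or "article"


def article_dir(base: str, title: str, day: date) -> Path:
    """Return <base>/<YYYY-MM-DD>-<slug> without creating it."""
    return Path(base) / f"{day.isoformat()}-{slugify(title)}"


print(article_dir("/tmp/articles", "Hello, World!", date(2024, 1, 5)))
# /tmp/articles/2024-01-05-hello-world
```

Call `article_dir(...).mkdir(parents=True, exist_ok=True)` before writing `original_text.md` and the audio files.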

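Sending via Message Channel

The agent detects the active channel from the runtime context and calls message(...) accordingly. No channel is hardcoded: the agent uses whichever channel the user is currently chatting through.

    # The active channel is detected automatically (from runtime inbound metadata)
    # Channels inferred: Feishu / Telegram / Discord / WhatsApp / Signal / iMessage / WeCom

    # Send the full article
    message(
        action="send",
        channel="{active_channel}",
        message="📄 Full-article audio",
        media="PATH/full_article.mp3",
        filename="full_article.mp3",
    )

    # Send each sentence
    for i, sentence in enumerate(sentences, 1):
        num = str(i).zfill(2)
        message(
            action="send",
            channel="{active_channel}",
            message=f"📝 {num}: {sentence}",
            media=f"PATH/sentence_{num}.mp3",
            filename=f"sentence_{num}.mp3",
        )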

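Channel Behavior Notes

| Channel | Audio support | Notes |
| --- | --- | --- |
| Feishu | ✅ | Sending voice messages through the feishu-voice-send skill is recommended |
| Telegram | ✅ | Sends the mp3 directly |
| Discord | ✅ | Sent as an attachment |
| WhatsApp | ✅ | Sends the mp3 directly |
| Signal | ⚠️ | Depends on signal strength; may not be supported |
| iMessage | ⚠️ | Sent through macOS; mp3 compatibility is mediocre |
| WeCom | ✅ | Same as Feishu |

If the channel does not support audio, the agent saves the file to OUTPUT_DIR and sends the file path as a text message instead.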
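The fallback rule can be sketched as follows. The channel identifiers, the `deliver` helper, and the `send` callable (standing in for the agent's `message` tool) are all hypothetical. Treating the ⚠️ channels as unsupported is a conservative assumption of this sketch, not something the document specifies.

```python
from typing import Callable

# Channels marked ✅ in the table; the identifiers themselves are assumed.
AUDIO_OK = {"feishu", "telegram", "discord", "whatsapp", "wecom"}


def deliver(channel: str, audio_path: str, caption: str, send: Callable[..., None]) -> str:
    """Send the audio if the channel supports it, otherwise send the file path.

    Returns "audio" or "path" so the caller can log which branch was taken.
    """
    if channel in AUDIO_OK:
        send(action="send", channel=channel, message=caption, media=audio_path)
        return "audio"
    # Fallback: the file is already saved, so point the user at it in plain text.
    send(action="send", channel=channel, message=f"{caption}\nSaved to: {audio_path}")
    return "path"
```

Injecting `send` as a parameter keeps the sketch testable without a live channel.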

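How to Send as a Voice Message (Not an Attachment)

Important: OpenClaw's built-in Feishu media sending has a bug (it omits the duration parameter), so .ogg files sometimes appear as attachments rather than voice messages.

Recommended: use the feishu-voice-send skill

The skill calls the official Feishu API and passes the duration parameter correctly, so voice messages display properly.

Option 1: Send via the feishu-voice-send skill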

Commands (bash):

    # Send an existing .ogg file
    python3 /mnt/d/wslspace/workspace/skills/feishu-voice-send/scripts/send_voice.py \
      /path/to/audio.ogg \
      <recipient_open_id>

    # Or generate TTS and send it in one step
    python3 /mnt/d/wslspace/workspace/skills/feishu-voice-send/scripts/tts_and_send.py \
      "text to convert" \
      <recipient_open_id> \
      -v zh-CN-YunjianNeural \
      -r -10%

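Option 2: Manual call (not recommended)

If you must use OpenClaw's built-in message tool:

1. Convert the mp3 to standard Ogg Opus format.
2. Always include the message parameter when sending.
3. Note that even with the message parameter, the file may still appear as an attachment because duration is missing.

The steps as commands:

    # 1. Generate the mp3 with edge-tts
    uvx edge-tts \
      -t "Your text here" \
      -v en-US-EmmaNeural \
      --rate=-10% \
      --write-media OUTPUT_DIR/voice.mp3

    # 2. Convert to standard Ogg Opus with ffmpeg
    ffmpeg -i OUTPUT_DIR/voice.mp3 \
      -c:a libopus \
      -b:a 32k \
      -ar 24000 \
      -ac 1 \
      OUTPUT_DIR/voice.ogg

    # 3. Send with the message tool (may still appear as an attachment)
    message(action="send", channel="feishu",
      message="📄 Voice",
      media="OUTPUT_DIR/voice.ogg")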
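Since the missing piece is the duration, it may be useful to see how an audio file's duration can be measured. The sketch below uses `ffprobe`, which ships with ffmpeg. Reporting the value in milliseconds is an assumption about what a sender would need, and the helper names are hypothetical. It does not fix the built-in message tool, which does not accept a duration.

```python
import subprocess


def build_probe_command(path: str) -> list[str]:
    """ffprobe invocation that prints only the container duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def parse_duration_ms(ffprobe_output: str) -> int:
    """Convert ffprobe's seconds value (e.g. '3.456000') to whole milliseconds."""
    return round(float(ffprobe_output.strip()) * 1000)


def audio_duration_ms(path: str) -> int:
    """Measure an audio file's duration in milliseconds (raises if ffprobe fails)."""
    result = subprocess.run(
        build_probe_command(path),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_duration_ms(result.stdout)
```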

Available TTS Voices

English

en-US-EmmaNeural, en-US-BrianNeural, en-GB-LibbyNeural, ...

Chinese

zh-CN-XiaoxiaoNeural (female), zh-CN-YunxiNeural (male), zh-CN-YunyangNeural (male, news style), ...

Full list: `uvx edge-tts -l | grep zh-CN`
