Text-to-Audio
Synthesizes text into AI voice/voiceover via giggle.pro. Supports multiple voice tones, emotions, and speaking rates.
⚠️ Review Before Installing
Please review the following before installing. This skill will:
1. Write to `~/.openclaw/skills/giggle-generation-speech/logs/` – task state files for Cron deduplication
2. Register Cron (30s interval) – async polling when the user initiates speech generation; removed when complete
3. Forward raw stdout – script output (audio links, status) is passed to the user as-is
Requirements: python3, GIGGLE_API_KEY (system environment variable), pip packages: requests
API Key: Set the system environment variable `GIGGLE_API_KEY`. The script will prompt if it is not configured.
No inline Python: All commands must be executed via the exec tool. Never use heredoc inline code.
No Retry on Error: If script execution encounters an error, do not retry. Report the error to the user directly and stop.
Execution Flow (Phase 1 Submit + Phase 2 Cron + Phase 3 Sync Fallback)
Speech generation typically takes 10–30 seconds. The skill uses a three-phase "fast submit + Cron poll + sync fallback" architecture.
Important: Never pass GIGGLE_API_KEY in exec's env parameter. The API key is read from the system environment variable.
Phase 0: Guide User to Select Voice and Emotion (required)
Before submitting, you must guide the user to select voice and emotion. Do not use defaults.
1. Run `--list-voices` to get the available voices:

```bash
python3 scripts/text_to_audio_api.py --list-voices
```

2. Display the voice list to the user in a readable format (voice_id, name, style, gender, etc.) and guide them to pick one
3. Ask for the user's preferred emotion (e.g. joy, sad, neutral, angry, surprise). Use neutral if they have no preference
4. Only after the user confirms voice and emotion, proceed to the Phase 1 submit
Phase 1: Submit Task (exec completes in ~10 seconds)
First send a message to the user: "Speech generation in progress, usually takes 10–30 seconds. Results will be sent automatically."
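```bash
# The voice and emotion chosen by the user must be specified
python3 scripts/text_to_audio_api.py \
  --text "The weather is really nice today" \
  --voice-id Calm_Woman \
  --emotion joy \
  --speed 1.2 \
  --no-wait --json

# View available voices
python3 scripts/text_to_audio_api.py --list-voices
```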
Response example:
```json
{"status": "started", "task_id": "xxx"}
```
Immediately store task_id in memory (addMemory):
```
giggle-generation-speech task_id: xxx (submitted: YYYY-MM-DD HH:mm)
```
Phase 2: Register Cron (30 second interval)
Use the cron tool to register the polling job. Strictly follow the parameter format:
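```json
{
  "action": "add",
  "job": {
    "name": "giggle-generation-speech-",
    "schedule": {
      "kind": "every",
      "everyMs": 30000
    },
    "payload": {
      "kind": "systemEvent",
      "text": "Speech task poll: run python3 scripts/text_to_audio_api.py --query --task-id <full task_id> and handle stdout according to the Cron logic. If stdout is non-JSON plain text, forward it to the user and remove Cron. If stdout is JSON, do not send a message and keep waiting. If stdout is empty, remove Cron immediately."
    },
    "sessionTarget": "main"
  }
}
```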
Cron trigger handling (based on exec stdout):
| stdout pattern | Action |
|---|---|
| Non-empty plain text (not starting with `{`) | Forward to user as-is, remove Cron |
| stdout empty | Already pushed; remove Cron immediately, do not send message |
| JSON (starts with `{`, has `"status"` field) | Do not send message, do not remove Cron, keep waiting |
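The table above amounts to a three-way dispatch on stdout. A minimal Python sketch of that decision follows. The names `CronAction` and `classify_stdout` are hypothetical and not part of the skill. Two behaviors are assumptions the table does not specify: whitespace-only output is treated as empty, and output that starts with `{` but is not a JSON object with a `status` field is treated as an error.

```python
import json
from enum import Enum


class CronAction(Enum):
    FORWARD_AND_REMOVE = "forward stdout as-is, then remove Cron"
    REMOVE_SILENTLY = "remove Cron, send no message"
    KEEP_WAITING = "send no message, keep Cron"


def classify_stdout(stdout: str) -> CronAction:
    """Map one poll's stdout to the action listed in the table."""
    text = stdout.strip()  # assumption: whitespace-only output counts as empty
    if not text:
        return CronAction.REMOVE_SILENTLY
    if not text.startswith("{"):
        return CronAction.FORWARD_AND_REMOVE
    try:
        payload = json.loads(text)
    except json.JSONDecodeError as exc:
        # Not covered by the table; treating it as an error is an assumption.
        raise ValueError("stdout starts with '{' but is not valid JSON") from exc
    if isinstance(payload, dict) and "status" in payload:
        return CronAction.KEEP_WAITING
    raise ValueError("JSON stdout has no 'status' field")


if __name__ == "__main__":
    for sample in ("", '{"status": "processing"}', "Speech ready: <signed URL>"):
        print(repr(sample), "->", classify_stdout(sample).name)
```

The function only classifies. When the result is `FORWARD_AND_REMOVE`, the original stdout (not the stripped copy) is what should be forwarded, so signed links reach the user unchanged.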
Phase 3: Sync Wait (optimistic path, fallback when Cron hasn't fired)
Execute this step whether or not Cron registration succeeded.
```bash
python3 scripts/text_to_audio_api.py --query --task-id <task_id> --poll --max-wait 120
```
Handling logic:
- Returns plain text (speech ready/failed message) → Forward to user as-is, remove Cron
- stdout empty → Cron already pushed, remove Cron, do not send message
- exec timeout → Cron continues polling
View Voice List
When the user wants to see available voices, run:
```bash
python3 scripts/text_to_audio_api.py --list-voices
```
The script calls GET /api/v1/project/preset_tones and displays voice_id, name, style, gender, age, language to the user.
Link Return Rule
Audio links returned to the user must be full signed URLs (with Policy, Key-Pair-Id, Signature query params). Correct: https://assets.giggle.pro/...?Policy=...&Key-Pair-Id=...&Signature=.... Wrong: do not return unsigned URLs with only the base path (no query params). The script handles ~ encoding to %7E; keep as-is when forwarding.
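One way to check the rule above before forwarding a link is to test for all three signing parameters. This is a minimal sketch using only the Python standard library. The helper name `is_fully_signed` and the example path `a.mp3` are hypothetical, and the check only inspects the URL without rewriting it.

```python
from urllib.parse import parse_qs, urlsplit

# Query parameters that a fully signed link must carry, per the rule above.
REQUIRED_PARAMS = ("Policy", "Key-Pair-Id", "Signature")


def is_fully_signed(url: str) -> bool:
    """Return True only if every required signing parameter is present and non-empty."""
    query = parse_qs(urlsplit(url).query)  # blank values are dropped by default
    return all(query.get(name) for name in REQUIRED_PARAMS)


if __name__ == "__main__":
    signed = "https://assets.giggle.pro/a.mp3?Policy=abc&Key-Pair-Id=K1&Signature=sig%7E"
    unsigned = "https://assets.giggle.pro/a.mp3"
    print(is_fully_signed(signed), is_fully_signed(unsigned))
```

Because the function never modifies its input, the `%7E` encoding produced by the script is left untouched when the link is forwarded.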
New Request vs Query Old Task
When the user initiates a new speech generation request, you must run Phase 1 to submit a new task. Do not reuse an old task_id from memory.
Only when the user explicitly asks about a previous task's progress should you query the old task_id from memory.
Parameter Reference
| Parameter | Required | Default | Description |
|---|---|---|---|
| `--text` | yes | - | Text to synthesize |
| `--voice-id` | yes | - | Voice ID; must be obtained via `--list-voices`, with the user guided to choose |
| `--emotion` | yes | - | Emotion: joy, sad, neutral, angry, surprise, etc. Guide the user to choose |
| `--speed` | no | 1 | Speaking rate multiplier |
| `--list-voices` | - | - | Get the available voice list |
| `--query` | - | - | Query task status |
| `--task-id` | required for query | - | Task ID |
| `--poll` | no | - | Poll synchronously; used with `--query` |
| `--max-wait` | no | 120 | Max wait in seconds |
Interaction Guide
Before each speech generation, complete this interaction:
1. If the user did not provide text, ask: "Which text would you like to convert to speech?"
2. Must guide the user to select a voice: run `--list-voices`, display the list, and have the user choose. Do not use a default voice
3. Must guide the user to select an emotion: ask for the user's preferred emotion (joy, sad, neutral, angry, surprise, etc.)
4. After the user confirms text, voice, and emotion, run Phase 1 submit → Phase 2 register Cron → Phase 3 sync wait
Skill name: giggle-generation-speech
Detailed description:
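Text-to-Audio
Synthesizes text into AI voice/voiceover via giggle.pro. Supports multiple voice tones, emotions, and speaking rates.
⚠️ Review Before Installing
Please read the following carefully before installing. This skill will:
1. Write to `~/.openclaw/skills/giggle-generation-speech/logs/` – task state files for Cron deduplication
2. Register Cron (30s interval) – async polling when the user initiates speech generation; removed when complete
3. Forward raw stdout – script output (audio links, status) is passed to the user as-is
Requirements: python3, GIGGLE_API_KEY (system environment variable), pip packages: requests
API Key: Set the system environment variable `GIGGLE_API_KEY`. The script will prompt if it is not configured.
No inline Python: All commands must be executed via the exec tool. Never use heredoc inline code.
No Retry on Error: If script execution encounters an error, do not retry. Report the error to the user directly and stop.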
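Execution Flow (Phase 1 Submit + Phase 2 Cron + Phase 3 Sync Fallback)
Speech generation typically takes 10–30 seconds. The skill uses a three-phase "fast submit + Cron poll + sync fallback" architecture.
Important: Never pass GIGGLE_API_KEY in exec's env parameter. The API key is read from the system environment variable.
Phase 0: Guide User to Select Voice and Emotion (required)
Before submitting, you must guide the user to select voice and emotion. Do not use defaults.
1. Run `--list-voices` to get the available voices:

```bash
python3 scripts/text_to_audio_api.py --list-voices
```

2. Display the voice list to the user in a readable format (voice_id, name, style, gender, etc.) and guide them to pick one
3. Ask for the user's preferred emotion (e.g. joy, sad, neutral, angry, surprise). Use neutral if they have no preference
4. Only after the user confirms voice and emotion, proceed to the Phase 1 submit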
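Phase 1: Submit Task (exec completes in ~10 seconds)
First send a message to the user: "Speech generation in progress, usually takes 10–30 seconds. Results will be sent automatically."

```bash
# The voice and emotion chosen by the user must be specified
python3 scripts/text_to_audio_api.py \
  --text "The weather is really nice today" \
  --voice-id Calm_Woman \
  --emotion joy \
  --speed 1.2 \
  --no-wait --json

# View available voices
python3 scripts/text_to_audio_api.py --list-voices
```

Response example:

```json
{"status": "started", "task_id": "xxx"}
```

Immediately store task_id in memory (addMemory):

```
giggle-generation-speech task_id: xxx (submitted: YYYY-MM-DD HH:mm)
```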
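Phase 2: Register Cron (30 second interval)
Use the cron tool to register the polling job. Strictly follow the parameter format:

```json
{
  "action": "add",
  "job": {
    "name": "giggle-generation-speech-",
    "schedule": {
      "kind": "every",
      "everyMs": 30000
    },
    "payload": {
      "kind": "systemEvent",
      "text": "Speech task poll: run python3 scripts/text_to_audio_api.py --query --task-id <full task_id> and handle stdout according to the Cron logic. If stdout is non-JSON plain text, forward it to the user and remove Cron. If stdout is JSON, do not send a message and keep waiting. If stdout is empty, remove Cron immediately."
    },
    "sessionTarget": "main"
  }
}
```

Cron trigger handling (based on exec stdout):
| stdout pattern | Action |
|---|---|
| Non-empty plain text (not starting with `{`) | Forward to user as-is, remove Cron |
| stdout empty | Already pushed; remove Cron immediately, do not send message |
| JSON (starts with `{`, has `"status"` field) | Do not send message, do not remove Cron, keep waiting |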
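Phase 3: Sync Wait (optimistic path, fallback when Cron hasn't fired)
Execute this step whether or not Cron registration succeeded.

```bash
python3 scripts/text_to_audio_api.py --query --task-id <task_id> --poll --max-wait 120
```

Handling logic:
- Returns plain text (speech ready/failed message) → Forward to user as-is, remove Cron
- stdout empty → Cron already pushed, remove Cron, do not send message
- exec timeout → Cron continues polling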
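View Voice List
When the user wants to see available voices, run:

```bash
python3 scripts/text_to_audio_api.py --list-voices
```

The script calls GET /api/v1/project/preset_tones and displays voice_id, name, style, gender, age, language to the user.
Link Return Rule
Audio links returned to the user must be full signed URLs (with Policy, Key-Pair-Id, Signature query params). Correct: https://assets.giggle.pro/...?Policy=...&Key-Pair-Id=...&Signature=.... Wrong: do not return unsigned URLs with only the base path (no query params). The script handles ~ encoding to %7E; keep as-is when forwarding.
New Request vs Query Old Task
When the user initiates a new speech generation request, you must run Phase 1 to submit a new task. Do not reuse an old task_id from memory.
Only when the user explicitly asks about a previous task's progress should you query the old task_id from memory.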
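Parameter Reference
| Parameter | Required | Default | Description |
|---|---|---|---|
| `--text` | yes | - | Text to synthesize |
| `--voice-id` | yes | - | Voice ID; must be obtained via `--list-voices`, with the user guided to choose |
| `--emotion` | yes | - | Emotion: joy, sad, neutral, angry, surprise, etc. Guide the user to choose |
| `--speed` | no | 1 | Speaking rate multiplier |
| `--list-voices` | - | - | Get the available voice list |
| `--query` | - | - | Query task status |
| `--task-id` | required for query | - | Task ID |
| `--poll` | no | - | Poll synchronously; used with `--query` |
| `--max-wait` | no | 120 | Max wait in seconds |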
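Interaction Guide
Before each speech generation, complete this interaction:
1. If the user did not provide text, ask: "Which text would you like to convert to speech?"
2. Must guide the user to select a voice: run `--list-voices`, display the list, and have the user choose. Do not use a default voice
3. Must guide the user to select an emotion: ask for the user's preferred emotion (joy, sad, neutral, angry, surprise, etc.)
4. After the user confirms text, voice, and emotion, run Phase 1 submit → Phase 2 register Cron → Phase 3 sync wait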