Text-to-Audio

Synthesizes text into AI voice/voiceover via giggle.pro. Supports multiple voice tones, emotions, and speaking rates.

⚠️ Review Before Installing

Please review the following before installing. This skill will:

1. Write to ~/.openclaw/skills/giggle-generation-speech/logs/ – Task state files for Cron deduplication
2. Register Cron (30s interval) – Async polling when the user initiates speech generation; removed when complete
3. Forward raw stdout – Script output (audio links, status) is passed to the user as-is
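The skill does not document the format of its state files. As an illustrative sketch only, a per-task marker file is one way to deduplicate pushes. The first poller (Cron or the sync fallback) to deliver a finished result claims the task. Later polls find the marker and print nothing, which is why "stdout empty" means "already pushed". The helper name and the `<task_id>.pushed` file name are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def claim_push(log_dir: Path, task_id: str) -> bool:
    """Return True exactly once per task_id; later calls return False.

    Hypothetical scheme: an empty marker file <task_id>.pushed in log_dir.
    O_CREAT | O_EXCL makes creation atomic, so two concurrent pollers
    cannot both claim the same push.
    """
    log_dir.mkdir(parents=True, exist_ok=True)
    marker = log_dir / f"{task_id}.pushed"
    try:
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # already pushed: the caller should print nothing
    os.close(fd)
    return True


if __name__ == "__main__":
    # A temporary directory stands in for the skill's logs/ directory.
    with tempfile.TemporaryDirectory() as tmp:
        logs = Path(tmp) / "logs"
        print(claim_push(logs, "task-123"))  # first push is claimed
        print(claim_push(logs, "task-123"))  # duplicate is suppressed
```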

Requirements: python3, GIGGLE_API_KEY (system environment variable), pip packages: requests

API Key: Set system environment variable GIGGLE_API_KEY. The script will prompt if not configured.
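Here is a minimal sketch that assumes the script reads `GIGGLE_API_KEY` from its inherited environment. This is also why the key never needs to go into exec's `env` parameter. The function name and the hint text are hypothetical, and the real script's prompt may differ.

```python
import os
import sys
from typing import Mapping, Optional


def read_api_key(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return GIGGLE_API_KEY, or None after printing a setup hint to stderr."""
    key = environ.get("GIGGLE_API_KEY", "").strip()
    if not key:
        # Hypothetical wording; the actual script's prompt is not documented here.
        print("GIGGLE_API_KEY is not set. Export it as a system environment "
              "variable before running the skill.", file=sys.stderr)
        return None
    return key


if __name__ == "__main__":
    print("configured" if read_api_key() else "missing")
```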

No inline Python: All commands must be executed via the exec tool. Never use heredoc inline code.

No Retry on Error: If script execution encounters an error, do not retry. Report the error to the user directly and stop.

Execution Flow (Phase 1 Submit + Phase 2 Cron + Phase 3 Sync Fallback)

Speech generation typically takes 10–30 seconds, so the skill uses a three-phase architecture: fast submit, Cron polling, and a synchronous fallback.

Important: Never pass GIGGLE_API_KEY in exec's env parameter. The API key is read from the system environment variable.

Phase 0: Guide User to Select Voice and Emotion (required)

Before submitting, you must guide the user to select voice and emotion. Do not use defaults.

1. Run --list-voices to get available voices:

```bash
python3 scripts/texttoaudio_api.py --list-voices
```

2. Display the voice list to the user in a readable format (voice_id, name, style, gender, etc.) and guide them to pick one.
3. Ask for the user's preferred emotion (e.g. joy, sad, neutral, angry, surprise). Use neutral if the user has no preference.
4. Only after the user confirms voice and emotion, proceed to the Phase 1 submit.
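The Phase 0 rules can be checked mechanically before submitting. This illustrative sketch enforces two of them. The voice_id has no default and must match a voice returned by --list-voices. The emotion falls back to neutral only when the user states no preference. The helper name is hypothetical, and the sample records are made up except for Calm_Woman, which appears in this document's submit example.

```python
from typing import Iterable, Mapping, Optional, Tuple


def confirm_choice(voices: Iterable[Mapping[str, str]],
                   voice_id: Optional[str],
                   emotion: Optional[str]) -> Tuple[str, str]:
    """Validate the user's picks before Phase 1 (hypothetical helper).

    voice_id has no default: it must match an entry from --list-voices.
    emotion becomes "neutral" only when the user gave no preference (None).
    """
    available = [v["voice_id"] for v in voices]
    if not voice_id or voice_id not in available:
        raise ValueError(f"ask the user to pick one of: {', '.join(available)}")
    return voice_id, emotion or "neutral"


if __name__ == "__main__":
    sample = [{"voice_id": "Calm_Woman"}, {"voice_id": "Example_Voice"}]
    print(confirm_choice(sample, "Calm_Woman", None))  # ('Calm_Woman', 'neutral')
```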

Phase 1: Submit Task (exec completes in ~10 seconds)

First send a message to the user: "Speech generation in progress, usually takes 10–30 seconds. Results will be sent automatically."

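```bash
# Must specify the voice and emotion chosen by the user
python3 scripts/texttoaudio_api.py \
  --text "The weather is really nice today" \
  --voice-id Calm_Woman \
  --emotion joy \
  --speed 1.2 \
  --no-wait --json

# View available voices
python3 scripts/texttoaudio_api.py --list-voices
```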
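The submit command is easier to get right as an argument list than as a shell string, because user text with spaces or quotes then needs no escaping. Below is an illustrative sketch under that approach. The helper name is hypothetical, and the flags are the ones this document uses for Phase 1. In the skill itself, the command still goes through the exec tool, not inline Python.

```python
import shlex
from typing import List

SCRIPT = "scripts/texttoaudio_api.py"


def build_submit_argv(text: str, voice_id: str, emotion: str,
                      speed: float = 1.0) -> List[str]:
    """Assemble the Phase 1 submit command as an argv list (hypothetical helper)."""
    if not (text and voice_id and emotion):
        # The skill forbids defaults for voice and emotion.
        raise ValueError("text, voice_id and emotion are all required")
    return ["python3", SCRIPT,
            "--text", text,
            "--voice-id", voice_id,
            "--emotion", emotion,
            "--speed", str(speed),
            "--no-wait", "--json"]


if __name__ == "__main__":
    argv = build_submit_argv("The weather is really nice today",
                             "Calm_Woman", "joy", 1.2)
    # shlex.join renders a correctly quoted shell form of the same command.
    print(shlex.join(argv))
```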

Response example:

```json
{"status": "started", "task_id": "xxx"}
```

Immediately store task_id in memory (addMemory):

```
giggle-generation-speech task_id: xxx (submitted: YYYY-MM-DD HH:mm)
```
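Here is a minimal sketch of turning the Phase 1 JSON response (e.g. `{"status": "started", "task_id": "xxx"}`) into the memory line. It assumes the line format `giggle-generation-speech task_id: <id> (submitted: YYYY-MM-DD HH:mm)`. The function name is hypothetical.

```python
import json
from datetime import datetime


def memory_entry(stdout: str, now: datetime) -> str:
    """Build the addMemory line from the submit response (hypothetical helper)."""
    resp = json.loads(stdout)
    if "task_id" not in resp:
        raise ValueError(f"submit response has no task_id: {stdout!r}")
    return (f"giggle-generation-speech task_id: {resp['task_id']} "
            f"(submitted: {now:%Y-%m-%d %H:%M})")


if __name__ == "__main__":
    print(memory_entry('{"status": "started", "task_id": "xxx"}', datetime.now()))
```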

Phase 2: Register Cron (30 second interval)

Use the cron tool to register the polling job. Strictly follow the parameter format:

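```json
{
  "action": "add",
  "job": {
    "name": "giggle-generation-speech-",
    "schedule": {
      "kind": "every",
      "everyMs": 30000
    },
    "payload": {
      "kind": "systemEvent",
      "text": "Speech task polling: run python3 scripts/texttoaudio_api.py --query --task-id <full task_id> and handle stdout according to the Cron logic. If stdout is non-JSON plain text, forward it to the user and remove the Cron. If stdout is JSON, send no message and keep waiting. If stdout is empty, remove the Cron immediately."
    },
    "sessionTarget": "main"
  }
}
```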
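Building the cron request programmatically highlights two details: everyMs is in milliseconds (30 s → 30000), and the payload must embed the full task_id. This is an illustrative sketch of the documented field layout. The job name is left to the caller because this document does not fully specify the naming scheme, and the name used below is hypothetical.

```python
import json


def build_cron_job(name: str, task_id: str, interval_s: int = 30) -> dict:
    """Return the cron tool's "add" request as a dict (hypothetical helper)."""
    query_cmd = f"python3 scripts/texttoaudio_api.py --query --task-id {task_id}"
    return {
        "action": "add",
        "job": {
            "name": name,
            "schedule": {"kind": "every", "everyMs": interval_s * 1000},
            "payload": {
                "kind": "systemEvent",
                "text": (
                    f"Speech task polling: run {query_cmd} and handle stdout "
                    "according to the Cron logic. Non-JSON plain text: forward "
                    "it to the user and remove the Cron. JSON: send no message "
                    "and keep waiting. Empty: remove the Cron immediately."
                ),
            },
            "sessionTarget": "main",
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_cron_job("example-speech-poll", "task-123"), indent=2))
```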

Cron trigger handling (based on exec stdout):

| stdout pattern | Action |
|---|---|
| Non-empty plain text (not starting with `{`) | Forward to user as-is, remove Cron |
| stdout empty | Already pushed; remove Cron immediately, do not send message |
| JSON (starts with `{`, has "status" field) | Do not send message, do not remove Cron, keep waiting |
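The stdout rules for Cron triggers amount to a three-way classifier. In this illustrative sketch, the enum and function names are hypothetical. Like the rules, it keys on whether stdout is empty and on its leading `{`; it does not parse the JSON or verify the "status" field.

```python
from enum import Enum


class CronAction(Enum):
    FORWARD_AND_REMOVE = "forward to user, remove Cron"
    REMOVE_SILENTLY = "remove Cron, send nothing"
    KEEP_WAITING = "send nothing, keep Cron"


def classify_stdout(stdout: str) -> CronAction:
    """Map one poll's stdout to the documented Cron action."""
    out = stdout.strip()
    if not out:
        return CronAction.REMOVE_SILENTLY     # result was already pushed
    if out.startswith("{"):
        return CronAction.KEEP_WAITING        # JSON status: task still running
    return CronAction.FORWARD_AND_REMOVE      # ready/failed message for the user


if __name__ == "__main__":
    for sample in ["", '{"status": "started"}', "Speech ready: https://..."]:
        print(repr(sample), "->", classify_stdout(sample).value)
```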

Phase 3: Sync Wait (optimistic path, fallback when Cron hasn't fired)

Execute this step whether or not Cron registration succeeded.

```bash
python3 scripts/texttoaudio_api.py --query --task-id <task_id> --poll --max-wait 120
```

Handling logic:

- Returns plain text (speech ready/failed message) → Forward to user as-is, remove Cron
- stdout empty → Cron already pushed the result; remove Cron, do not send message
- exec timeout → Cron continues polling
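The synchronous wait can be modeled as a bounded polling loop. This illustrative sketch approximates what `--poll --max-wait` is described as doing. The 5-second interval is an assumption, and `query` stands for any callable returning one poll's stdout (in practice, a wrapper around the `--query` command). The `sleep` and `clock` parameters exist only so the loop can be tested without waiting. The real script's timing and exit behavior may differ.

```python
import time
from typing import Callable, Optional


def poll_until_message(query: Callable[[], str], max_wait: float = 120,
                       interval: float = 5,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> Optional[str]:
    """Re-run `query` until its stdout is no longer a JSON status.

    Returns the final stdout: a plain-text message, or "" if Cron already
    pushed it. Returns None when max_wait would be exceeded, leaving the
    Cron job to keep polling.
    """
    deadline = clock() + max_wait
    while True:
        out = query().strip()
        if not out.startswith("{"):
            return out
        if clock() + interval > deadline:
            return None
        sleep(interval)


if __name__ == "__main__":
    # Fake query: two "still running" polls, then a ready message.
    fake = iter(['{"status": "started"}', '{"status": "started"}', "Speech ready"])
    print(poll_until_message(lambda: next(fake), sleep=lambda s: None))
```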

View Voice List

When the user wants to see available voices, run:

```bash
python3 scripts/texttoaudio_api.py --list-voices
```

The script calls GET /api/v1/project/preset_tones and displays each voice's voice_id, name, style, gender, age, and language to the user.
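For showing the list "in a readable format", here is one illustrative option: an aligned plain-text table over the six displayed fields. This sketch assumes the voice records have already been parsed into dicts keyed by those field names; the API's actual response envelope is not documented here. Missing fields render as `-`, and the sample record is hypothetical.

```python
from typing import Iterable, List, Mapping

COLUMNS = ["voice_id", "name", "style", "gender", "age", "language"]


def _cell(record: Mapping[str, object], key: str) -> str:
    value = record.get(key)
    return "-" if value in (None, "") else str(value)


def format_voices(voices: Iterable[Mapping[str, object]]) -> str:
    """Render voice records as an aligned text table (hypothetical helper)."""
    rows: List[List[str]] = [COLUMNS] + [[_cell(v, c) for c in COLUMNS] for v in voices]
    widths = [max(len(row[i]) for row in rows) for i in range(len(COLUMNS))]
    lines = ("  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
             for row in rows)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_voices([{"voice_id": "Calm_Woman", "name": "Calm Woman", "gender": "female"}]))
```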

Link Return Rule

Audio links returned to the user must be full signed URLs that include the Policy, Key-Pair-Id, and Signature query parameters. Correct: https://assets.giggle.pro/...?Policy=...&Key-Pair-Id=...&Signature=.... Wrong: an unsigned URL containing only the base path with no query parameters. The script already encodes ~ as %7E; keep the URL exactly as-is when forwarding.
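Before forwarding a link, a quick presence check catches the "unsigned base path" mistake. This illustrative sketch uses only `urllib.parse`. It checks that the three query parameters are present and non-empty; it does not verify the signature itself. It inspects the URL without rebuilding it, so the original string (including any %7E) can be forwarded unchanged. The helper name and the example URLs are made up.

```python
from urllib.parse import parse_qs, urlsplit

REQUIRED_PARAMS = ("Policy", "Key-Pair-Id", "Signature")


def is_signed_url(url: str) -> bool:
    """True if the URL carries non-empty Policy, Key-Pair-Id and Signature params."""
    query = parse_qs(urlsplit(url).query)  # blank values are dropped by default
    return all(query.get(p) for p in REQUIRED_PARAMS)


if __name__ == "__main__":
    print(is_signed_url("https://assets.giggle.pro/a.mp3?Policy=p&Key-Pair-Id=k&Signature=s%7Ex"))
    print(is_signed_url("https://assets.giggle.pro/a.mp3"))
```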

New Request vs Query Old Task

When the user initiates a new speech generation request, you must run Phase 1 to submit a new task. Do not reuse an old task_id from memory.

Only when the user explicitly asks about a previous task's progress should you look up the old task_id in memory.
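For the "query old task" case, this illustrative sketch finds the most recently submitted task_id. It assumes memory entries follow the `giggle-generation-speech task_id: <id> (submitted: YYYY-MM-DD HH:mm)` format. It is meant only for explicit progress questions; new requests always go through Phase 1. The function name is hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

ENTRY = re.compile(
    r"giggle-generation-speech task_id: (?P<task_id>\S+) "
    r"\(submitted: (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2})\)"
)


def latest_task_id(memory_lines: Iterable[str]) -> Optional[str]:
    """Return the task_id with the newest submission time, or None."""
    best = None
    for line in memory_lines:
        m = ENTRY.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M")
            if best is None or ts > best[0]:
                best = (ts, m["task_id"])
    return best[1] if best else None
```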

Parameter Reference

| Parameter | Required | Default | Description |
|---|---|---|---|
| `--text` | yes | - | Text to synthesize |
| `--voice-id` | yes | - | Voice ID; must be obtained via `--list-voices`, with the user guided to choose |
| `--emotion` | yes | - | Emotion: joy, sad, neutral, angry, surprise, etc.; guide the user to choose |
| `--speed` | no | 1 | Speaking-rate multiplier |
| `--list-voices` | - | - | List available voices |
| `--query` | - | - | Query task status |
| `--task-id` | required for `--query` | - | Task ID |
| `--poll` | no | - | Poll synchronously (used with `--query`) |
| `--max-wait` | no | 120 | Maximum wait, in seconds |
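The parameter constraints can be expressed as an `argparse` definition. This is an illustrative mirror of the documented flags, not the script's actual parser. `--no-wait` and `--json` appear in the Phase 1 command but not in the parameter table, so their help text states assumed meanings. The cross-flag checks also follow the documented flag relationships, and `ArgumentParser.error` exits with status 2.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the documented CLI surface."""
    p = argparse.ArgumentParser(prog="texttoaudio_api.py")
    p.add_argument("--text", help="Text to synthesize")
    p.add_argument("--voice-id", help="Voice ID obtained via --list-voices")
    p.add_argument("--emotion", help="joy, sad, neutral, angry, surprise, etc.")
    p.add_argument("--speed", type=float, default=1.0, help="Speaking-rate multiplier")
    p.add_argument("--list-voices", action="store_true", help="List available voices")
    p.add_argument("--query", action="store_true", help="Query task status")
    p.add_argument("--task-id", help="Task ID (required with --query)")
    p.add_argument("--poll", action="store_true", help="Poll synchronously (with --query)")
    p.add_argument("--max-wait", type=int, default=120, help="Maximum wait in seconds")
    # Used by the Phase 1 command but absent from the table; meanings are assumed.
    p.add_argument("--no-wait", action="store_true", help="(assumed) return after submitting")
    p.add_argument("--json", action="store_true", help="(assumed) print JSON output")
    return p


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    p = build_parser()
    args = p.parse_args(argv)
    if args.query and not args.task_id:
        p.error("--query requires --task-id")
    if args.poll and not args.query:
        p.error("--poll is used together with --query")
    if not (args.query or args.list_voices):
        if not (args.text and args.voice_id and args.emotion):
            p.error("submitting requires --text, --voice-id and --emotion")
    return args
```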

Interaction Guide

Before each speech generation, complete this interaction:

1. If the user did not provide text, ask: "Which text would you like to convert to speech?"
2. You must guide the user to select a voice: run --list-voices, display the list, and have the user choose. Do not use a default voice.
3. You must guide the user to select an emotion: ask for their preferred emotion (joy, sad, neutral, angry, surprise, etc.).
4. After the user confirms text, voice, and emotion, run Phase 1 submit → Phase 2 register Cron → Phase 3 sync wait.
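The interaction order can be read as a small state machine that always asks for the first missing input. In this illustrative sketch, the enum and function names are hypothetical. Once the user has been asked and has no emotion preference, the caller passes "neutral".

```python
from enum import Enum
from typing import Optional


class Step(Enum):
    ASK_TEXT = "Which text would you like to convert to speech?"
    ASK_VOICE = "Run --list-voices, show the list, and ask the user to choose a voice."
    ASK_EMOTION = "Ask for the preferred emotion (joy, sad, neutral, angry, surprise, etc.)."
    SUBMIT = "Run Phase 1 submit, then Phase 2 register Cron, then Phase 3 sync wait."


def next_step(text: Optional[str], voice_id: Optional[str],
              emotion: Optional[str]) -> Step:
    """Return the next interaction step; None means 'not collected yet'."""
    if not text:
        return Step.ASK_TEXT
    if not voice_id:
        return Step.ASK_VOICE
    if not emotion:
        return Step.ASK_EMOTION
    return Step.SUBMIT


if __name__ == "__main__":
    print(next_step("Hello there", None, None).value)
```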

Skill name: giggle-generation-speech
