RunPod Media Skill

Generate AI images and videos using RunPod public endpoints. All output is saved to ~/runpod-media/.

API Keys

One key required — add to ~/.openclaw/secrets.json:

Key path	Purpose	Get it from
/runpod/apiKey	Call RunPod endpoints	runpod.io/console/user/settings

Local images are uploaded to Cloudflare R2 and passed to RunPod endpoints as presigned URLs (1 min expiry). R2 credentials are read from /cloudflare/r2 in secrets.json (already configured ✅).

imgbb is no longer used. R2 presigned URLs replace it for all local file uploads.

R2 cleanup: Objects in uploads/ are auto-deleted after 1 day by a lifecycle rule on the openclaw bucket. Presigned URLs expire after 1 minute, so the file is inaccessible from then on, and the object itself is removed within 24 hours.
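The upload-then-presign flow described above can be sketched with the AWS CLI, which talks to R2 through its S3-compatible API. This is an illustration only: the skill's actual upload code is not shown in this document, presign_upload is a hypothetical helper, and the account ID argument and the use of the AWS CLI (with R2 credentials exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) are assumptions. The bucket name, the uploads/ prefix, and the 60-second expiry come from the text above.

```shell
#!/usr/bin/env bash
# Sketch only: not the skill's real upload code.
# Assumes the AWS CLI is installed and R2 credentials are exported.

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# presign_upload <local-file> <r2-account-id>   (hypothetical helper)
# Uploads the file under uploads/ in the openclaw bucket, then prints a
# URL that stops working after 60 seconds.
presign_upload() {
  local file="$1" account_id="$2"
  local endpoint="https://${account_id}.r2.cloudflarestorage.com"
  local key="uploads/$(basename "$file")"
  # Upload progress goes to stderr so stdout carries only the URL.
  run aws s3 cp "$file" "s3://openclaw/${key}" --endpoint-url "$endpoint" >&2 || return 1
  run aws s3 presign "s3://openclaw/${key}" --expires-in 60 --endpoint-url "$endpoint"
}
```

Setting DRY_RUN=1 prints the two commands instead of running them, which is a quick way to check the bucket path and expiry before touching real credentials.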

Keys are resolved in this order:

1. OpenClaw secrets.json: ~/.openclaw/secrets.json ✅ (already configured)
2. Env vars: RUNPOD_API_KEY
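The lookup order above can be sketched as a small shell function. This is not the skill's actual code: resolve_runpod_key is a hypothetical name, jq is assumed to be installed, and the key path /runpod/apiKey is assumed to map to nested JSON objects ({"runpod": {"apiKey": "..."}}).

```shell
#!/usr/bin/env bash
# Sketch of the documented lookup order; assumes jq for reading JSON.

# resolve_runpod_key: prints the API key, or returns 1 if none is found.
resolve_runpod_key() {
  local secrets="${HOME}/.openclaw/secrets.json" key=""
  # 1. secrets.json, key path /runpod/apiKey (assumed nested-object layout)
  if [ -r "$secrets" ] && command -v jq >/dev/null 2>&1; then
    key="$(jq -r '.runpod.apiKey // empty' "$secrets" 2>/dev/null)"
  fi
  # 2. Fall back to the environment variable
  if [ -z "$key" ]; then
    key="${RUNPOD_API_KEY:-}"
  fi
  [ -n "$key" ] || return 1
  printf '%s\n' "$key"
}
```

Because the file is checked first, a stale RUNPOD_API_KEY in the environment does not override a key stored in secrets.json.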

How Users Ask (Natural Language Examples)

The user will never type CLI commands — translate their natural requests into the right script call.

Generate an image:

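- "Generate an image of a samurai cat in neon Tokyo" → generate_image --prompt "..."
- "Make me a 16:9 image of a stormy ocean at sunset" → generate_image --prompt "..." --aspect-ratio 16:9
- "Create an image using Nano Banana: a futuristic city" → call_endpoint --endpoint google-nano-banana-2-edit --prompt "..."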

Edit an image:

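- "Edit this image: add snow falling" → edit_image --images <file> --prompt "add snow falling"
- "Use Qwen to edit this photo, make it look like a painting" → call_endpoint --endpoint qwen-image-edit --image <file> --prompt "make it look like a painting"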

Animate to video:

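- "Animate this image: slow camera pan" → image_to_video --image <file> --prompt "slow camera pan"
- "Make a video from this with Kling" → image_to_video --image <file> --model kling --prompt "..."
- "Turn this into a 10 second clip with Sora 2" → call_endpoint --endpoint sora-2-pro-i2v --image <file> --prompt "..." --duration 10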

Text to video:

- "Generate a video of a wolf howling at the moon" → text_to_video --prompt "..."

List available models:

- "What image/video models do you have?"
- "List the available endpoints"
- "Show me what RunPod models are available"

→ Run list_endpoints and summarize the output in plain language for the user

Add a new endpoint:

- "Add this RunPod endpoint: https://console.runpod.io/hub/playground/voice/kokoro-tts"
- "Probe and add these endpoints: kokoro-tts, flux-kontext-pro"

→ Run discover_endpoints add --candidates "<url-or-id>"
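Since discover_endpoints accepts either a hub URL or a bare ID, no conversion is required. The sketch below only illustrates how the example URL relates to its endpoint ID, under the assumption that the ID is the last path segment of the hub URL (true for the kokoro-tts example, not confirmed as a general rule). endpoint_id_from is a hypothetical helper.

```shell
#!/usr/bin/env bash
# endpoint_id_from <url-or-id>   (hypothetical helper)
# A bare ID passes through unchanged; for a URL, the last path segment
# is taken as the ID (an assumption based on the example above).
endpoint_id_from() {
  local candidate="${1%/}"          # drop one trailing slash, if any
  printf '%s\n' "${candidate##*/}"  # keep everything after the last slash
}
```

For a request that names several endpoints, one option is to loop over the names and run $SKILL_DIR/run.sh discover_endpoints add --candidates "<id>" once per candidate.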

Capabilities & Cost

Task	Command	Cost	Time
Text → Image	generate_image	~$0.005/image	3–8s
Edit image(s)

The built-in commands use default endpoints. For more models (Nano Banana Pro, FLUX, Sora 2, Kling, TTS, etc.) use call_endpoint with any RunPod public endpoint ID.

Endpoint Registry

All known public endpoints are in scripts/endpoints.json. List them:

    $SKILL_DIR/run.sh list_endpoints

Call Any Endpoint

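    $SKILL_DIR/run.sh call_endpoint \
      --endpoint <id> \
      [--prompt "text"] \
      [--image path-or-URL] \
      [--audio path-or-URL] \
      [--duration 5] \
      [--aspect-ratio 16:9] \
      [--input '{"key": "value"}']   # full JSON override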

Examples:

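    # Nano Banana Pro image generation
    $SKILL_DIR/run.sh call_endpoint --endpoint nano-banana-pro --prompt "a golden retriever in space"

    # Nano Banana Pro image edit
    $SKILL_DIR/run.sh call_endpoint --endpoint nano-banana-pro --prompt "make it night" --image photo.jpg

    # Sora 2 Pro image to video
    $SKILL_DIR/run.sh call_endpoint --endpoint sora-2-pro-i2v --image photo.jpg --prompt "camera slowly pulls back" --duration 5

    # Kokoro TTS
    $SKILL_DIR/run.sh call_endpoint --endpoint kokoro-tts --text "hello world"

    # FLUX Schnell
    $SKILL_DIR/run.sh call_endpoint --endpoint flux-schnell --prompt "cyberpunk city" --input '{"width":1024,"height":1024}'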

Adding New Endpoints

When the user asks to use an endpoint not in the registry, or the runpod skill reveals a new one:

1. Call it directly with --endpoint <id>; no registry entry is needed
2. Optionally add it to scripts/endpoints.json for future sessions

With runpod skill: Use the runpod skill to browse/discover endpoint IDs on the RunPod hub, then pass that ID to call_endpoint here.

Generate Image

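    $SKILL_DIR/run.sh generate_image \
      --prompt "your prompt" \
      [--aspect-ratio 1:1|16:9|9:16|4:3|3:4] \
      [--seed 42]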

Edit Image

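    $SKILL_DIR/run.sh edit_image \
      --images path-or-URL [path-or-URL ...] \
      --prompt "edit instruction" \
      [--aspect-ratio 1:1] \
      [--seed 42]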

- Accepts 1–5 images (local paths or URLs)
- Local files are auto-uploaded to Cloudflare R2 and passed as presigned URLs (requires /cloudflare/r2 in secrets.json); imgbb is no longer used
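The input rules above (1 to 5 images, each either a URL or a local file) can be checked before calling edit_image. The following is a minimal sketch, not the script's own validation; is_url and check_edit_inputs are hypothetical helpers, and treating only http:// and https:// prefixes as URLs is an assumption.

```shell
#!/usr/bin/env bash
# is_url <arg>: true for http(s) URLs, false for anything else.
is_url() {
  case "$1" in
    http://*|https://*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_edit_inputs <image>...   (hypothetical helper)
# Enforces the 1-5 image limit and makes sure every non-URL argument
# is a readable local file, since those are the ones that get uploaded.
check_edit_inputs() {
  if [ "$#" -lt 1 ] || [ "$#" -gt 5 ]; then
    echo "expected 1-5 images, got $#" >&2
    return 1
  fi
  local img
  for img in "$@"; do
    if ! is_url "$img" && [ ! -r "$img" ]; then
      echo "not a URL or readable file: $img" >&2
      return 1
    fi
  done
}
```

Running the check first gives the user a clear message about a missing file instead of a failed upload partway through the request.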

Animate Image → Video

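    $SKILL_DIR/run.sh image_to_video \
      --image path-or-URL \
      --prompt "motion description" \
      [--model wan25|kling|seedance] \
      [--duration 5|10] \
      [--negative-prompt "text"]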

Models:

- wan25 (default): WAN 2.5, ~$0.026/5s
- kling: Kling v2.1 Pro, $0.45/5s (highest quality)
- seedance: Seedance 1.0 Pro, ~$0.12/5s

Generate Video from Text

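    $SKILL_DIR/run.sh text_to_video \
      --prompt "video description" \
      [--model wan26|seedance] \
      [--duration 5|10|15] \
      [--size 1920x1080] \
      [--negative-prompt "text"]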

Models:

- wan26 (default): WAN 2.6, ~$0.04/5s
- seedance: Seedance 1.0 Pro, ~$0.12/5s
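When a user asks what a clip will cost, the per-5-second rates listed for the video models can be turned into a rough estimate. The sketch below assumes the price scales linearly with duration, which the document does not state; rate_per_5s and estimate_cost are hypothetical helpers, and the rates are the approximate figures from the model lists above.

```shell
#!/usr/bin/env bash
# Listed rates, in tenths of a cent per 5 seconds of video.
rate_per_5s() {
  case "$1" in
    wan25)    echo 26  ;;   # ~$0.026
    wan26)    echo 40  ;;   # ~$0.04
    seedance) echo 120 ;;   # ~$0.12
    kling)    echo 450 ;;   # $0.45
    *)        return 1 ;;
  esac
}

# estimate_cost <model> <seconds>   (hypothetical helper)
# Prints an approximate dollar amount, assuming linear scaling.
estimate_cost() {
  local rate
  rate="$(rate_per_5s "$1")" || { echo "unknown model: $1" >&2; return 1; }
  local mills=$(( rate * $2 / 5 ))
  printf '$%d.%03d\n' $(( mills / 1000 )) $(( mills % 1000 ))
}
```

For example, estimate_cost kling 10 prints $0.900 and estimate_cost wan25 5 prints $0.026. Per the caption rule in Defaults, the figure should be shared only when the user asks for it.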

Defaults

- Delete after send: always delete the local file after successful delivery. Keep it only if the user explicitly asks ("keep it", "save it", "--keep").
- Captions: keep them short and natural. Do NOT include render time or cost unless the user asks. Example: "🦊 Fox under the aurora", not "🦊 Fox, 105s render (~$0.026)".
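The delete-after-send default can be expressed as a small helper that runs only once delivery has succeeded. This is a sketch under the assumption that the keep request is passed through as a --keep argument; cleanup_after_send is a hypothetical name, not part of the skill.

```shell
#!/usr/bin/env bash
# cleanup_after_send <file> [--keep]   (hypothetical helper)
# Deletes the file unless --keep was passed. Call this only after the
# send has succeeded, so a failed delivery never loses the output.
cleanup_after_send() {
  local file="$1" keep="${2:-}"
  if [ "$keep" = "--keep" ]; then
    echo "kept: $file"
    return 0
  fi
  rm -f -- "$file" && echo "deleted: $file"
}
```

Chaining it as send_step && cleanup_after_send "$path" keeps the file in place whenever the send step fails.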

Delivering Media to the User

After generating an image or video, always deliver it to the user via their active channel.

The Problem

The message tool with a local media path may fail in sandboxed agent modes because SecretRef resolution is not available for media sends. This is a known OpenClaw limitation.

The Solution: Use curl + Telegram Bot API directly

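Read the bot token from secrets and upload the file with curl to the Telegram Bot API (sendPhoto for images, sendVideo for videos), passing the chat ID, the file, and the caption. This works regardless of sandbox mode.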

Where to find CHAT_ID: Use the chat_id from the inbound message metadata (e.g. telegram:1231438498 → use 1231438498).
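The curl fallback can be sketched as follows. The Telegram Bot API methods sendPhoto and sendVideo and their chat_id, photo, video, and caption fields are real; everything else is an assumption: the helper names are hypothetical, videos are assumed to be saved as .mp4, and the bot token is passed in as an argument because this document does not give its key path in secrets.json.

```shell
#!/usr/bin/env bash
# Sketch only. Read the bot token from secrets.json before calling
# send_media; its key path is not given in this document.

# chat_id_from "telegram:1231438498" prints "1231438498"
chat_id_from() {
  printf '%s\n' "${1#telegram:}"
}

# telegram_method <file>: sendVideo for .mp4 files (assumed video
# extension), sendPhoto for everything else.
telegram_method() {
  case "$1" in
    *.mp4) echo sendVideo ;;
    *)     echo sendPhoto ;;
  esac
}

# send_media <bot-token> <chat-id> <file> <caption>   (hypothetical helper)
send_media() {
  local token="$1" chat_id="$2" file="$3" caption="$4"
  local method field
  method="$(telegram_method "$file")"
  # The multipart field name matches the method: photo or video.
  if [ "$method" = sendVideo ]; then field=video; else field=photo; fi
  # --form-string sends the caption literally, even if it starts with @ or <.
  curl -sS --fail \
    -F "chat_id=${chat_id}" \
    -F "${field}=@${file}" \
    --form-string "caption=${caption}" \
    "https://api.telegram.org/bot${token}/${method}"
}
```

A typical call would be send_media "$BOT_TOKEN" "$(chat_id_from telegram:1231438498)" "$path" "🦊 Fox under the aurora". With --fail, curl exits non-zero on an HTTP error, so the file can be deleted only when the send actually succeeded.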

Delivery Checklist

1. ✅ Generate the file (note the saved path from script output)
2. ✅ Send via message tool with a short, natural caption (no cost/time unless asked)
3. ✅ Delete the local file after a successful send
4. ⏭️ Skip deletion only if the user said "keep it" / "save it" / passed --keep

Why not `message` tool?

Try it first — if it works, great. If it returns a SecretRef error, fall back to the curl method above.

Notes

- Video generation takes 30–120 seconds, so tell the user to expect a wait
- Output files are saved to ~/.openclaw/workspace/runpod-media/, which is accessible in both sandboxed and elevated agent modes
- Shared utilities live in scripts/_utils.py; do not call them directly
