RunPod Media Skill
Generate AI images and videos using RunPod public endpoints. All output is saved to ~/runpod-media/.
API Keys
One key required — add to ~/.openclaw/secrets.json:
Local images are uploaded to Cloudflare R2 and passed to RunPod endpoints as presigned URLs (1 min expiry). R2 credentials are read from /cloudflare/r2 in secrets.json (already configured ✅).
imgbb is no longer used. R2 presigned URLs replace it for all local file uploads.
R2 cleanup: Objects in uploads/ are auto-deleted after 1 day via a lifecycle rule on the openclaw bucket. Presigned URLs stop granting access after 1 min; the objects themselves are removed within 24h.
Keys are resolved in this order:
1. OpenClaw secrets.json: ~/.openclaw/secrets.json ✅ (already configured)
2. Env vars: `RUNPOD_API_KEY`
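The resolution order above can be sketched in Python roughly as follows. This is a minimal sketch, not the skill's actual code: the helper names and the `/runpod/apiKey` location inside secrets.json are hypothetical (this document does not show where the RunPod key sits in the file), and the `RUNPOD_API_KEY` variable name is assumed.

```python
import json
import os
from pathlib import Path
from typing import Optional

SECRETS_PATH = Path.home() / ".openclaw" / "secrets.json"


def lookup(doc: dict, pointer: str) -> Optional[str]:
    """Follow a /slash/separated pointer through nested dicts."""
    node = doc
    for part in pointer.strip("/").split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    # Only a non-empty string counts as a usable key.
    return node if isinstance(node, str) and node else None


def resolve_runpod_key(
    secrets_path: Path = SECRETS_PATH,
    pointer: str = "/runpod/apiKey",  # hypothetical location, adjust to your file
    env_var: str = "RUNPOD_API_KEY",  # assumed variable name
) -> Optional[str]:
    """Return the key from secrets.json if present, else from the environment."""
    try:
        secrets = json.loads(secrets_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        secrets = {}  # missing or unreadable file: fall through to the env var
    key = lookup(secrets, pointer) if isinstance(secrets, dict) else None
    return key or os.environ.get(env_var) or None
```

Checking the file first means a configured secrets.json always wins, and the environment variable only acts as a fallback.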
How Users Ask (Natural Language Examples)
The user will never type CLI commands — translate their natural requests into the right script call.
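Generate an image:
- "Generate an image of a samurai cat in neon Tokyo" → `generate_image --prompt "..."`
- "Make me a 16:9 image of a stormy ocean at sunset" → `generate_image --prompt "..." --aspect-ratio 16:9`
- "Create an image using Nano Banana: a futuristic city" → `call_endpoint --endpoint google-nano-banana-2-edit --prompt "..."`
Edit an image:
- "Edit this image, add snow falling" → `edit_image --images <file> --prompt "add snow falling"`
- "Use Qwen to edit this photo, make it look like a painting" → `call_endpoint --endpoint qwen-image-edit --image <file> --prompt "make it look like a painting"`
Animate to video:
- "Animate this image with a slow camera pan" → `image_to_video --image <file> --prompt "slow camera pan"`
- "Make a video from this with Kling" → `image_to_video --image <file> --model kling --prompt "..."`
- "Turn this into a 10 second clip with Sora 2" → `call_endpoint --endpoint sora-2-pro-i2v --image <file> --prompt "..." --duration 10`
Text to video:
- "Generate a video of a wolf howling at the moon" → `text_to_video --prompt "..."`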
List available models:
- "What image/video models do you have?"
- "List the available endpoints"
- "Show me what RunPod models are available"
→ Run `list_endpoints` and summarize the output in plain language for the user
Add a new endpoint:
- "Add this RunPod endpoint: https://console.runpod.io/hub/playground/voice/kokoro-tts"
- "Probe and add these endpoints: kokoro-tts, flux-kontext-pro"
→ Run `discover_endpoints add --candidates "<url-or-id>"`
Capabilities & Cost
| Task | Command | Cost | Time |
|---|---|---|---|
| Text → Image | `generate_image` | ~$0.005/image | 3–8s |
| Edit image(s) | `edit_image` | ~$0.005/image | 5–15s |
| Image → Video | `image_to_video` | $0.03–$0.90/clip | 30–120s |
| Text → Video | `text_to_video` | $0.04–$1.22/clip | 30–120s |
| Any endpoint | `call_endpoint` | varies | varies |
The built-in commands use default endpoints. For more models (Nano Banana Pro, FLUX, Sora 2, Kling, TTS, etc.) use call_endpoint with any RunPod public endpoint ID.
Endpoint Registry
All known public endpoints are in scripts/endpoints.json. List them:
```bash
$SKILL_DIR/run.sh list_endpoints
```
Call Any Endpoint
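```bash
$SKILL_DIR/run.sh call_endpoint \
  --endpoint <id> \
  [--prompt "text"] \
  [--image path-or-url] \
  [--audio path-or-url] \
  [--duration 5] \
  [--aspect-ratio 16:9] \
  [--input '{"key": "value"}']   # full JSON override
```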
Examples:
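```bash
# Nano Banana Pro image generation
$SKILL_DIR/run.sh call_endpoint --endpoint nano-banana-pro --prompt "a golden retriever in space"

# Nano Banana Pro image edit
$SKILL_DIR/run.sh call_endpoint --endpoint nano-banana-pro --prompt "make it night" --image photo.jpg

# Sora 2 Pro image-to-video
$SKILL_DIR/run.sh call_endpoint --endpoint sora-2-pro-i2v --image photo.jpg --prompt "camera slowly pulls back" --duration 5

# Kokoro TTS
$SKILL_DIR/run.sh call_endpoint --endpoint kokoro-tts --text "Hello world"

# FLUX Schnell
$SKILL_DIR/run.sh call_endpoint --endpoint flux-schnell --prompt "cyberpunk city" --input '{"width":1024,"height":1024}'
```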
Adding New Endpoints
When the user asks to use an endpoint not in the registry, or the runpod skill reveals a new one:
1. Call it directly with `--endpoint <id>`; no registry entry is needed
2. Optionally add it to scripts/endpoints.json for future sessions
With runpod skill: Use the runpod skill to browse/discover endpoint IDs on the RunPod hub, then pass that ID to call_endpoint here.
Generate Image
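```bash
$SKILL_DIR/run.sh generate_image \
  --prompt "your prompt" \
  [--aspect-ratio 1:1|16:9|9:16|4:3|3:4] \
  [--seed 42]
```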
Edit Image
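```bash
$SKILL_DIR/run.sh edit_image \
  --images path-or-url [path-or-url ...] \
  --prompt "edit instruction" \
  [--aspect-ratio 1:1] \
  [--seed 42]
```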
- Accepts 1–5 images (local paths or URLs)
- Local files are auto-uploaded to Cloudflare R2 and passed as presigned URLs (imgbb is no longer used; R2 credentials come from /cloudflare/r2 in secrets.json)
Animate Image → Video
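```bash
$SKILL_DIR/run.sh image_to_video \
  --image path-or-url \
  --prompt "motion description" \
  [--model wan25|kling|seedance] \
  [--duration 5|10] \
  [--negative-prompt "text"]
```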
Models:
- `wan25` (default): WAN 2.5, ~$0.026/5s
- `kling`: Kling v2.1 Pro, $0.45/5s (highest quality)
- `seedance`: Seedance 1.0 Pro, ~$0.12/5s
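As a rough worked example of the per-5-second prices listed above, the sketch below scales each model's price linearly with clip duration. Linear scaling is an assumption: it matches the $0.03–$0.90/clip range in the cost table (WAN 2.5 at 5s up to Kling at 10s), but this document does not state a billing formula. The `estimate_cost` helper is hypothetical and not part of the skill.

```python
# Listed image-to-video prices in USD per 5 seconds of output.
PRICE_PER_5S = {
    "wan25": 0.026,
    "kling": 0.45,
    "seedance": 0.12,
}


def estimate_cost(model: str = "wan25", duration_s: int = 5) -> float:
    """Estimate the cost of one clip, assuming price scales linearly with duration."""
    if model not in PRICE_PER_5S:
        raise ValueError(f"unknown model: {model}")
    if duration_s not in (5, 10):  # image_to_video takes --duration 5|10
        raise ValueError("duration must be 5 or 10 seconds")
    return round(PRICE_PER_5S[model] * duration_s / 5, 3)
```

Under this assumption a 10-second Kling clip comes to about $0.90, roughly 17 times the cost of the same clip on the default WAN 2.5 model, which is worth weighing before switching models for a casual request.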
Generate Video from Text
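```bash
$SKILL_DIR/run.sh text_to_video \
  --prompt "video description" \
  [--model wan26|seedance] \
  [--duration 5|10|15] \
  [--size 1920x1080] \
  [--negative-prompt "text"]
```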
Models:
- `wan26` (default): WAN 2.6, ~$0.04/5s
- `seedance`: Seedance 1.0 Pro, ~$0.12/5s
Defaults
- Delete after send: always delete the local file after successful delivery. Only keep it if the user explicitly asks ("keep it", "save it", "--keep").
- Captions: keep them short and natural. Do NOT include render time or cost unless the user asks. Example: "🦊 Fox under the aurora", not "🦊 Fox - 105s render (~$0.026)".
Delivering Media to the User
After generating an image or video, always deliver it to the user via their active channel.
The Problem
The `message` tool with a local `media` path may fail in sandboxed agent modes because SecretRef resolution is not available for media sends. This is a known OpenClaw limitation.
The Solution: Use curl + Telegram Bot API directly
Read the bot token from secrets and send via curl — this always works regardless of sandbox mode:
CODEBLOCK7
Where to find CHAT_ID: Use the chat_id from the inbound message metadata (e.g. telegram:1231438498 → use 1231438498).
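A minimal Python sketch of the chat ID extraction and curl invocation described above. It relies on the Telegram Bot API's `sendPhoto` and `sendVideo` methods and on curl's `-F`, `--form-string`, and `--fail` options. The helper names, the suffix-based photo/video choice, and the way the bot token reaches the function are assumptions, not the skill's actual code.

```python
import subprocess
from pathlib import Path

VIDEO_SUFFIXES = {".mp4", ".mov", ".webm"}  # assumed set of video extensions


def chat_id_from_channel(channel: str) -> str:
    """Turn 'telegram:1231438498' into '1231438498'."""
    prefix, _, chat_id = channel.partition(":")
    if prefix != "telegram" or not chat_id:
        raise ValueError(f"not a telegram channel id: {channel!r}")
    return chat_id


def build_curl_command(bot_token: str, channel: str, media_path: str, caption: str) -> list:
    """Build (but do not run) a curl command that uploads a local file to Telegram."""
    is_video = Path(media_path).suffix.lower() in VIDEO_SUFFIXES
    method, field = ("sendVideo", "video") if is_video else ("sendPhoto", "photo")
    return [
        "curl", "-sS", "--fail", "-X", "POST",
        f"https://api.telegram.org/bot{bot_token}/{method}",
        "-F", f"chat_id={chat_id_from_channel(channel)}",
        "-F", f"{field}=@{media_path}",            # '@' makes curl upload the file
        "--form-string", f"caption={caption}",     # sent literally, never read as a file
    ]


def send(cmd: list) -> bool:
    """Run the command; with --fail, HTTP errors give a non-zero exit status."""
    return subprocess.run(cmd, capture_output=True, text=True).returncode == 0
```

Building the argument list separately from running it keeps the token out of shell string interpolation and lets the caller delete the local file only after `send` returns True, in line with the delete-after-send default.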
Delivery Checklist
1. ✅ Generate the file (note the saved path from script output)
2. ✅ Send via the `message` tool with a short, natural caption (no cost/time unless asked)
3. ✅ Delete the local file after successful send: INLINECODE46
4. ⏭️ Skip deletion only if the user said "keep it" / "save it" / passed `--keep`
Why not message tool?
Try it first — if it works, great. If it returns a SecretRef error, fall back to the curl method above.
Notes
- Video generation takes 30–120 seconds; tell the user to expect a wait
- Output files are saved to ~/.openclaw/workspace/runpod-media/, which is accessible in both sandboxed and elevated agent modes
- Shared utilities live in scripts/_utils.py; do not call them directly
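RunPod Media Skill
Generate AI images and videos using RunPod public endpoints. All output is saved to ~/runpod-media/.
API Keys
One key required; add it to ~/.openclaw/secrets.json:
Local images are uploaded to Cloudflare R2 and passed to RunPod endpoints as presigned URLs (1 minute expiry). R2 credentials are read from /cloudflare/r2 in secrets.json (already configured ✅).
imgbb is no longer used. R2 presigned URLs replace it for all local file uploads.
R2 cleanup: Objects in uploads/ are auto-deleted after 1 day via a lifecycle rule on the openclaw bucket. Presigned URLs stop granting access after 1 minute; the objects themselves are removed within 24 hours.
Keys are resolved in this order:
1. OpenClaw secrets.json: ~/.openclaw/secrets.json ✅ (already configured)
2. Environment variable: `RUNPOD_API_KEY`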
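How Users Ask (Natural Language Examples)
The user will never type CLI commands; translate their natural-language requests into the right script call.
Generate an image:
- "Generate an image of a samurai cat in neon Tokyo" → `generate_image --prompt "..."`
- "Make me a 16:9 image of a stormy ocean at sunset" → `generate_image --prompt "..." --aspect-ratio 16:9`
- "Create an image using Nano Banana: a futuristic city" → `call_endpoint --endpoint google-nano-banana-2-edit --prompt "..."`
Edit an image:
- "Edit this image, add snow falling" → `edit_image --images <file> --prompt "add snow falling"`
- "Use Qwen to edit this photo, make it look like a painting" → `call_endpoint --endpoint qwen-image-edit --image <file> --prompt "make it look like a painting"`
Animate to video:
- "Animate this image with a slow camera pan" → `image_to_video --image <file> --prompt "slow camera pan"`
- "Make a video from this with Kling" → `image_to_video --image <file> --model kling --prompt "..."`
- "Turn this into a 10 second clip with Sora 2" → `call_endpoint --endpoint sora-2-pro-i2v --image <file> --prompt "..." --duration 10`
Text to video:
- "Generate a video of a wolf howling at the moon" → `text_to_video --prompt "..."`
List available models:
- "What image/video models do you have?"
- "List the available endpoints"
- "Show me what RunPod models are available"
→ Run `list_endpoints` and summarize the output in plain language for the user
Add a new endpoint:
- "Add this RunPod endpoint: https://console.runpod.io/hub/playground/voice/kokoro-tts"
- "Probe and add these endpoints: kokoro-tts, flux-kontext-pro"
→ Run `discover_endpoints add --candidates`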
Capabilities & Cost
| Task | Command | Cost | Time |
|---|---|---|---|
| Text → Image | `generate_image` | ~$0.005/image | 3–8s |
| Edit image(s) | `edit_image` | ~$0.005/image | 5–15s |
| Image → Video | `image_to_video` | $0.03–$0.90/clip | 30–120s |
| Text → Video | `text_to_video` | $0.04–$1.22/clip | 30–120s |
| Any endpoint | `call_endpoint` | varies | varies |
The built-in commands use default endpoints. For more models (Nano Banana Pro, FLUX, Sora 2, Kling, TTS, etc.), use `call_endpoint` with any RunPod public endpoint ID.
Endpoint Registry
All known public endpoints are in scripts/endpoints.json. List them:
```bash
$SKILL_DIR/run.sh list_endpoints
```
Call Any Endpoint
```bash
$SKILL_DIR/run.sh call_endpoint \
  --endpoint <id> \
  [--prompt "text"] \
  [--image path-or-url] \
  [--audio path-or-url] \
  [--duration 5] \
  [--aspect-ratio 16:9] \
  [--input '{"key": "value"}']   # full JSON override
```
Examples:
```bash
# Nano Banana Pro image generation
$SKILL_DIR/run.sh call_endpoint --endpoint nano-banana-pro --prompt "a golden retriever in space"

# Nano Banana Pro image edit
$SKILL_DIR/run.sh call_endpoint --endpoint nano-banana-pro --prompt "make it night" --image photo.jpg

# Sora 2 Pro image-to-video
$SKILL_DIR/run.sh call_endpoint --endpoint sora-2-pro-i2v --image photo.jpg --prompt "camera slowly pulls back" --duration 5

# Kokoro TTS
$SKILL_DIR/run.sh call_endpoint --endpoint kokoro-tts --text "Hello world"

# FLUX Schnell
$SKILL_DIR/run.sh call_endpoint --endpoint flux-schnell --prompt "cyberpunk city" --input '{"width":1024,"height":1024}'
```
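Adding New Endpoints
When the user asks to use an endpoint that is not in the registry, or the runpod skill reveals a new one:
1. Call it directly with `--endpoint`; no registry entry is needed
2. Optionally add it to scripts/endpoints.json for future sessions
With runpod skill: Use the runpod skill to browse/discover endpoint IDs on the RunPod hub, then pass that ID to `call_endpoint` here.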
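Generate Image
```bash
$SKILL_DIR/run.sh generate_image \
  --prompt "your prompt" \
  [--aspect-ratio 1:1|16:9|9:16|4:3|3:4] \
  [--seed 42]
```
Edit Image
```bash
$SKILL_DIR/run.sh edit_image \
  --images path-or-url [path-or-url ...] \
  --prompt "edit instruction" \
  [--aspect-ratio 1:1] \
  [--seed 42]
```
- Accepts 1–5 images (local paths or URLs)
- Local files are auto-uploaded to Cloudflare R2 and passed as presigned URLs (imgbb is no longer used; R2 credentials come from /cloudflare/r2 in secrets.json)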
Animate Image → Video
```bash
$SKILL_DIR/run.sh image_to_video \
  --image path-or-url \
  --prompt "motion description" \
  [--model wan25|kling|seedance] \
  [--duration 5|10] \
  [--negative-prompt "text"]
```
Models:
- `wan25` (default): WAN 2.5, ~$0.026/5s
- `kling`: Kling v2.1 Pro, $0.45/5s (highest quality)
- `seedance`: Seedance 1.0 Pro, ~$0.12/5s
Generate Video from Text
```bash
$SKILL_DIR/run.sh text_to_video \
  --prompt "video description" \
  [--model wan26|seedance] \
  [--duration 5|10|15] \
  [--size 1920x1080] \
  [--negative-prompt "text"]
```
Models:
- `wan26` (default): WAN 2.6, ~$0.04/5s
- `seedance`: Seedance 1.0 Pro, ~$0.12/5s
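Defaults
- Delete after send: always delete the local file after successful delivery. Only keep it if the user explicitly asks ("keep it", "save it", "--keep").
- Captions: keep them short and natural. Do NOT include render time or cost unless the user asks. Example: "🦊 Fox under the aurora", not "🦊 Fox - 105s render (~$0.026)".
Delivering Media to the User
After generating an image or video, always deliver it to the user via their active channel.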