Media Generation

Handle image generation, image editing, and short video generation through one workflow: choose the right modality, pass caller intent through to the provider, save outputs under tmp/images/ or tmp/videos/, and prefer the bundled helpers over ad-hoc one-off API calls.

Workflow decision

- If the user wants a brand-new still image, use an image-generation model.
- If the user supplies an image or wants a specific existing image changed, use an image-edit workflow.
- If the user wants motion, a clip, or a short video, use a video-generation model.
- If the request includes one or more reference images, use the helper that supports reference-image transport.
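
The decision rules above can be sketched as a small routing function. This is only an illustration: `MediaRequest`, `choose_workflow`, and the returned labels are hypothetical names, and checking reference images first is an assumption about precedence that the rules do not state.

```python
from dataclasses import dataclass


@dataclass
class MediaRequest:
    """Hypothetical summary of what the caller asked for."""
    wants_motion: bool = False
    has_source_image: bool = False
    reference_image_count: int = 0


def choose_workflow(req: MediaRequest) -> str:
    """Return a label for the workflow that fits the request."""
    # Assumption: reference images take precedence over the other rules.
    if req.reference_image_count > 0:
        return "reference"
    if req.wants_motion:
        return "video"
    if req.has_source_image:
        return "edit"
    return "image"
```

For example, `choose_workflow(MediaRequest(has_source_image=True))` selects the edit workflow, while an empty request falls through to plain image generation.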

Standard workflow

1. Determine whether the task is image generation, image editing, or video generation.
2. Clarify only when required to execute the request correctly.
3. Prefer scripts/generate_image.py for still-image generation.
4. Prefer scripts/edit_image.py for direct image edits.
5. Prefer scripts/mask_inpaint.py for localized edits with masks or generated regions.
6. Prefer scripts/outpaint_image.py for canvas expansion / outpainting.
7. Prefer scripts/reference_media.py when reference images need to be passed through.
8. Prefer scripts/generate_video.py for video generation, especially when the provider may return async job payloads.
9. Prefer scripts/generate_batch_media.py for repeatable batch jobs, templated variations, or auditable manifests.
10. Prefer scripts/object_select_edit.py for simple object-vs-background edits on transparent assets or clean backdrops.
11. If the provider returns a URL, path, HTML snippet, markdown snippet, data: URL, or b64_json, use scripts/fetch_generated_media.py.
12. Save outputs under:
    - images → tmp/images/
    - videos → tmp/videos/
13. If the user wants files sent in chat, prefer sending the local downloaded file.
14. Keep the original remote reference as a fallback when local retrieval fails.
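
To make the return shapes in the workflow concrete, here is a minimal sketch of how a returned media reference might be classified before it is fetched. It is not the logic of scripts/fetch_generated_media.py; `classify_media_ref` and `decode_data_url` are hypothetical names, and the regular expressions are simple assumptions about what HTML and markdown snippets look like. A b64_json value is a field in a JSON response rather than a string shape, so it is not classified here.

```python
import base64
import re


def classify_media_ref(ref: str) -> str:
    """Guess the shape of a provider-returned media reference."""
    text = ref.strip()
    if text.startswith("data:"):
        return "data_url"
    # Markdown image or link, e.g. ![alt](https://...)
    if re.search(r"!?\[[^\]]*\]\([^)]+\)", text):
        return "markdown"
    # HTML snippet that embeds media
    if re.search(r"<(img|video|source)\b", text, re.IGNORECASE):
        return "html"
    if re.match(r"https?://", text):
        return "url"
    # Anything else is treated as a local path (assumption).
    return "path"


def decode_data_url(ref: str) -> bytes:
    """Decode the payload of a base64 data: URL."""
    header, _, payload = ref.partition(",")
    if not header.startswith("data:") or ";base64" not in header:
        raise ValueError("expected a base64 data: URL")
    return base64.b64decode(payload)
```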

Prompt handling

Default to prompt pass-through.

- Pass the caller's prompt through unchanged.
- Use optional request fields only when the caller provides them.
- Keep prompt semantics under caller control.

Use the scripts mainly as functional helpers:

- normalize arguments
- map fields to provider-specific JSON
- upload files
- poll async jobs
- download returned media
- save outputs under tmp/images/ or tmp/videos/
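
As an illustration of the "poll async jobs" duty, the sketch below shows a generic polling loop. It is not the code in scripts/generate_video.py: `poll_job` is a hypothetical name, the status fetcher is injected by the caller, and the "succeeded" / "failed" status strings are assumptions, since providers name their job states differently.

```python
import time
from typing import Callable, Dict


def poll_job(
    fetch_status: Callable[[], Dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        status = job.get("status")
        if status == "succeeded":  # assumed terminal success state
            return job
        if status == "failed":  # assumed terminal failure state
            raise RuntimeError(f"job failed: {job.get('error', 'unknown error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(interval)
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any HTTP client and makes it easy to exercise without waiting.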

Delivery rules

- Save generated or edited images in tmp/images/.
- Save generated videos in tmp/videos/.
- Never scatter generated files in the workspace root.
- If message delivery blocks remote URLs, download locally first and then send the local file.
- If a remote file cannot be fetched locally but the raw link may still help, provide the original link clearly.
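
The delivery rules can be summarized in a short sketch, assuming hypothetical helpers `output_dir` and `choose_deliverable` that are not part of the bundled scripts. It picks the output folder by media kind and prefers a local file over the remote link.

```python
from pathlib import Path
from typing import Optional

# Subfolders named in the delivery rules.
OUTPUT_SUBDIRS = {"image": "images", "video": "videos"}


def output_dir(kind: str, root: str = "tmp") -> Path:
    """Return the folder for a media kind, never the workspace root."""
    return Path(root) / OUTPUT_SUBDIRS[kind]


def choose_deliverable(local_path: Optional[Path], remote_url: Optional[str]) -> Optional[str]:
    """Prefer the downloaded file; fall back to the original remote link."""
    if local_path is not None and local_path.is_file():
        return str(local_path)
    return remote_url
```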

Helper quick guide

Use the smallest helper that matches the request:

- scripts/generate_image.py → direct still-image generation
- scripts/edit_image.py → direct full-image edits
- scripts/mask_inpaint.py → localized edits with an explicit or generated mask
- scripts/outpaint_image.py → canvas expansion before an edit call
- scripts/reference_media.py → reference-image transport and delegation
- scripts/generate_consistent_media.py → backward-compatible wrapper only
- scripts/generate_batch_media.py → repeatable manifest-driven batches
- scripts/object_select_edit.py → simple object-vs-background edits on transparent or clean-backdrop assets
- scripts/generate_video.py → direct video generation and async polling
- scripts/fetch_generated_media.py → normalize returned media refs into local files

Use references/model-capabilities.md when deciding which helper fits the modality, transport, or return shape.
Use references/reference-image-workflow.md for reference-image transport details.
Use references/batch-workflows.md for manifest structure and batch execution behavior.

Minimal examples:

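```bash
# Generate a new still image
python3 skills/media-generation/scripts/generate_image.py \
  --prompt "person" \
  --size 1024x1024 \
  --out-dir tmp/images \
  --prefix generated

# Edit an existing image as a whole
python3 skills/media-generation/scripts/edit_image.py \
  --image tmp/images/source.jpg \
  --prompt "replace the background" \
  --out-dir tmp/images \
  --prefix edited

# Inpaint a rectangular masked region
python3 skills/media-generation/scripts/mask_inpaint.py \
  --image tmp/images/source.jpg \
  --x 120 --y 80 --width 220 --height 180 \
  --prompt "replace the masked area" \
  --out-dir tmp/images \
  --prefix mask-result

# Expand the canvas, then extend the image outward
python3 skills/media-generation/scripts/outpaint_image.py \
  --image tmp/images/source.jpg \
  --left 512 --right 512 --top 128 --bottom 128 \
  --mode blur \
  --prompt "extend outward" \
  --out-dir tmp/images \
  --prefix outpaint-result

# Generate an image guided by a reference image
python3 skills/media-generation/scripts/reference_media.py \
  --mode image \
  --reference-image tmp/images/reference.png \
  --prompt "character" \
  --size 1024x1024 \
  --out-dir tmp/images \
  --prefix reference-output

# Run a manifest-driven batch and write a summary
python3 skills/media-generation/scripts/generate_batch_media.py \
  --manifest tmp/images/media-batch.jsonl \
  --vars-json '{"subject":"item"}' \
  --summary-out tmp/images/media-batch-summary.json \
  --continue-on-error \
  --print-json

# Replace the background of a transparent asset
python3 skills/media-generation/scripts/object_select_edit.py \
  --image tmp/images/product.png \
  --selection-mode alpha \
  --edit-target background \
  --prompt "replace the background" \
  --out-dir tmp/images \
  --prefix product-bg-edit

# Generate a short video clip
python3 skills/media-generation/scripts/generate_video.py \
  --prompt "motion clip" \
  --size 720x1280 \
  --seconds 6 \
  --out-dir tmp/videos \
  --prefix generated-video
```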

Quick compatibility checklist

Before blaming the skill, check these first:

- config exists and is valid JSON
- config.models.providers. exists
- the selected provider has both baseUrl and apiKey
- the chosen endpoint actually exists on that provider
- the chosen model name is valid for that endpoint
- any provider-specific fields passed through --extra-json or --extra-json-file match that provider's schema
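
The first three checks can be automated with a short script. This is a sketch under assumptions: `check_config` is a hypothetical helper, and it assumes the providers live in a JSON object at `models.providers` keyed by provider name, each with `baseUrl` and `apiKey` fields as named in the checklist.

```python
import json
from typing import List


def check_config(text: str, provider: str) -> List[str]:
    """Return a list of problems found in the config text (empty if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"config is not valid JSON: {exc.msg}"]
    if not isinstance(config, dict):
        return ["config root is not a JSON object"]
    models = config.get("models")
    # Assumed layout: {"models": {"providers": {"<name>": {...}}}}
    providers = models.get("providers") if isinstance(models, dict) else None
    if not isinstance(providers, dict):
        return ["config.models.providers is missing"]
    entry = providers.get(provider)
    if not isinstance(entry, dict):
        return [f"provider {provider!r} not found"]
    return [
        f"provider {provider!r} is missing {key}"
        for key in ("baseUrl", "apiKey")
        if not entry.get(key)
    ]
```

Endpoint and model validity still have to be checked against the provider itself; a local script cannot confirm them.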

Defaults used by the bundled scripts:

- config path: ~/.openclaw/openclaw.json or $OPENCLAW_CONFIG
- default provider: $OPENCLAW_MEDIA_PROVIDER, otherwise the first provider found in config
- default model names: placeholders unless overridden by env vars or --model
  - image → $OPENCLAW_MEDIA_IMAGE_MODEL or image-model
  - edit → $OPENCLAW_MEDIA_EDIT_MODEL or image-edit-model
  - video → $OPENCLAW_MEDIA_VIDEO_MODEL or video-model
- output root: tmp/ or $MEDIA_GENERATION_OUTPUT_ROOT
- output paths are resolved relative to the current working directory unless you pass an absolute --out-dir
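
The fallback order for providers and model names might look like the sketch below. It is not the bundled scripts' implementation: `resolve_model` and `resolve_provider` are hypothetical names, and letting an explicit command-line value win over the environment variable is an assumption, since the defaults above only say that either one overrides the placeholder.

```python
import os
from typing import Mapping, Optional

# Environment variables and placeholder names taken from the defaults above.
MODEL_ENV_VARS = {
    "image": "OPENCLAW_MEDIA_IMAGE_MODEL",
    "edit": "OPENCLAW_MEDIA_EDIT_MODEL",
    "video": "OPENCLAW_MEDIA_VIDEO_MODEL",
}
PLACEHOLDER_MODELS = {
    "image": "image-model",
    "edit": "image-edit-model",
    "video": "video-model",
}


def resolve_model(kind: str, cli_model: Optional[str] = None,
                  env: Mapping[str, str] = os.environ) -> str:
    """Pick the model: explicit value, then env var, then placeholder."""
    if cli_model:
        return cli_model
    return env.get(MODEL_ENV_VARS[kind]) or PLACEHOLDER_MODELS[kind]


def resolve_provider(config: dict, cli_provider: Optional[str] = None,
                     env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick the provider: explicit value, env var, then first in config."""
    if cli_provider:
        return cli_provider
    if env.get("OPENCLAW_MEDIA_PROVIDER"):
        return env["OPENCLAW_MEDIA_PROVIDER"]
    # Assumed layout: providers is a JSON object keyed by provider name.
    providers = config.get("models", {}).get("providers", {})
    return next(iter(providers), None)
```

A returned placeholder such as `image-model` is a signal to pass a real model name, not a usable default.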

Quick troubleshooting

Common failure patterns:

- provider not found → pass --provider explicitly or set $OPENCLAW_MEDIA_PROVIDER
- placeholder model warning (image-model / image-edit-model / video-model) → pass --model explicitly or set the matching $OPENCLAW_MEDIA_*_MODEL env var
- config not found / invalid JSON → pass --config explicitly or fix the OpenClaw config file
- HTTP 404 → check --endpoint and video polling paths
- HTTP 400 → check model name and provider-specific payload fields in --extra-json / --extra-json-file
- HTTP 401/403 → check the provider apiKey
- request failed before HTTP response → check base URL, proxy/TLS, or network reachability
- video accepted then failed later → check request payload, provider logs, or switch provider/model

Use --print-json when debugging so the response body, resolved endpoint, and failure hints stay visible.
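
The HTTP-related failure patterns above can be expressed as a small lookup, shown here as a sketch. `failure_hint` is a hypothetical helper and is not claimed to match the hints the bundled scripts print; passing `None` stands for a request that failed before any HTTP response arrived.

```python
from typing import Optional


def failure_hint(status: Optional[int]) -> str:
    """Map an HTTP status (or None for no response) to a first thing to check."""
    if status is None:
        return "check base URL, proxy/TLS, or network reachability"
    if status == 404:
        return "check --endpoint and video polling paths"
    if status == 400:
        return "check model name and provider-specific payload fields"
    if status in (401, 403):
        return "check the provider apiKey"
    # Anything else: fall back to inspecting the raw response.
    return "re-run with --print-json and inspect the response body"
```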

References

Read these selectively:

- helper selection, modality fit, transport notes, return-shape handling → references/model-capabilities.md
- reference-image transport rules and compatibility notes → references/reference-image-workflow.md
- manifest format, templating, and batch execution behavior → references/batch-workflows.md

Primary helpers:

- image generation → scripts/generate_image.py
- image edit → scripts/edit_image.py
- mask inpaint → scripts/mask_inpaint.py
- outpaint → scripts/outpaint_image.py
- reference-image transport → scripts/reference_media.py
- backward-compatible wrapper → scripts/generate_consistent_media.py
- video generation → scripts/generate_video.py
- batch generation → scripts/generate_batch_media.py
- object-select edit → scripts/object_select_edit.py
- object mask prep → INLINECODE77
- shared request utility → INLINECODE78
- smoke tests → INLINECODE79
- media retrieval → scripts/fetch_generated_media.py

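Media Generation

Handle image generation, image editing, and short video generation through one workflow: choose the right modality, pass caller intent through to the provider, save outputs under tmp/images/ or tmp/videos/, and prefer the bundled helpers over ad-hoc one-off API calls.

Workflow decision

- If the user wants a brand-new still image, use an image-generation model.
- If the user supplies an image or wants a specific existing image changed, use an image-edit workflow.
- If the user wants motion, a clip, or a short video, use a video-generation model.
- If the request includes one or more reference images, use the helper that supports reference-image transport.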

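Standard workflow

1. Determine whether the task is image generation, image editing, or video generation.
2. Clarify only when required to execute the request correctly.
3. Prefer scripts/generate_image.py for still-image generation.
4. Prefer scripts/edit_image.py for direct image edits.
5. Prefer scripts/mask_inpaint.py for localized edits with masks or generated regions.
6. Prefer scripts/outpaint_image.py for canvas expansion / outpainting.
7. Prefer scripts/reference_media.py when reference images need to be passed through.
8. Prefer scripts/generate_video.py for video generation, especially when the provider may return async job payloads.
9. Prefer scripts/generate_batch_media.py for repeatable batch jobs, templated variations, or auditable manifests.
10. Prefer scripts/object_select_edit.py for simple object-vs-background edits on transparent assets or clean backdrops.
11. If the provider returns a URL, path, HTML snippet, markdown snippet, data: URL, or b64_json, use scripts/fetch_generated_media.py.
12. Save outputs under:
    - images → tmp/images/
    - videos → tmp/videos/
13. If the user wants files sent in chat, prefer sending the local downloaded file.
14. Keep the original remote reference as a fallback when local retrieval fails.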

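Prompt handling

Default to prompt pass-through.

- Pass the caller's prompt through unchanged.
- Use optional request fields only when the caller provides them.
- Keep prompt semantics under caller control.

Use the scripts mainly as functional helpers:

- normalize arguments
- map fields to provider-specific JSON
- upload files
- poll async jobs
- download returned media
- save outputs under tmp/images/ or tmp/videos/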

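Delivery rules

- Save generated or edited images in tmp/images/.
- Save generated videos in tmp/videos/.
- Never scatter generated files in the workspace root.
- If message delivery blocks remote URLs, download locally first and then send the local file.
- If a remote file cannot be fetched locally but the raw link may still help, provide the original link clearly.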

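Helper quick guide

Use the smallest helper that matches the request:

- scripts/generate_image.py → direct still-image generation
- scripts/edit_image.py → direct full-image edits
- scripts/mask_inpaint.py → localized edits with an explicit or generated mask
- scripts/outpaint_image.py → canvas expansion before an edit call
- scripts/reference_media.py → reference-image transport and delegation
- scripts/generate_consistent_media.py → backward-compatible wrapper only
- scripts/generate_batch_media.py → repeatable manifest-driven batches
- scripts/object_select_edit.py → simple object-vs-background edits on transparent or clean-backdrop assets
- scripts/generate_video.py → direct video generation and async polling
- scripts/fetch_generated_media.py → normalize returned media refs into local files

Use references/model-capabilities.md when deciding which helper fits the modality, transport, or return shape.
Use references/reference-image-workflow.md for reference-image transport details.
Use references/batch-workflows.md for manifest structure and batch execution behavior.

Minimal examples: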

```bash
# Generate a new still image
python3 skills/media-generation/scripts/generate_image.py \
  --prompt "person" \
  --size 1024x1024 \
  --out-dir tmp/images \
  --prefix generated

# Edit an existing image as a whole
python3 skills/media-generation/scripts/edit_image.py \
  --image tmp/images/source.jpg \
  --prompt "replace the background" \
  --out-dir tmp/images \
  --prefix edited

# Inpaint a rectangular masked region
python3 skills/media-generation/scripts/mask_inpaint.py \
  --image tmp/images/source.jpg \
  --x 120 --y 80 --width 220 --height 180 \
  --prompt "replace the masked area" \
  --out-dir tmp/images \
  --prefix mask-result

# Expand the canvas, then extend the image outward
python3 skills/media-generation/scripts/outpaint_image.py \
  --image tmp/images/source.jpg \
  --left 512 --right 512 --top 128 --bottom 128 \
  --mode blur \
  --prompt "extend outward" \
  --out-dir tmp/images \
  --prefix outpaint-result

# Generate an image guided by a reference image
python3 skills/media-generation/scripts/reference_media.py \
  --mode image \
  --reference-image tmp/images/reference.png \
  --prompt "character" \
  --size 1024x1024 \
  --out-dir tmp/images \
  --prefix reference-output

# Run a manifest-driven batch and write a summary
python3 skills/media-generation/scripts/generate_batch_media.py \
  --manifest tmp/images/media-batch.jsonl \
  --vars-json '{"subject":"item"}' \
  --summary-out tmp/images/media-batch-summary.json \
  --continue-on-error \
  --print-json

# Replace the background of a transparent asset
python3 skills/media-generation/scripts/object_select_edit.py \
  --image tmp/images/product.png \
  --selection-mode alpha \
  --edit-target background \
  --prompt "replace the background" \
  --out-dir tmp/images \
  --prefix product-bg-edit

# Generate a short video clip
python3 skills/media-generation/scripts/generate_video.py \
  --prompt "motion clip" \
  --size 720x1280 \
  --seconds 6 \
  --out-dir tmp/videos \
  --prefix generated-video
```

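Quick compatibility checklist

Before blaming the skill, check these first:

- config exists and is valid JSON
- config.models.providers. exists
- the selected provider has both baseUrl and apiKey
- the chosen endpoint actually exists on that provider
- the chosen model name is valid for that endpoint
- any provider-specific fields passed through --extra-json or --extra-json-file match that provider's schema

Defaults used by the bundled scripts:

- config path: ~/.openclaw/openclaw.json or $OPENCLAW_CONFIG
- default provider: $OPENCLAW_MEDIA_PROVIDER, otherwise the first provider found in config
- default model names: placeholders unless overridden by env vars or --model
  - image → $OPENCLAW_MEDIA_IMAGE_MODEL or image-model
  - edit → $OPENCLAW_MEDIA_EDIT_MODEL or image-edit-model
  - video → $OPENCLAW_MEDIA_VIDEO_MODEL or video-model
- output root: tmp/ or $MEDIA_GENERATION_OUTPUT_ROOT
- output paths are resolved relative to the current working directory unless you pass an absolute --out-dir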

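Quick troubleshooting

Common failure patterns:

- provider not found → pass --provider explicitly or set $OPENCLAW_MEDIA_PROVIDER
- placeholder model warning (image-model / image-edit-model / video-model) → pass --model explicitly or set the matching $OPENCLAW_MEDIA_*_MODEL env var
- config not found / invalid JSON → pass --config explicitly or fix the OpenClaw config file
- HTTP 404 → check --endpoint and video polling paths
- HTTP 400 → check model name and provider-specific payload fields in --extra-json / --extra-json-file
- HTTP 401/403 → check the provider apiKey
- request failed before HTTP response → check base URL, proxy/TLS, or network reachability
- video accepted then failed later → check request payload, provider logs, or switch provider/model

Use --print-json when debugging so the response body, resolved endpoint, and failure hints stay visible.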
