Media Generation
Handle image generation, image editing, and short video generation through one workflow: choose the right modality, pass caller intent through to the provider, save outputs under tmp/images/ or tmp/videos/, and prefer the bundled helpers over ad-hoc one-off API calls.
Workflow decision
- If the user wants a brand-new still image, use an image-generation model.
- If the user supplies an image or wants a specific existing image changed, use an image-edit workflow.
- If the user wants motion / a clip / a short video, use a video-generation model.
- If the request includes one or more reference images, use the helper that supports reference-image transport.
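The decision rules above can be sketched as a small routing function. This is only an illustration, not the skill's actual code: the function name `choose_helper`, its boolean parameters, and the precedence between the rules (reference images first, then motion, then edits) are assumptions. Only the script paths come from this document.

```python
def choose_helper(wants_motion: bool, has_source_image: bool,
                  has_reference_images: bool) -> str:
    """Return the helper script for a request (hypothetical routing sketch)."""
    # Assumed precedence: reference-image transport is checked first,
    # because it can apply regardless of the output modality.
    if has_reference_images:
        return "scripts/reference_media.py"
    if wants_motion:
        return "scripts/generate_video.py"
    if has_source_image:
        return "scripts/edit_image.py"
    # Nothing supplied and no motion requested: a brand-new still image.
    return "scripts/generate_image.py"


if __name__ == "__main__":
    print(choose_helper(wants_motion=False, has_source_image=True,
                        has_reference_images=False))
```

A real request may need more signals than three booleans (for example masks or canvas expansion), so treat this as a starting point for the choice rather than a complete router.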
Standard workflow
1. Determine whether the task is image generation, image editing, or video generation.
2. Clarify only when required to execute the request correctly.
3. Prefer scripts/generate_image.py for still-image generation.
4. Prefer scripts/edit_image.py for direct image edits.
5. Prefer scripts/mask_inpaint.py for localized edits with masks or generated regions.
6. Prefer scripts/outpaint_image.py for canvas expansion / outpainting.
7. Prefer scripts/reference_media.py when reference images need to be passed through.
8. Prefer scripts/generate_video.py for video generation, especially when the provider may return async job payloads.
9. Prefer scripts/generate_batch_media.py for repeatable batch jobs, templated variations, or auditable manifests.
10. Prefer scripts/object_select_edit.py for simple object-vs-background edits on transparent assets or clean backdrops.
11. If the provider returns a URL, path, HTML snippet, markdown snippet, data: URL, or b64_json, use scripts/fetch_generated_media.py.
12. Save outputs under:
    - images → tmp/images/
    - videos → tmp/videos/
13. If the user wants files sent in chat, prefer sending the local downloaded file.
14. Keep the original remote reference as fallback when local retrieval fails.
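The last two steps (send the local file, keep the remote reference as fallback) could look roughly like the sketch below. The function name `pick_deliverable` and its parameters are hypothetical and are not part of the bundled scripts.

```python
from pathlib import Path
from typing import Optional


def pick_deliverable(local_path: Optional[str],
                     remote_ref: Optional[str]) -> Optional[str]:
    """Prefer the downloaded local file; fall back to the remote reference."""
    if local_path and Path(local_path).is_file():
        return local_path
    # Local retrieval failed or was never attempted:
    # hand back the original remote reference (may be None).
    return remote_ref
```

Checking that the file actually exists, rather than trusting the path string, is what lets a failed download fall through to the remote reference.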
Prompt handling
Default to prompt pass-through.
- Pass the caller's prompt through unchanged.
- Use optional request fields only when the caller provides them.
- Keep prompt semantics under caller control.
Use the scripts mainly as functional helpers:
- normalize arguments
- map fields to provider-specific JSON
- upload files
- poll async jobs
- download returned media
- save outputs under tmp/images/ or tmp/videos/
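Prompt pass-through can be illustrated with a small payload builder. This is a sketch under assumptions: the function name `build_payload` and the JSON field names `model`, `prompt`, and `size` are hypothetical, and real providers may expect different keys.

```python
from typing import Any, Dict, Optional


def build_payload(prompt: str, model: str, size: Optional[str] = None,
                  extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a request body without rewriting the caller's prompt."""
    # The prompt is copied as-is: no trimming, no added style keywords.
    payload: Dict[str, Any] = {"model": model, "prompt": prompt}
    # Optional fields appear only when the caller supplied them.
    if size is not None:
        payload["size"] = size
    if extra:
        payload.update(extra)  # caller-provided, provider-specific fields
    return payload
```

Leaving optional keys out entirely, instead of sending defaults, keeps the provider's own defaults in effect when the caller has expressed no preference.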
Delivery rules
- Save generated or edited images in tmp/images/.
- Save generated videos in tmp/videos/.
- Never scatter generated files in the workspace root.
- If message delivery blocks remote URLs, download locally first and then send the local file.
- If a remote file cannot be fetched locally but the raw link may still help, provide the original link clearly.
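One way to keep generated files out of the workspace root is to derive the target directory from the file type. The sketch below assumes a hypothetical helper `output_path_for` and an assumed set of video suffixes. The two directory names come from this document.

```python
from pathlib import Path

IMAGE_DIR = Path("tmp/images")
VIDEO_DIR = Path("tmp/videos")
# Assumed list of video suffixes; extend it for the formats a provider returns.
VIDEO_SUFFIXES = {".mp4", ".webm", ".mov"}


def output_path_for(filename: str) -> Path:
    """Map a media filename to tmp/images/ or tmp/videos/."""
    name = Path(filename).name  # drop any directory part the caller passed
    is_video = Path(name).suffix.lower() in VIDEO_SUFFIXES
    return (VIDEO_DIR if is_video else IMAGE_DIR) / name
```

The returned path is relative, so it resolves against the current working directory.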
Helper quick guide
Use the smallest helper that matches the request:
- scripts/generate_image.py → direct still-image generation
- scripts/edit_image.py → direct full-image edits
- scripts/mask_inpaint.py → localized edits with an explicit or generated mask
- scripts/outpaint_image.py → canvas expansion before an edit call
- scripts/reference_media.py → reference-image transport and delegation
- scripts/generate_consistent_media.py → backward-compatible wrapper only
- scripts/generate_batch_media.py → repeatable manifest-driven batches
- scripts/object_select_edit.py → simple object-vs-background edits on transparent or clean-backdrop assets
- scripts/generate_video.py → direct video generation and async polling
- scripts/fetch_generated_media.py → normalize returned media refs into local files
Use references/model-capabilities.md when deciding which helper fits the modality, transport, or return shape.
Use references/reference-image-workflow.md for reference-image transport details.
Use references/batch-workflows.md for manifest structure and batch execution behavior.
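As an illustration of what normalizing returned media references can involve, the sketch below classifies a reference string and decodes a data: URL. It is not the implementation of scripts/fetch_generated_media.py. The function names are hypothetical, and it covers only three of the return shapes mentioned in this document (URL, path, data: URL).

```python
import base64
from typing import Tuple
from urllib.parse import unquote_to_bytes


def classify_media_ref(ref: str) -> str:
    """Label a returned media reference as 'data_url', 'url', or 'path'."""
    text = ref.strip().lower()
    if text.startswith("data:"):
        return "data_url"
    if text.startswith(("http://", "https://")):
        return "url"
    return "path"


def decode_data_url(data_url: str) -> Tuple[str, bytes]:
    """Split a data: URL into (mime_type, raw_bytes)."""
    header, _, payload = data_url.strip().partition(",")
    meta = header[len("data:"):]
    mime_type = meta.split(";")[0] or "text/plain"
    if meta.lower().endswith(";base64"):
        return mime_type, base64.b64decode(payload)
    # Non-base64 data URLs carry percent-encoded bytes.
    return mime_type, unquote_to_bytes(payload)
```

HTML snippets, markdown snippets, and b64_json fields would each need their own extraction step before reaching this point.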
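Minimal examples:

```bash
# Generate a new still image
python3 skills/media-generation/scripts/generate_image.py \
  --prompt "person" \
  --size 1024x1024 \
  --out-dir tmp/images \
  --prefix generated

# Edit an existing image
python3 skills/media-generation/scripts/edit_image.py \
  --image tmp/images/source.jpg \
  --prompt "replace the background" \
  --out-dir tmp/images \
  --prefix edited

# Inpaint a rectangular masked region
python3 skills/media-generation/scripts/mask_inpaint.py \
  --image tmp/images/source.jpg \
  --x 120 --y 80 --width 220 --height 180 \
  --prompt "replace the masked area" \
  --out-dir tmp/images \
  --prefix mask-result

# Expand the canvas, then outpaint
python3 skills/media-generation/scripts/outpaint_image.py \
  --image tmp/images/source.jpg \
  --left 512 --right 512 --top 128 --bottom 128 \
  --mode blur \
  --prompt "extend outward" \
  --out-dir tmp/images \
  --prefix outpaint-result

# Generate with a reference image
python3 skills/media-generation/scripts/reference_media.py \
  --mode image \
  --reference-image tmp/images/reference.png \
  --prompt "character" \
  --size 1024x1024 \
  --out-dir tmp/images \
  --prefix reference-output

# Run a manifest-driven batch
python3 skills/media-generation/scripts/generate_batch_media.py \
  --manifest tmp/images/media-batch.jsonl \
  --vars-json '{"subject":"item"}' \
  --summary-out tmp/images/media-batch-summary.json \
  --continue-on-error \
  --print-json

# Replace the background of a transparent asset
python3 skills/media-generation/scripts/object_select_edit.py \
  --image tmp/images/product.png \
  --selection-mode alpha \
  --edit-target background \
  --prompt "replace the background" \
  --out-dir tmp/images \
  --prefix product-bg-edit

# Generate a short video
python3 skills/media-generation/scripts/generate_video.py \
  --prompt "motion clip" \
  --size 720x1280 \
  --seconds 6 \
  --out-dir tmp/videos \
  --prefix generated-video
```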
Quick compatibility checklist
Before blaming the skill, check these first:
- config exists and is valid JSON
- config.models.providers contains an entry for the selected provider
- the selected provider has both baseUrl and apiKey
- the chosen endpoint actually exists on that provider
- the chosen model name is valid for that endpoint
- any provider-specific fields passed through --extra-json or --extra-json-file match that provider's schema
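The first three checks can be automated with a short script. This sketch assumes the provider entries live under a top-level `models` object with a `providers` mapping, as the dotted path in the checklist suggests. The function name `check_config` is hypothetical.

```python
import json
from typing import List


def check_config(config_text: str, provider: str) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError as exc:
        return [f"config is not valid JSON: {exc}"]
    # Assumed layout: {"models": {"providers": {"<name>": {...}}}}
    models = config.get("models") if isinstance(config, dict) else None
    providers = models.get("providers") if isinstance(models, dict) else None
    if not isinstance(providers, dict) or provider not in providers:
        return [f"models.providers has no entry for {provider!r}"]
    entry = providers[provider]
    problems = []
    for key in ("baseUrl", "apiKey"):
        if not isinstance(entry, dict) or not entry.get(key):
            problems.append(f"provider {provider!r} is missing {key}")
    return problems
```

Endpoint existence and model validity cannot be checked offline; those still require a request against the provider.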
Defaults used by the bundled scripts:
- config path: ~/.openclaw/openclaw.json or $OPENCLAW_CONFIG
- default provider: $OPENCLAW_MEDIA_PROVIDER, otherwise the first provider found in config
- default model names: placeholders unless overridden by env vars or --model
  - image → $OPENCLAW_MEDIA_IMAGE_MODEL or image-model
  - edit → $OPENCLAW_MEDIA_EDIT_MODEL or image-edit-model
  - video → $OPENCLAW_MEDIA_VIDEO_MODEL or video-model
- output root: tmp/ or $MEDIA_GENERATION_OUTPUT_ROOT
- output paths are resolved relative to the current working directory unless you pass an absolute --out-dir
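The model and provider defaults can be pictured as a lookup chain. The sketch below is an assumption about how such a chain might be written, not the bundled scripts' code: the function names are hypothetical, and the precedence of an explicit flag value over the environment variable is assumed.

```python
import os
from typing import Mapping, Optional

PLACEHOLDER_MODELS = {
    "image": "image-model",
    "edit": "image-edit-model",
    "video": "video-model",
}
MODEL_ENV_VARS = {
    "image": "OPENCLAW_MEDIA_IMAGE_MODEL",
    "edit": "OPENCLAW_MEDIA_EDIT_MODEL",
    "video": "OPENCLAW_MEDIA_VIDEO_MODEL",
}


def resolve_model(kind: str, cli_model: Optional[str] = None,
                  env: Optional[Mapping[str, str]] = None) -> str:
    """Explicit value, then env var, then the placeholder name."""
    env = os.environ if env is None else env
    return cli_model or env.get(MODEL_ENV_VARS[kind]) or PLACEHOLDER_MODELS[kind]


def resolve_provider(providers: Mapping[str, object],
                     cli_provider: Optional[str] = None,
                     env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Explicit value, then $OPENCLAW_MEDIA_PROVIDER, then the first provider."""
    env = os.environ if env is None else env
    return (cli_provider or env.get("OPENCLAW_MEDIA_PROVIDER")
            or next(iter(providers), None))
```

If `resolve_model` returns one of the placeholder names, no real model was configured, which is the situation the placeholder model warning describes.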
Quick troubleshooting
Common failure patterns:
- provider not found → pass --provider explicitly or set $OPENCLAW_MEDIA_PROVIDER
- placeholder model warning (image-model / image-edit-model / video-model) → pass --model explicitly or set the matching $OPENCLAW_MEDIA_*_MODEL env var
- config not found / invalid JSON → pass --config explicitly or fix the OpenClaw config file
- HTTP 404 → check --endpoint and video polling paths
- HTTP 400 → check model name and provider-specific payload fields in --extra-json / --extra-json-file
- HTTP 401/403 → check the provider apiKey
- request failed before HTTP response → check base URL, proxy/TLS, or network reachability
- video accepted then failed later → check request payload, provider logs, or switch provider/model
Use --print-json when debugging so the response body, resolved endpoint, and failure hints stay visible.
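The HTTP-related entries in the list above amount to a status-to-hint table, which might be expressed as follows. The function name `failure_hint` is hypothetical; the hint wording is taken from the list.

```python
from typing import Optional


def failure_hint(status: Optional[int]) -> str:
    """Map an HTTP status (None = no response received) to a first thing to check."""
    if status is None:
        return "check base URL, proxy/TLS, or network reachability"
    if status == 404:
        return "check --endpoint and video polling paths"
    if status == 400:
        return "check model name and provider-specific payload fields"
    if status in (401, 403):
        return "check the provider apiKey"
    # Anything else: fall back to inspecting the raw response.
    return "rerun with --print-json and inspect the response body"
```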
References
Read these selectively:
- helper selection, modality fit, transport notes, return-shape handling → references/model-capabilities.md
- reference-image transport rules and compatibility notes → references/reference-image-workflow.md
- manifest format, templating, and batch execution behavior → references/batch-workflows.md
Primary helpers:
- image generation → scripts/generate_image.py
- image edit → scripts/edit_image.py
- mask inpaint → scripts/mask_inpaint.py
- outpaint → scripts/outpaint_image.py
- reference-image transport → scripts/reference_media.py
- backward-compatible wrapper → scripts/generate_consistent_media.py
- video generation → scripts/generate_video.py
- batch generation → scripts/generate_batch_media.py
- object-select edit → scripts/object_select_edit.py
- object mask prep → INLINECODE77
- shared request utility → INLINECODE78
- smoke tests → INLINECODE79
- media retrieval → scripts/fetch_generated_media.py
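Media Generation
Handle image generation, image editing, and short video generation through a single workflow: choose the appropriate modality, pass the caller's intent through to the provider, save outputs to tmp/images/ or tmp/videos/, and prefer the built-in helpers over ad-hoc one-off API calls.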
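Workflow decision
- If the user wants a brand-new still image, use an image-generation model.
- If the user provides an image or wants a specific existing image modified, use the image-edit workflow.
- If the user wants motion, a clip, or a short video, use a video-generation model.
- If the request includes one or more reference images, use the helper that supports reference-image transport.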
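Standard workflow
1. Determine whether the task is image generation, image editing, or video generation.
2. Ask for clarification only when it is needed to execute the request correctly.
3. For still-image generation, prefer scripts/generate_image.py.
4. For direct image edits, prefer scripts/edit_image.py.
5. For localized edits using masks or generated regions, prefer scripts/mask_inpaint.py.
6. For canvas expansion / outpainting, prefer scripts/outpaint_image.py.
7. When reference images need to be passed through, prefer scripts/reference_media.py.
8. For video generation, prefer scripts/generate_video.py, especially when the provider may return async job payloads.
9. For repeatable batch jobs, templated variations, or auditable manifests, prefer scripts/generate_batch_media.py.
10. For simple object-vs-background edits on transparent assets or clean backdrops, prefer scripts/object_select_edit.py.
11. If the provider returns a URL, path, HTML snippet, Markdown snippet, data: URL, or b64_json, use scripts/fetch_generated_media.py.
12. Save outputs to:
    - images → tmp/images/
    - videos → tmp/videos/
13. If the user wants files sent in chat, prefer sending the locally downloaded file.
14. When local retrieval fails, keep the original remote reference as a fallback.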
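Prompt handling
Default to prompt pass-through.
- Pass the caller's prompt through as-is.
- Use optional request fields only when the caller provides them.
- Keep prompt semantics under the caller's control.
Use the scripts mainly as functional helpers:
- normalize arguments
- map fields to provider-specific JSON
- upload files
- poll async jobs
- download returned media files
- save outputs to tmp/images/ or tmp/videos/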
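Delivery rules
- Save generated or edited images in tmp/images/.
- Save generated videos in tmp/videos/.
- Never scatter generated files in the workspace root.
- If message delivery blocks remote URLs, download locally first and then send the local file.
- If a remote file cannot be fetched locally but the original link may still help, provide the original link clearly.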
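Helper quick guide
Use the smallest helper that matches the request:
- scripts/generate_image.py → direct still-image generation
- scripts/edit_image.py → direct full-image edits
- scripts/mask_inpaint.py → localized edits with an explicit or generated mask
- scripts/outpaint_image.py → canvas expansion before an edit call
- scripts/reference_media.py → reference-image transport and delegation
- scripts/generate_consistent_media.py → backward-compatible wrapper only
- scripts/generate_batch_media.py → repeatable manifest-driven batch jobs
- scripts/object_select_edit.py → simple object-vs-background edits on transparent or clean-backdrop assets
- scripts/generate_video.py → direct video generation and async polling
- scripts/fetch_generated_media.py → normalize returned media references into local files
Use references/model-capabilities.md when deciding which helper fits the modality, transport, or return shape.
Use references/reference-image-workflow.md for reference-image transport details.
Use references/batch-workflows.md for manifest structure and batch execution behavior.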
Minimal examples:

```bash
# Generate a new still image
python3 skills/media-generation/scripts/generate_image.py \
  --prompt "person" \
  --size 1024x1024 \
  --out-dir tmp/images \
  --prefix generated

# Edit an existing image
python3 skills/media-generation/scripts/edit_image.py \
  --image tmp/images/source.jpg \
  --prompt "replace the background" \
  --out-dir tmp/images \
  --prefix edited

# Inpaint a rectangular masked region
python3 skills/media-generation/scripts/mask_inpaint.py \
  --image tmp/images/source.jpg \
  --x 120 --y 80 --width 220 --height 180 \
  --prompt "replace the masked area" \
  --out-dir tmp/images \
  --prefix mask-result

# Expand the canvas, then outpaint
python3 skills/media-generation/scripts/outpaint_image.py \
  --image tmp/images/source.jpg \
  --left 512 --right 512 --top 128 --bottom 128 \
  --mode blur \
  --prompt "extend outward" \
  --out-dir tmp/images \
  --prefix outpaint-result

# Generate with a reference image
python3 skills/media-generation/scripts/reference_media.py \
  --mode image \
  --reference-image tmp/images/reference.png \
  --prompt "character" \
  --size 1024x1024 \
  --out-dir tmp/images \
  --prefix reference-output

# Run a manifest-driven batch
python3 skills/media-generation/scripts/generate_batch_media.py \
  --manifest tmp/images/media-batch.jsonl \
  --vars-json '{"subject":"item"}' \
  --summary-out tmp/images/media-batch-summary.json \
  --continue-on-error \
  --print-json

# Replace the background of a transparent asset
python3 skills/media-generation/scripts/object_select_edit.py \
  --image tmp/images/product.png \
  --selection-mode alpha \
  --edit-target background \
  --prompt "replace the background" \
  --out-dir tmp/images \
  --prefix product-bg-edit

# Generate a short video
python3 skills/media-generation/scripts/generate_video.py \
  --prompt "motion clip" \
  --size 720x1280 \
  --seconds 6 \
  --out-dir tmp/videos \
  --prefix generated-video
```
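Quick compatibility checklist
Before blaming the skill, check the following first:
- the config file exists and is valid JSON
- config.models.providers contains an entry for the selected provider
- the selected provider has both baseUrl and apiKey
- the chosen endpoint actually exists on that provider
- the chosen model name is valid for that endpoint
- any provider-specific fields passed through --extra-json or --extra-json-file match that provider's schema
Defaults used by the built-in scripts:
- config file path: ~/.openclaw/openclaw.json or $OPENCLAW_CONFIG
- default provider: $OPENCLAW_MEDIA_PROVIDER, otherwise the first provider found in the config
- default model names: placeholders unless overridden by env vars or --model
  - image → $OPENCLAW_MEDIA_IMAGE_MODEL or image-model
  - edit → $OPENCLAW_MEDIA_EDIT_MODEL or image-edit-model
  - video → $OPENCLAW_MEDIA_VIDEO_MODEL or video-model
- output root: tmp/ or $MEDIA_GENERATION_OUTPUT_ROOT
- output paths are resolved relative to the current working directory unless an absolute --out-dir is passed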
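Quick troubleshooting
Common failure patterns:
- provider not found → pass --provider explicitly or set $OPENCLAW_MEDIA_PROVIDER
- placeholder model warning (image-model / image-edit-model / video-model) → pass --model explicitly or set the matching $OPENCLAW_MEDIA_*_MODEL env var
- config not found / invalid JSON → pass --config explicitly or fix the OpenClaw config file
- HTTP 404 → check --endpoint and the video polling paths
- HTTP 400 → check the model name and the provider-specific payload fields in --extra-json / --extra-json-file
- HTTP 401/403 → check the provider's apiKey
- request failed before an HTTP response → check the base URL, proxy/TLS, or network reachability
- video accepted, then failed later → check the request payload and provider logs, or switch provider/model
Use --print-json when debugging so that the response body, resolved endpoint, and failure hints stay visible.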
References