Image Generation
Use this skill when you need to create one or more image files from a text prompt, or edit one or more existing images with Gemini.
Requirements
- ~/.openclaw/openclaw.json must set $.skills.entries["gemini-image-generation"].enabled to true.
- ~/.openclaw/openclaw.json must include $.skills.entries["gemini-image-generation"].env with the following keys:
  - GEMINI_API_KEY (required)
  - GEMINI_MODEL_ID (required)
  - GEMINI_BASE_URL (optional)
- Example ~/.openclaw/openclaw.json:
```json
{
  ......,
  "skills": {
    "entries": {
      "gemini-image-generation": {
        "enabled": true,
        "env": {
          "GEMINI_API_KEY": "sk-xxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
          "GEMINI_MODEL_ID": "gemini-3.1-flash-image-preview",
          "GEMINI_BASE_URL": "https://custom-endpoint.com"
        }
      }
    }
  },
  ......
}
```
- Node.js must be installed in the workspace environment.
- Install dependencies once with npm install from the skill root.
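The configuration requirements above could be checked before running the skill. The following is a minimal sketch, not part of the skill itself: checkSkillConfig is a hypothetical helper that takes the already-parsed contents of openclaw.json and returns a list of problems.

```javascript
// Sketch only: checkSkillConfig is a hypothetical helper, not shipped with the skill.
const SKILL_NAME = "gemini-image-generation";
const REQUIRED_ENV_KEYS = ["GEMINI_API_KEY", "GEMINI_MODEL_ID"];

// Takes the parsed openclaw.json object and returns an array of problem strings.
// An empty array means the documented requirements are met.
function checkSkillConfig(config) {
  const entry = config?.skills?.entries?.[SKILL_NAME];
  if (!entry) {
    return [`skills.entries["${SKILL_NAME}"] is missing`];
  }
  const problems = [];
  if (entry.enabled !== true) {
    problems.push("enabled must be set to true");
  }
  for (const key of REQUIRED_ENV_KEYS) {
    const value = entry.env?.[key];
    if (typeof value !== "string" || value.length === 0) {
      problems.push(`env.${key} is required`);
    }
  }
  // GEMINI_BASE_URL is optional, so it is deliberately not checked here.
  return problems;
}
```

Returning a list rather than throwing on the first failure lets a caller report every missing key in one pass.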
When To Use
- The user asks to generate a new image from a text prompt.
- The user asks to modify, restyle, extend, or otherwise edit one or more existing images.
- The user wants the generated image saved to a workspace file.
- The task should be handled through a reusable OpenClaw skill instead of ad hoc SDK code.
Procedure
1. Convert the user request into a single clear image prompt.
2. If the user supplied source images, choose or confirm the input file path or paths inside the workspace.
3. If the user specified a target aspect ratio or size, pass them through as --aspectRatio and --imageSize.
4. Choose an output path inside the workspace unless the user already provided one.
5. For text-to-image, run generate-image.mjs with --prompt, --output, and optional image config arguments.
6. For image editing, run edit-image.mjs with --prompt, one or more --input values, --output, and optional image config arguments.
7. Read the API key from GEMINI_API_KEY and the model ID from GEMINI_MODEL_ID in the environment.
8. Optionally, read the base URL from GEMINI_BASE_URL in the environment for custom endpoints.
9. Return the saved image path or paths to the user.
10. After returning each image path, also output MEDIA:<image_path> (e.g. MEDIA:outputs/gemini-native-image.png) so the image is displayed inline in the conversation.
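The script selection and argument steps of the procedure can be sketched as a small argument builder. This is an illustration under assumptions: buildImageCommand is a hypothetical helper, and the script directory is taken from the example commands in this document.

```javascript
// Sketch only: buildImageCommand is a hypothetical helper, not part of the skill.
// It picks the script and assembles the argument list described in the procedure.
const SCRIPT_DIR = "./skills/gemini-image-generation/scripts";

function buildImageCommand({ prompt, output, inputs = [], aspectRatio, imageSize }) {
  // Source images present means image editing; otherwise text-to-image.
  const script = inputs.length > 0 ? "edit-image.mjs" : "generate-image.mjs";
  const args = [`${SCRIPT_DIR}/${script}`, "--prompt", prompt];
  for (const input of inputs) {
    args.push("--input", input); // one --input flag per source image
  }
  args.push("--output", output);
  // Image config flags are passed through only when the user asked for them.
  if (aspectRatio) args.push("--aspectRatio", aspectRatio);
  if (imageSize) args.push("--imageSize", imageSize);
  return args;
}
```

Because the result is an array rather than a single string, it could be handed to execFile("node", args) from node:child_process, which sidesteps shell quoting of the prompt.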
Commands
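```powershell
node ./skills/gemini-image-generation/scripts/generate-image.mjs --prompt "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme" --output outputs/gemini-native-image.png
```
```powershell
node ./skills/gemini-image-generation/scripts/generate-image.mjs --prompt "Create a wide cinematic food photo of a nano banana dish in a fancy restaurant with a Gemini theme" --output outputs/gemini-wide.png --aspectRatio 16:9 --imageSize 2K
```
```powershell
node ./skills/gemini-image-generation/scripts/edit-image.mjs --prompt "Turn this cat into a watercolor illustration eating a nano-banana in a fancy restaurant under the Gemini constellation" --input inputs/cat.png --output outputs/cat-watercolor.png --aspectRatio 5:4 --imageSize 2K
```
```powershell
node ./skills/gemini-image-generation/scripts/edit-image.mjs --prompt "Create an office group photo of these people making funny faces" --input inputs/person-1.jpg --input inputs/person-2.jpg --input inputs/person-3.jpg --output outputs/group-photo.png
```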
Notes
- The script prints TEXT: lines for model text and IMAGE: lines for each saved file.
- After the skill finishes, always present every generated image to the user by outputting MEDIA:<path> for each saved image path. This ensures the image is rendered inline in the conversation alongside the file path.
- The final JSON summary only includes generated image paths and optional image config, so prompts, model IDs, and source image paths are not echoed back into logs.
- Saved file extensions follow the returned image MIME type. If the requested output path uses a different suffix, the scripts keep the base name and write the file with the returned type instead.
- If the model returns multiple images, the scripts save them as name-1.png, name-2.png, and so on.
- edit-image.mjs supports repeated --input flags. You can also pass a comma-separated list to a single --input value.
- edit-image.mjs infers the source MIME type from .png, .jpg, .jpeg, or .webp. Use one --mime-type for all inputs, or repeat --mime-type so it lines up with each --input.
- Both scripts accept --aspectRatio and --imageSize. They also accept the kebab-case forms --aspect-ratio and --image-size.
- The scripts only send config.imageConfig when at least one of those parameters is provided.
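The file naming rules in the notes above can be illustrated with a short sketch. It is not the scripts' actual code: planOutputPaths is a hypothetical helper, and both the MIME-to-extension table and the .png fallback for unknown types are assumptions.

```javascript
// Sketch only: planOutputPaths is hypothetical. The extension table and the
// ".png" fallback are assumptions; the real scripts may map types differently.
const EXT_BY_MIME = {
  "image/png": ".png",
  "image/jpeg": ".jpg",
  "image/webp": ".webp",
};

// requestedPath: the --output value; mimeTypes: one entry per returned image.
function planOutputPaths(requestedPath, mimeTypes) {
  const slash = requestedPath.lastIndexOf("/");
  const dot = requestedPath.lastIndexOf(".");
  // Strip the suffix only if the dot belongs to the file name (not a directory
  // name, and not the leading dot of a dotfile).
  const base = dot > slash + 1 ? requestedPath.slice(0, dot) : requestedPath;
  return mimeTypes.map((mime, index) => {
    const ext = EXT_BY_MIME[mime] ?? ".png";
    // Number the files only when more than one image came back.
    const suffix = mimeTypes.length > 1 ? `-${index + 1}` : "";
    return `${base}${suffix}${ext}`;
  });
}
```

For example, under these assumptions a request for outputs/cat.png that comes back as a single image/jpeg would be planned as outputs/cat.jpg.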
Image Generation
Use this skill when you need to create one or more image files from a text prompt, or edit one or more existing images with Gemini.
Requirements
- ~/.openclaw/openclaw.json must set $.skills.entries["gemini-image-generation"].enabled to true.
- ~/.openclaw/openclaw.json must include $.skills.entries["gemini-image-generation"].env with the following keys:
  - GEMINI_API_KEY (required)
  - GEMINI_MODEL_ID (required)
  - GEMINI_BASE_URL (optional)
- Example ~/.openclaw/openclaw.json:
```json
{
  ......,
  "skills": {
    "entries": {
      "gemini-image-generation": {
        "enabled": true,
        "env": {
          "GEMINI_API_KEY": "sk-xxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
          "GEMINI_MODEL_ID": "gemini-3.1-flash-image-preview",
          "GEMINI_BASE_URL": "https://custom-endpoint.com"
        }
      }
    }
  },
  ......
}
```
- Node.js must be installed in the workspace environment.
- Install dependencies once with npm install from the skill root.
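When To Use
- The user asks to generate a new image from a text prompt.
- The user asks to modify, restyle, extend, or otherwise edit one or more existing images.
- The user wants the generated image saved to a workspace file.
- The task should be handled through a reusable OpenClaw skill instead of ad hoc SDK code.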
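Procedure
1. Convert the user request into a single clear image prompt.
2. If the user supplied source images, choose or confirm the input file path or paths inside the workspace.
3. If the user specified a target aspect ratio or size, pass them through as --aspectRatio and --imageSize.
4. Choose an output path inside the workspace unless the user already provided one.
5. For text-to-image, run generate-image.mjs with --prompt, --output, and optional image config arguments.
6. For image editing, run edit-image.mjs with --prompt, one or more --input values, --output, and optional image config arguments.
7. Read the API key from GEMINI_API_KEY and the model ID from GEMINI_MODEL_ID in the environment.
8. Optionally, read the base URL from GEMINI_BASE_URL in the environment for custom endpoints.
9. Return the saved image path or paths to the user.
10. After returning each image path, also output MEDIA:<image_path> (e.g. MEDIA:outputs/gemini-native-image.png) so the image is displayed inline in the conversation.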
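The last two steps, returning each saved path and then emitting a MEDIA: line for it, could be driven from the script's printed output. The sketch below is an illustration only: mediaLinesFromOutput is a hypothetical helper, and it assumes each saved file is reported as IMAGE: followed by the path on a single line, which this document does not spell out.

```javascript
// Sketch only: mediaLinesFromOutput is hypothetical. It assumes each saved file
// is printed as "IMAGE: <path>" on its own line; the exact format is an assumption.
function mediaLinesFromOutput(stdout) {
  return stdout
    .split(/\r?\n/)                                       // handle LF and CRLF output
    .filter((line) => line.startsWith("IMAGE:"))          // ignore TEXT: and other lines
    .map((line) => line.slice("IMAGE:".length).trim())    // keep only the path
    .filter((path) => path.length > 0)
    .map((path) => `MEDIA:${path}`);                      // one MEDIA: line per image
}
```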
Commands
```powershell
node ./skills/gemini-image-generation/scripts/generate-image.mjs --prompt "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme" --output outputs/gemini-native-image.png
```
```powershell
node ./skills/gemini-image-generation/scripts/generate-image.mjs --prompt "Create a wide cinematic food photo of a nano banana dish in a fancy restaurant with a Gemini theme" --output outputs/gemini-wide.png --aspectRatio 16:9 --imageSize 2K
```
```powershell
node ./skills/gemini-image-generation/scripts/edit-image.mjs --prompt "Turn this cat into a watercolor illustration eating a nano-banana in a fancy restaurant under the Gemini constellation" --input inputs/cat.png --output outputs/cat-watercolor.png --aspectRatio 5:4 --imageSize 2K
```
```powershell
node ./skills/gemini-image-generation/scripts/edit-image.mjs --prompt "Create an office group photo of these people making funny faces" --input inputs/person-1.jpg --input inputs/person-2.jpg --input inputs/person-3.jpg --output outputs/group-photo.png
```
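Notes
- The script prints TEXT: lines for model text and IMAGE: lines for each saved file.
- After the skill finishes, always present every generated image to the user by outputting MEDIA:<path> for each saved image path. This ensures the image is rendered inline in the conversation alongside the file path.
- The final JSON summary only includes generated image paths and optional image config, so prompts, model IDs, and source image paths are not echoed back into logs.
- Saved file extensions follow the returned image MIME type. If the requested output path uses a different suffix, the scripts keep the base name and write the file with the returned type instead.
- If the model returns multiple images, the scripts save them as name-1.png, name-2.png, and so on.
- edit-image.mjs supports repeated --input flags. You can also pass a comma-separated list to a single --input value.
- edit-image.mjs infers the source MIME type from .png, .jpg, .jpeg, or .webp. Use one --mime-type for all inputs, or repeat --mime-type so it lines up with each --input.
- Both scripts accept --aspectRatio and --imageSize. They also accept the kebab-case forms --aspect-ratio and --image-size.
- The scripts only send config.imageConfig when at least one of those parameters is provided.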
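The input handling rules in these notes (repeated or comma-separated --input values, MIME type inference, and sending the image config only when needed) can be sketched as follows. The helper names are hypothetical, and the field names inside imageConfig simply mirror the CLI flags, which is an assumption about the request shape.

```javascript
// Sketch only: these helpers are hypothetical and not the scripts' actual code.
const MIME_BY_EXT = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".webp": "image/webp",
};

// Flattens repeated --input values, each of which may be a comma-separated list.
function collectInputs(inputValues) {
  return inputValues
    .flatMap((value) => value.split(","))
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

// One --mime-type applies to every input; several line up with inputs by position;
// anything left over is inferred from the file extension.
function resolveMimeTypes(inputs, mimeTypes = []) {
  return inputs.map((file, index) => {
    if (mimeTypes.length === 1) return mimeTypes[0];
    if (mimeTypes[index]) return mimeTypes[index];
    const dot = file.lastIndexOf(".");
    const ext = dot >= 0 ? file.slice(dot).toLowerCase() : "";
    const mime = MIME_BY_EXT[ext];
    if (!mime) throw new Error(`Cannot infer MIME type for ${file}; pass --mime-type`);
    return mime;
  });
}

// Returns a config fragment that contains imageConfig only when at least one
// of the two parameters was provided. Field names are assumed to match the flags.
function buildImageConfig({ aspectRatio, imageSize } = {}) {
  const imageConfig = {};
  if (aspectRatio) imageConfig.aspectRatio = aspectRatio;
  if (imageSize) imageConfig.imageSize = imageSize;
  return Object.keys(imageConfig).length > 0 ? { imageConfig } : {};
}
```

Throwing on an unknown extension, rather than guessing a type, is a design choice in this sketch; the actual scripts may behave differently.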