Image Generation (AI SDK)
Official API-based image generation. Supports OpenAI, Google, DashScope (阿里通义万象) and Replicate providers.
Script Directory
Agent Execution:
1. `SKILL_DIR` = this SKILL.md file's directory
2. Script path = `${SKILL_DIR}/scripts/main.ts`
Step 0: Load Preferences ⛔ BLOCKING
CRITICAL: This step MUST complete BEFORE any image generation. Do NOT skip or defer.
Check EXTEND.md existence (priority: project → user):
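```bash
test -f .baoyu-skills/baoyu-image-gen/EXTEND.md && echo "project"
test -f "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md" && echo "user"
```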
| Result | Action |
|---|---|
| Found | Load, parse, and apply settings. If `default_model.[provider]` is null → ask for the model only (Flow 2) |
| Not found | ⛔ Run first-time setup (`references/config/first-time-setup.md`) → Save EXTEND.md → Then continue |
CRITICAL: If not found, complete the full setup (provider + model + quality + save location) using AskUserQuestion BEFORE generating any images. Generation is BLOCKED until EXTEND.md is created.
| Path | Location |
|---|---|
| `.baoyu-skills/baoyu-image-gen/EXTEND.md` | Project directory |
| `$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md` | User home |
EXTEND.md Supports: Default provider | Default quality | Default aspect ratio | Default image size | Default models
Schema: `references/config/preferences-schema.md`
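The project → user lookup described above can be sketched in TypeScript as follows. This is a minimal illustration, not the skill's actual code: `findExtendMd`, `ExtendLocation`, and the injected `exists` callback are hypothetical names introduced here.

```typescript
// Sketch only: resolves which EXTEND.md to load, project first, then user.
// All names below are hypothetical; they are not part of the skill's scripts.
type ExtendLocation = { scope: "project" | "user"; path: string };

const EXTEND_REL_PATH = ".baoyu-skills/baoyu-image-gen/EXTEND.md";

function findExtendMd(
  cwd: string,
  home: string,
  exists: (path: string) => boolean, // injected so the lookup stays testable
): ExtendLocation | null {
  const candidates: ExtendLocation[] = [
    { scope: "project", path: `${cwd}/${EXTEND_REL_PATH}` },
    { scope: "user", path: `${home}/${EXTEND_REL_PATH}` },
  ];
  for (const candidate of candidates) {
    if (exists(candidate.path)) return candidate;
  }
  // Not found: first-time setup has to run before any generation.
  return null;
}
```

Under Node or Bun, `existsSync` from `node:fs` could be passed as the `exists` argument.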
Usage
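```bash
# Basic
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9

# High quality
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k

# Read prompt from files
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference image (Google multimodal or OpenAI edits)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make it blue" --image out.png --ref source.png

# With reference image (explicit provider/model)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make it blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png

# Specific provider
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider openai

# DashScope (阿里通义万象)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cute cat" --image out.png --provider dashscope

# Replicate (google/nano-banana-pro)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Replicate with a specific model
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
```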
Options
| Option | Description |
|---|---|
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read prompt from files (concatenated) |
| `--image <path>` | Output image path (required) |
| `--provider google\|openai\|dashscope\|replicate` | Force provider (default: google) |
| `--model <id>`, `-m` | Model ID (Google: `gemini-3-pro-image-preview`, `gemini-3.1-flash-image-preview`; OpenAI: `gpt-image-1.5`) |
| `--ar <ratio>` | Aspect ratio (e.g., `16:9`, `1:1`, `4:3`) |
| `--size <WxH>` | Size (e.g., `1024x1024`) |
| `--quality normal\|2k` | Quality preset (default: 2k) |
| `--imageSize 1K\|2K\|4K` | Image size for Google (default: from quality) |
| `--ref <files...>` | Reference images. Supported by Google multimodal (`gemini-3-pro-image-preview`, `gemini-3-flash-preview`, `gemini-3.1-flash-image-preview`) and OpenAI edits (GPT Image models). If provider omitted: Google first, then OpenAI |
| `--n <count>` | Number of images |
| `--json` | JSON output |
Environment Variables
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key |
| `GOOGLE_API_KEY` | Google API key |
| `DASHSCOPE_API_KEY` | DashScope API key (阿里云) |
| `REPLICATE_API_TOKEN` | Replicate API token |
| `OPENAI_IMAGE_MODEL` | OpenAI model override |
| `GOOGLE_IMAGE_MODEL` | Google model override |
| `DASHSCOPE_IMAGE_MODEL` | DashScope model override (default: z-image-turbo) |
| `REPLICATE_IMAGE_MODEL` | Replicate model override (default: google/nano-banana-pro) |
| `OPENAI_BASE_URL` | Custom OpenAI endpoint |
| `GOOGLE_BASE_URL` | Custom Google endpoint |
| `DASHSCOPE_BASE_URL` | Custom DashScope endpoint |
| `REPLICATE_BASE_URL` | Custom Replicate endpoint |
Load Priority: CLI args > EXTEND.md > env vars > `<cwd>/.baoyu-skills/.env` > `~/.baoyu-skills/.env`
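One way to picture this load priority is a "first defined value wins" walk over the sources, ordered from highest to lowest priority. The sketch below assumes each source has already been normalized to a common set of keys (for example `provider`, `quality`); `Source` and `resolveSetting` are hypothetical names, and the real script may merge its configuration differently.

```typescript
// Sketch only: each source is a flat key/value map, listed highest priority first.
type Source = Record<string, string | undefined>;

function resolveSetting(key: string, sourcesByPriority: Source[]): string | undefined {
  for (const source of sourcesByPriority) {
    const value = source[key];
    // Treat missing and empty values as "not set" and fall through to the next source.
    if (value !== undefined && value !== "") return value;
  }
  return undefined;
}
```

Called as `resolveSetting("provider", [cliArgs, extendMd, envVars, projectDotEnv, userDotEnv])`, a value set on the command line would shadow the same key in EXTEND.md, which in turn would shadow environment variables and the two `.env` files.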
Replicate Model Configuration
When using --provider replicate, the model can be configured in the following ways (highest priority first):
1. CLI flag: `--model <owner/name>`
2. EXTEND.md: `default_model.replicate`
3. Env var: `REPLICATE_IMAGE_MODEL`
4. Built-in default: `google/nano-banana-pro`

Supported model formats:
- `owner/name` (recommended for official models), e.g. `google/nano-banana-pro`
- `owner/name:version` (community models by version), e.g. `stability-ai/sdxl:<version>`
Examples:
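```bash
# Use the Replicate default model
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Override the model explicitly
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
```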
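The two supported model formats (owner/name, optionally followed by a colon and a version) can be told apart with a small parser. This is a hedged sketch: `parseReplicateModel` and `ReplicateModelRef` are hypothetical names, and the character rules in the regular expression are an assumption rather than Replicate's official grammar.

```typescript
// Sketch only: splits a Replicate model ID into owner, name, and optional version.
type ReplicateModelRef = { owner: string; name: string; version?: string };

function parseReplicateModel(id: string): ReplicateModelRef | null {
  // Assumed grammar: <owner>/<name> with an optional ":<version>" suffix.
  const match = /^([^/:\s]+)\/([^/:\s]+)(?::([^/:\s]+))?$/.exec(id.trim());
  if (match === null) return null;
  const owner = match[1];
  const name = match[2];
  const version = match[3];
  if (owner === undefined || name === undefined) return null;
  return version === undefined ? { owner, name } : { owner, name, version };
}
```

A `null` result would be the point at which to report an invalid `--model` value before calling the API.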
Provider Selection
1. `--ref` provided + no `--provider` → auto-select Google first, then OpenAI, then Replicate
2. `--provider` specified → use it (if `--ref`, must be google, openai, or replicate)
3. Only one API key available → use that provider
4. Multiple available → default to Google
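The selection rules above can be read as a single decision function. The sketch below is an illustration under assumptions, not the script's implementation: `selectProvider` and `SelectionInput` are hypothetical names, and two behaviours are guesses the rules do not spell out (throwing when no suitable key exists, and falling back to the first available provider when several keys exist but none is Google's).

```typescript
// Sketch only: encodes the provider-selection rules as one function.
type Provider = "google" | "openai" | "dashscope" | "replicate";

interface SelectionInput {
  explicit?: Provider;   // value of --provider, if given
  hasRef: boolean;       // whether --ref was passed
  available: Provider[]; // providers whose API key is configured
}

// Order matters: Google first, then OpenAI, then Replicate.
const REF_CAPABLE: Provider[] = ["google", "openai", "replicate"];

function selectProvider(input: SelectionInput): Provider {
  const { explicit, hasRef, available } = input;
  if (explicit !== undefined) {
    if (hasRef && !REF_CAPABLE.includes(explicit)) {
      throw new Error(`--ref is not supported with provider "${explicit}"`);
    }
    return explicit;
  }
  if (hasRef) {
    const match = REF_CAPABLE.find((p) => available.includes(p));
    if (match === undefined) throw new Error("--ref needs a Google, OpenAI, or Replicate key");
    return match;
  }
  const first = available[0];
  if (first === undefined) throw new Error("No API key configured");
  if (available.length === 1) return first;
  // Assumption: if Google is not among several available providers, use the first one.
  return available.includes("google") ? "google" : first;
}
```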
Quality Presets
| Preset | Google imageSize | OpenAI Size | Use Case |
|---|---|---|---|
| `normal` | 1K | 1024px | Quick previews |
| `2k` (default) | 2K | 2048px | Covers, illustrations, infographics |

Google imageSize: Can be overridden with `--imageSize 1K|2K|4K`
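The preset table maps naturally onto a lookup, with `--imageSize` taking precedence for Google. A minimal sketch, assuming the hypothetical names `QUALITY_PRESETS` and `resolveGoogleImageSize`; the values are copied from the table above.

```typescript
// Sketch only: quality presets as a lookup table, values taken from the table above.
type Quality = "normal" | "2k";
type GoogleImageSize = "1K" | "2K" | "4K";

const QUALITY_PRESETS: Record<Quality, { googleImageSize: GoogleImageSize; openaiPx: number }> = {
  normal: { googleImageSize: "1K", openaiPx: 1024 },
  "2k": { googleImageSize: "2K", openaiPx: 2048 },
};

// An explicit --imageSize overrides whatever the quality preset implies.
function resolveGoogleImageSize(quality: Quality = "2k", override?: GoogleImageSize): GoogleImageSize {
  return override ?? QUALITY_PRESETS[quality].googleImageSize;
}
```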
Aspect Ratios
Supported: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2.35:1`
- Google multimodal: uses `imageConfig.aspectRatio`
- Google Imagen: uses `aspectRatio` parameter
- OpenAI: maps to closest supported size
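"Maps to closest supported size" can be illustrated by comparing the requested ratio against each candidate size's own ratio. The sketch below is an assumption about how such a mapping might work: `closestSize`, `parseAspectRatio`, and in particular the `CANDIDATE_SIZES` list are hypothetical and should be replaced with whatever sizes the chosen OpenAI model actually accepts.

```typescript
// Sketch only: pick the candidate size whose width/height ratio is nearest the request.
function parseAspectRatio(ratio: string): number | null {
  const match = /^(\d+(?:\.\d+)?):(\d+(?:\.\d+)?)$/.exec(ratio.trim());
  if (match === null) return null;
  const width = Number(match[1]);
  const height = Number(match[2]);
  return width > 0 && height > 0 ? width / height : null;
}

// Hypothetical candidate list; not taken from the skill's source.
const CANDIDATE_SIZES: string[] = ["1024x1024", "1536x1024", "1024x1536"];

function closestSize(ratio: string, sizes: string[] = CANDIDATE_SIZES): string | null {
  const target = parseAspectRatio(ratio);
  if (target === null) return null; // invalid ratio: caller can warn and use a default
  let best: string | null = null;
  let bestDistance = Infinity;
  for (const size of sizes) {
    const sizeRatio = parseAspectRatio(size.replace("x", ":"));
    if (sizeRatio === null) continue;
    // Compare in log space so that 2:1 and 1:2 are equally far from 1:1.
    const distance = Math.abs(Math.log(sizeRatio / target));
    if (distance < bestDistance) {
      best = size;
      bestDistance = distance;
    }
  }
  return best;
}
```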
Generation Mode
Default: Sequential generation (one image at a time). This ensures stable output and easier debugging.
Parallel Generation: Only use when user explicitly requests parallel/concurrent generation.
| Mode | When to Use |
|---|---|
| Sequential (default) | Normal usage, single images, small batches |
| Parallel | User explicitly requests, large batches (10+) |
Parallel Settings (when requested):
| Setting | Value |
|---|---|
| Recommended concurrency | 4 subagents |
| Max concurrency | 8 subagents |
| Use case | Large batch generation when user requests parallel |
Agent Implementation (parallel mode only):
CODEBLOCK3
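Separately from the agent implementation, the concurrency limits in the settings table (4 recommended, 8 maximum) can be enforced by splitting a batch into waves. This is only an illustrative sketch with hypothetical names (`clampConcurrency`, `planWaves`); it does not describe how subagents are actually dispatched.

```typescript
// Sketch only: clamp the requested concurrency and split tasks into waves of that size.
const RECOMMENDED_CONCURRENCY = 4;
const MAX_CONCURRENCY = 8;

function clampConcurrency(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested)) return RECOMMENDED_CONCURRENCY;
  return Math.min(MAX_CONCURRENCY, Math.max(1, Math.floor(requested)));
}

function planWaves<T>(tasks: T[], requested?: number): T[][] {
  const size = clampConcurrency(requested);
  const waves: T[][] = [];
  for (let i = 0; i < tasks.length; i += size) {
    // Each wave holds at most `size` tasks to be run together.
    waves.push(tasks.slice(i, i + size));
  }
  return waves;
}
```

With ten prompts and no explicit request, this yields waves of 4, 4, and 2 tasks; a request for 20 is capped at 8 per wave.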
Error Handling
- Missing API key → error with setup instructions
- Generation failure → auto-retry once
- Invalid aspect ratio → warning, proceed with default
- Reference images with unsupported provider/model → error with fix hint (switch to Google multimodal: `gemini-3-pro-image-preview`, `gemini-3.1-flash-image-preview`; or OpenAI GPT Image edits)
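The "auto-retry once" rule amounts to running the attempt a second time if the first one throws, and letting the second failure propagate. The sketch below uses a hypothetical helper, `withSingleRetry`, and is written synchronously for brevity; a real generation call would be asynchronous and awaited.

```typescript
// Sketch only: run `attempt`, and if it throws, run it exactly one more time.
function withSingleRetry<T>(attempt: () => T, onRetry?: (error: unknown) => void): T {
  try {
    return attempt();
  } catch (error) {
    // Optional hook, e.g. to log the first failure before retrying.
    onRetry?.(error);
    // A second failure is not caught here and reaches the caller.
    return attempt();
  }
}
```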
Extension Support
Custom configurations via EXTEND.md. See Preferences section for paths and supported options.
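Image Generation (AI SDK)
Official API-based image generation. Supports OpenAI, Google, DashScope (阿里通义万象) and Replicate providers.
Script Directory
Agent Execution:
1. `SKILL_DIR` = this SKILL.md file's directory
2. Script path = `${SKILL_DIR}/scripts/main.ts`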
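Step 0: Load Preferences ⛔ BLOCKING
CRITICAL: This step MUST complete BEFORE any image generation. Do NOT skip or defer.
Check EXTEND.md existence (priority: project → user):
```bash
test -f .baoyu-skills/baoyu-image-gen/EXTEND.md && echo "project"
test -f "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md" && echo "user"
```
| Result | Action |
|---|---|
| Found | Load, parse, and apply settings. If `default_model.[provider]` is null → ask for the model only (Flow 2) |
| Not found | ⛔ Run first-time setup (`references/config/first-time-setup.md`) → Save EXTEND.md → Then continue |

CRITICAL: If not found, complete the full setup (provider + model + quality + save location) using AskUserQuestion BEFORE generating any images. Generation is BLOCKED until EXTEND.md is created.
| Path | Location |
|---|---|
| `.baoyu-skills/baoyu-image-gen/EXTEND.md` | Project directory |
| `$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md` | User home |

EXTEND.md Supports: Default provider | Default quality | Default aspect ratio | Default image size | Default models
Schema: `references/config/preferences-schema.md`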
Usage
```bash
# Basic
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9

# High quality
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k

# Read prompt from files
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference image (Google multimodal or OpenAI edits)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make it blue" --image out.png --ref source.png

# With reference image (explicit provider/model)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make it blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png

# Specific provider
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider openai

# DashScope (阿里通义万象)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cute cat" --image out.png --provider dashscope

# Replicate (google/nano-banana-pro)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Replicate with a specific model
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
```
Options
| Option | Description |
|---|---|
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read prompt from files (concatenated) |
| `--image <path>` | Output image path (required) |
| `--provider google\|openai\|dashscope\|replicate` | Force provider (default: google) |
| `--model <id>`, `-m` | Model ID (Google: `gemini-3-pro-image-preview`, `gemini-3.1-flash-image-preview`; OpenAI: `gpt-image-1.5`) |
| `--ar <ratio>` | Aspect ratio (e.g., `16:9`, `1:1`, `4:3`) |
| `--size <WxH>` | Size (e.g., `1024x1024`) |
| `--quality normal\|2k` | Quality preset (default: 2k) |
| `--imageSize 1K\|2K\|4K` | Image size for Google (default: from quality) |
| `--ref <files...>` | Reference images. Supported by Google multimodal (`gemini-3-pro-image-preview`, `gemini-3-flash-preview`, `gemini-3.1-flash-image-preview`) and OpenAI edits (GPT Image models). If provider omitted: Google first, then OpenAI |
| `--n <count>` | Number of images |
| `--json` | JSON output |
Environment Variables
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key |
| `GOOGLE_API_KEY` | Google API key |
| `DASHSCOPE_API_KEY` | DashScope API key (阿里云) |
| `REPLICATE_API_TOKEN` | Replicate API token |
| `OPENAI_IMAGE_MODEL` | OpenAI model override |
| `GOOGLE_IMAGE_MODEL` | Google model override |
| `DASHSCOPE_IMAGE_MODEL` | DashScope model override (default: z-image-turbo) |
| `REPLICATE_IMAGE_MODEL` | Replicate model override (default: google/nano-banana-pro) |
| `OPENAI_BASE_URL` | Custom OpenAI endpoint |
| `GOOGLE_BASE_URL` | Custom Google endpoint |
| `DASHSCOPE_BASE_URL` | Custom DashScope endpoint |
| `REPLICATE_BASE_URL` | Custom Replicate endpoint |

Load Priority: CLI args > EXTEND.md > env vars > `<cwd>/.baoyu-skills/.env` > `~/.baoyu-skills/.env`
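Replicate Model Configuration
When using `--provider replicate`, the model can be configured in the following ways (highest priority first):
1. CLI flag: `--model <owner/name>`
2. EXTEND.md: `default_model.replicate`
3. Env var: `REPLICATE_IMAGE_MODEL`
4. Built-in default: `google/nano-banana-pro`

Supported model formats:
- `owner/name` (recommended for official models), e.g. `google/nano-banana-pro`
- `owner/name:version` (community models by version), e.g. `stability-ai/sdxl:<version>`

Examples:
```bash
# Use the Replicate default model
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Override the model explicitly
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
```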
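Provider Selection
1. `--ref` provided + no `--provider` → auto-select Google first, then OpenAI, then Replicate
2. `--provider` specified → use it (if `--ref`, must be google, openai, or replicate)
3. Only one API key available → use that provider
4. Multiple available → default to Google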
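Quality Presets
| Preset | Google imageSize | OpenAI Size | Use Case |
|---|---|---|---|
| `normal` | 1K | 1024px | Quick previews |
| `2k` (default) | 2K | 2048px | Covers, illustrations, infographics |

Google imageSize: Can be overridden with `--imageSize 1K|2K|4K`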
Aspect Ratios
Supported: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2.35:1`
- Google multimodal: uses `imageConfig.aspectRatio`
- Google Imagen: uses `aspectRatio` parameter
- OpenAI: maps to closest supported size
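Generation Mode
Default: Sequential generation (one image at a time). This ensures stable output and easier debugging.
Parallel Generation: Only use when user explicitly requests parallel/concurrent generation.
| Mode | When to Use |
|---|---|
| Sequential (default) | Normal usage, single images, small batches |
| Parallel |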