Image Generation Skill
This skill generates images using the best AI model for each use case. Model selection is the most important decision — read the dispatch logic carefully before generating.
🧠 Intelligent Dispatch Logic
Always select the model based on the user's actual need, not just the request surface.
Decision Tree
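```
Does the request involve multiple images that share characters, scenes, or story continuity?
├─ YES → Use NANO BANANA (Gemini)
│        Reason: Gemini understands context holistically and supports reference_images
│        to keep characters/scenes consistent across a series (storyboards, comics, sequences)
│
└─ NO → Is it a single standalone image?
         ├─ Artistic / cinematic / painterly / highly detailed?
         │    → Use MIDJOURNEY
         │
         ├─ Photorealistic / portrait / product photo?
         │    → Use FLUX PRO
         │
         ├─ Contains text (logo, poster, sign, infographic)?
         │    → Use IDEOGRAM
         │
         ├─ Vector / icon / flat design / brand asset?
         │    → Use RECRAFT
         │
         ├─ Quick draft / fast iteration (speed priority)?
         │    → Use FLUX SCHNELL (<2s)
         │
         └─ General purpose / balanced?
              → Use FLUX DEV
```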
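The tree can be sketched as a small lookup. This is a minimal illustration, not part of `generate.js`: the `selectModel` function, the `needsContinuity` flag, and the `need` labels are hypothetical names, and classifying the request into one of those labels is still left to the caller.

```javascript
// Hypothetical request shape: { needsContinuity: boolean, need: string }.
// The `need` labels are illustrative names for the single-image branches.
const MODEL_BY_NEED = new Map([
  ["artistic", "midjourney"],   // artistic, cinematic, painterly, highly detailed
  ["photoreal", "flux-pro"],    // photorealistic, portraits, product photos
  ["text", "ideogram"],         // logos, posters, signs, infographics
  ["vector", "recraft"],        // vector, icons, flat design, brand assets
  ["draft", "flux-schnell"],    // quick drafts, fast iteration
]);

function selectModel({ needsContinuity = false, need = "general" } = {}) {
  // Shared characters, scenes, or story continuity is checked first.
  if (needsContinuity) return "nano-banana";
  // Anything unrecognized falls through to the general-purpose choice.
  return MODEL_BY_NEED.get(need) ?? "flux-dev";
}
```

Continuity is tested before anything else because it overrides every single-image branch of the tree.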
Model Capability Matrix
| Model | ID | Artistic | Photorealism | Text | Context Continuity | Speed | Cost |
|---|---|---|---|---|---|---|---|
| Midjourney | `midjourney` | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ❌ (no context) | ~30s | ~$0.05 |
| Nano Banana Pro | `nano-banana` | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ~20s | $0.15 |
| Flux Pro | `flux-pro` | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ❌ | ~5s | ~$0.05 |
| Flux Dev | `flux-dev` | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ❌ | ~8s | ~$0.03 |
| Flux Schnell | `flux-schnell` | ⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ❌ | <2s | ~$0.003 |
| Ideogram v3 | `ideogram` | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ❌ | ~10s | ~$0.08 |
| Recraft v3 | `recraft` | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ | ❌ | ~8s | ~$0.04 |
| SDXL Lightning | `sdxl` | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ❌ | ~3s | ~$0.01 |
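When a request comes with hard constraints (for example a minimum text quality or a latency ceiling), the matrix can be treated as data. The sketch below is a hypothetical helper, not part of the skill: `MODELS` copies the approximate figures from the table (with `<2s` stored as 2), and `cheapestMeeting` is an illustrative name. It complements the decision tree rather than replacing it.

```javascript
// Star ratings, approximate seconds, and approximate USD cost from the matrix.
const MODELS = [
  { id: "midjourney",   artistic: 5, photo: 4, text: 2, seconds: 30, usd: 0.05 },
  { id: "nano-banana",  artistic: 4, photo: 4, text: 4, seconds: 20, usd: 0.15 },
  { id: "flux-pro",     artistic: 3, photo: 5, text: 3, seconds: 5,  usd: 0.05 },
  { id: "flux-dev",     artistic: 3, photo: 4, text: 2, seconds: 8,  usd: 0.03 },
  { id: "flux-schnell", artistic: 2, photo: 3, text: 2, seconds: 2,  usd: 0.003 },
  { id: "ideogram",     artistic: 3, photo: 3, text: 5, seconds: 10, usd: 0.08 },
  { id: "recraft",      artistic: 4, photo: 2, text: 4, seconds: 8,  usd: 0.04 },
  { id: "sdxl",         artistic: 3, photo: 3, text: 2, seconds: 3,  usd: 0.01 },
];

// Return the cheapest model ID that meets every minimum star rating and the
// latency ceiling, or null when no model qualifies.
function cheapestMeeting({ minStars = {}, maxSeconds = Infinity } = {}) {
  const qualifying = MODELS.filter(
    (m) =>
      m.seconds <= maxSeconds &&
      Object.entries(minStars).every(([axis, stars]) => m[axis] >= stars)
  );
  qualifying.sort((a, b) => a.usd - b.usd);
  return qualifying.length > 0 ? qualifying[0].id : null;
}
```

For instance, `cheapestMeeting({ minStars: { text: 4 } })` considers only Nano Banana, Ideogram, and Recraft, then picks by cost.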
When to Use Nano Banana (Critical)
Use Nano Banana whenever the user's request involves:
- Storyboard / 分镜图: Multiple frames that tell a story with the same characters
- Comic strip / 漫画: Sequential panels with consistent characters
- Character series: Multiple images of the same person/character in different poses or scenes
- Scene continuation: "Now show the same girl in the forest" (referencing a previous image)
- Style consistency: A set of images that must share the same visual style/world
Nano Banana uses Google's Gemini 3 Pro multimodal architecture, which understands context holistically rather than keyword-matching. It supports up to 14 reference images for maintaining character and scene consistency.
How to Use This Skill
1. Analyze the request: Is it a single image or a series? Does it need context continuity?
2. Select model: Use the decision tree above.
3. Enhance the prompt: Add style, lighting, and quality descriptors appropriate for the model.
4. Inform the user: Tell them which model you're using and why, and that generation has started.
5. Run the script: Use the `exec` tool with a sufficient timeout.
6. Deliver the result: Send image URL(s) to the user.
Calling the Generation Script
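```bash
node {baseDir}/generate.js \
  --model <model-id> \
  --prompt "<enhanced prompt>" \
  [--aspect-ratio <ratio>] \
  [--num-images <1-4>] \
  [--negative-prompt "<negative prompt>"] \
  [--reference-images <comma-separated image URLs>]
```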
Parameters:
- `--model`: One of `midjourney`, `flux-pro`, `flux-dev`, `flux-schnell`, `sdxl`, `nano-banana`, `ideogram`, `recraft`
- `--prompt`: The image generation prompt (required)
- `--aspect-ratio`: e.g. `16:9`, `1:1`, `9:16`, `4:3`, `3:4` (default: `1:1`)
- `--num-images`: 1-4 (default: `1`; Midjourney always returns 4 regardless)
- `--negative-prompt`: Things to avoid (not supported by Midjourney)
- `--reference-images`: Comma-separated image URLs for context/character consistency (Nano Banana only)
- `--mode`: Midjourney speed: `turbo` (default, ~20-40s), `fast` (~30-60s), `relax` (free but slow)
exec timeout: Set at least 120 seconds for Midjourney and Nano Banana; 30 seconds is sufficient for Flux Schnell.
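One way to assemble the call is to build the argument list in code and pass it as an array, which avoids shell-quoting problems with prompts. `buildArgs` and `execTimeoutSeconds` are hypothetical helpers, not part of `generate.js`. The flags and the 120 s / 30 s figures come from the text above; using 120 s for models the text does not mention is an assumption.

```javascript
const MAX_REFERENCE_IMAGES = 14; // limit stated for Nano Banana

// Turn an options object into the flag list for generate.js.
function buildArgs({ model, prompt, aspectRatio, numImages, negativePrompt, referenceImages = [], mode }) {
  if (!prompt) throw new Error("--prompt is required");
  const args = ["--model", model, "--prompt", prompt];
  if (aspectRatio) args.push("--aspect-ratio", aspectRatio);
  if (numImages) args.push("--num-images", String(numImages));
  // Midjourney does not support negative prompts, so the flag is dropped for it.
  if (negativePrompt && model !== "midjourney") {
    args.push("--negative-prompt", negativePrompt);
  }
  if (referenceImages.length > 0) {
    if (model !== "nano-banana") {
      throw new Error("--reference-images is supported by nano-banana only");
    }
    if (referenceImages.length > MAX_REFERENCE_IMAGES) {
      throw new RangeError(`at most ${MAX_REFERENCE_IMAGES} reference images`);
    }
    args.push("--reference-images", referenceImages.join(","));
  }
  // --mode only affects Midjourney speed.
  if (mode && model === "midjourney") args.push("--mode", mode);
  return args;
}

// 30 s for Flux Schnell, 120 s otherwise (the latter is an assumption for
// models other than Midjourney and Nano Banana).
function execTimeoutSeconds(model) {
  return model === "flux-schnell" ? 30 : 120;
}
```

With Node, the array could be passed to `child_process.execFile("node", [scriptPath, ...args], { timeout: seconds * 1000 }, callback)`, where `scriptPath` stands for the resolved `{baseDir}/generate.js`.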
⚡ Midjourney Workflow (Sync Mode — No --async)
Always use sync mode (no --async). The script waits internally until complete.
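```bash
node {baseDir}/generate.js \
  --model midjourney \
  --prompt "<enhanced prompt>" \
  --aspect-ratio 16:9
```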
Understanding Midjourney Output
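```json
{
  "success": true,
  "model": "midjourney",
  "jobId": "xxxxxxxx-...",
  "imageUrl": "https://cdn.legnext.ai/temp/....png",
  "imageUrls": [
    "https://cdn.legnext.ai/mj/xxxx_0.png",
    "https://cdn.legnext.ai/mj/xxxx_1.png",
    "https://cdn.legnext.ai/mj/xxxx_2.png",
    "https://cdn.legnext.ai/mj/xxxx_3.png"
  ]
}
```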
CRITICAL — image field meanings:
| Field | What it is | When to use |
|---|---|---|
| `imageUrl` | A 2×2 grid composite of all 4 images | Send as preview so user can see all options |
| `imageUrls[0]` | Image 1 (top-left) | Send when user wants image 1 |
| `imageUrls[1]` | Image 2 (top-right) | Send when user wants image 2 |
| `imageUrls[2]` | Image 3 (bottom-left) | Send when user wants image 3 |
| `imageUrls[3]` | Image 4 (bottom-right) | Send when user wants image 4 |
"放大第N张" / "要第N张" / "give me image N" = send imageUrls[N-1] directly. Do NOT call generate.js again.
Midjourney Interaction Flow
After generation:
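🎨 Generation complete! Here is a preview of the 4 images:
(preview image)
Which one do you like? Reply 1, 2, 3, or 4 and I'll send you that image on its own in high resolution.
When user picks image N:
Here is image N on its own in high resolution:
(image N)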
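The selection step amounts to an index lookup. A minimal sketch, assuming the Midjourney result has already been parsed into an object with the `imageUrls` array described above; `parseChoice` and `pickImage` are hypothetical helper names, and the reply parsing only handles ASCII digits.

```javascript
// Extract a choice from replies such as "2", "要第2张", or "give me image 3".
// Returns null when no number between 1 and 4 is found.
function parseChoice(text) {
  const match = text.match(/\d+/);
  if (!match) return null;
  const n = Number(match[0]);
  return n >= 1 && n <= 4 ? n : null;
}

// Image N is imageUrls[N-1]; no new generation call is needed.
function pickImage(result, n) {
  if (!Number.isInteger(n) || n < 1 || n > result.imageUrls.length) {
    throw new RangeError(`image number must be between 1 and ${result.imageUrls.length}`);
  }
  return result.imageUrls[n - 1];
}
```

Because the four URLs are already in the result, answering a pick costs nothing beyond sending the stored URL.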
🤖 Nano Banana (Gemini) Workflow
Use for storyboards, character series, and any context-dependent multi-image generation.
Single image (no reference)
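```bash
node {baseDir}/generate.js \
  --model nano-banana \
  --prompt "<detailed scene description>" \
  --aspect-ratio 16:9
```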
With reference images (character/scene consistency)
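```bash
node {baseDir}/generate.js \
  --model nano-banana \
  --prompt "<scene description, referencing the character/style in the reference images>" \
  --aspect-ratio 16:9 \
  --reference-images https://previous-image-url-1.png,https://previous-image-url-2.png
```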
How to build a storyboard series:
1. Generate the first frame without reference images (establishes the character/scene)
2. Use the first frame's URL as `--reference-images` for the second frame
3. For subsequent frames, use the most recent 1-3 images as references to maintain consistency
4. Keep the character description consistent across all prompts
Example storyboard workflow:
CODEBLOCK6
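The reference-selection rule from the steps above can be expressed as a slice over the frames generated so far. `referencesForNextFrame` is a hypothetical helper; the default cap of 3 follows the "most recent 1-3 images" guidance.

```javascript
// Given the URLs of frames generated so far (oldest first), return the
// references for the next frame: none for the first frame, otherwise the
// most recent `maxRefs` frames.
function referencesForNextFrame(previousUrls, maxRefs = 3) {
  if (maxRefs <= 0) return []; // slice(-0) would return the whole array
  return previousUrls.slice(-maxRefs);
}
```

A caller would append each new frame's URL to `previousUrls` after it is generated and join the returned array with commas for `--reference-images`.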
Nano Banana Output
```json
{
  "success": true,
  "model": "nano-banana",
  "images": ["https://v3b.fal.media/files/...png"],
  "imageUrl": "https://v3b.fal.media/files/...png"
}
```
Send `imageUrl` directly to the user (no grid, single image).
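Both output shapes (Midjourney's grid plus four images, and the single-image shape) can be normalized before delivery. The sketch below assumes the script prints its JSON result on stdout and signals failure with `success` set to something other than `true`; neither detail is confirmed by this document, and `toDelivery` is a hypothetical name.

```javascript
// Parse the script output and decide what to send immediately and which
// individual images to hold back for a later pick.
function toDelivery(stdout) {
  const result = JSON.parse(stdout);
  if (result.success !== true) {
    // The failure shape is an assumption; only the success case is documented.
    throw new Error(`generation failed for model ${result.model}`);
  }
  if (result.model === "midjourney") {
    // imageUrl is the 2x2 grid preview; imageUrls holds the four single images.
    return { sendNow: result.imageUrl, choices: result.imageUrls };
  }
  // Single-image models: imageUrl is the final image.
  return { sendNow: result.imageUrl, choices: [] };
}
```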
Other Models
Flux Pro / Dev / Schnell
Best for photorealistic standalone images. Output format is the same as Nano Banana (single `imageUrl`).
CODEBLOCK8
Ideogram v3
Best for images containing text (logos, posters, signs).
CODEBLOCK9
Recraft v3
Best for vector-style, icons, flat design.
CODEBLOCK10
Prompt Enhancement Tips
For Midjourney: Add `cinematic lighting`, `ultra detailed`, `--v 7`, `--style raw`. Legnext supports all MJ parameters.
For Nano Banana: Use natural language descriptions. Describe the character consistently across frames (hair color, clothing, expression). Mention "same style as reference" or "consistent with previous frame".
For Flux: Add `masterpiece`, `highly detailed`, `sharp focus`, `professional photography`, `8k`.
For Ideogram: Be explicit about text content, font style, layout, and color scheme.
For Recraft: Specify `vector illustration`, `flat design`, `icon style`, `minimal`.
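The keyword-based tips can be applied mechanically as per-model suffixes. A minimal sketch, assuming Midjourney parameters such as `--v 7` are appended to the prompt text itself; `enhancePrompt` and the suffix table are illustrative, and models that prefer natural-language prompts (Nano Banana, Ideogram) are left unchanged.

```javascript
const FLUX_SUFFIX = ", masterpiece, highly detailed, sharp focus, professional photography, 8k";

// Descriptors taken from the tips above, keyed by model ID.
const PROMPT_SUFFIX = new Map([
  ["midjourney", ", cinematic lighting, ultra detailed --v 7 --style raw"],
  ["flux-pro", FLUX_SUFFIX],
  ["flux-dev", FLUX_SUFFIX],
  ["flux-schnell", FLUX_SUFFIX],
  ["recraft", ", vector illustration, flat design, icon style, minimal"],
]);

// Append the model's descriptors; models without an entry get the prompt back as-is.
function enhancePrompt(model, prompt) {
  return prompt + (PROMPT_SUFFIX.get(model) ?? "");
}
```

A fixed suffix is only a starting point; the descriptors still need to suit the subject of the request.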
Example Conversations
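User: "帮我画一只赛博朋克猫" ("Draw me a cyberpunk cat")
→ Single artistic image → Midjourney
→ Tell user "🎨 正在用 Midjourney 生成,约 30 秒..." ("Generating with Midjourney, about 30 seconds...")
→ Send grid preview, ask which one they want
User: "帮我生成一套分镜图,讲述一个女孩在魔法森林的冒险" ("Generate a set of storyboard frames about a girl's adventure in a magic forest")
→ Multiple frames with story continuity → Nano Banana
→ Tell user "🎨 这类有上下文关联的分镜图用 Gemini 生成,能保持角色一致性..." ("Storyboards like this, where frames depend on each other, are generated with Gemini, which keeps characters consistent...")
→ Generate frame by frame, using previous frames as reference images
User: "要第2张" / "放大第2张" ("I want image 2" / "Upscale image 2"), after Midjourney generation
→ Send `imageUrls[1]` directly. No need to call `generate.js` again.
User: "做一个 App 图标,蓝色系扁平风格" ("Make an app icon, flat style in blue tones")
→ Vector/icon → Recraft
User: "生成一张带有'欢迎光临'文字的门牌图" ("Generate a door sign with the text '欢迎光临' (Welcome)")
→ Text in image → Ideogram
User: "快速生成个草稿看看效果" ("Quickly generate a draft so I can see how it looks")
→ Speed priority → Flux Schnell (<2s)
User: "生成一张产品海报,白色背景,一瓶香水" ("Generate a product poster: white background, a bottle of perfume")
→ Photorealistic product → Flux Pro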
Environment Variables
| Variable | Description |
|---|---|
| INLINECODE58 | fal.ai API key (for Flux, Nano Banana, Ideogram, Recraft) |
| INLINECODE59 | Legnext.ai API key (for Midjourney) |
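Image Generation Skill
This skill generates images using the best AI model for each use case. Model selection is the most important decision; read the dispatch logic carefully before generating.
🧠 Intelligent Dispatch Logic
Always select the model based on the user's actual need, not just the request surface.
Decision Tree
```
Does the request involve multiple images that share characters, scenes, or story continuity?
├─ YES → Use NANO BANANA (Gemini)
│        Reason: Gemini understands context holistically and supports reference_images
│        to keep characters/scenes consistent across a series (storyboards, comics, sequences)
│
└─ NO → Is it a single standalone image?
         ├─ Artistic / cinematic / painterly / highly detailed?
         │    → Use MIDJOURNEY
         │
         ├─ Photorealistic / portrait / product photo?
         │    → Use FLUX PRO
         │
         ├─ Contains text (logo, poster, sign, infographic)?
         │    → Use IDEOGRAM
         │
         ├─ Vector / icon / flat design / brand asset?
         │    → Use RECRAFT
         │
         ├─ Quick draft / fast iteration (speed priority)?
         │    → Use FLUX SCHNELL (<2s)
         │
         └─ General purpose / balanced?
              → Use FLUX DEV
```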
Model Capability Matrix
| Model | ID | Artistic | Photorealism | Text | Context Continuity | Speed | Cost |
|---|---|---|---|---|---|---|---|
| Midjourney | `midjourney` | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ❌ (no context) | ~30s | ~$0.05 |
| Nano Banana Pro | `nano-banana` | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ~20s | $0.15 |
| Flux Pro | `flux-pro` | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ❌ | ~5s | ~$0.05 |
| Flux Dev | `flux-dev` | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ❌ | ~8s | ~$0.03 |
| Flux Schnell | `flux-schnell` | ⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ❌ | <2s | ~$0.003 |
| Ideogram v3 | `ideogram` | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ❌ | ~10s | ~$0.08 |
| Recraft v3 | `recraft` | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ | ❌ | ~8s | ~$0.04 |
| SDXL Lightning | `sdxl` | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ❌ | ~3s | ~$0.01 |
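When to Use Nano Banana (Critical)
Use Nano Banana whenever the user's request involves:
- Storyboard / ストーリーボード: Multiple frames that tell a story with the same characters
- Comic strip / コミック: Sequential panels with consistent characters
- Character series: Multiple images of the same person/character in different poses or scenes
- Scene continuation: "Now show the same girl in the forest" (referencing a previous image)
- Style consistency: A set of images that must share the same visual style/world
Nano Banana uses Google's Gemini 3 Pro multimodal architecture, which understands context holistically rather than by keyword matching. It supports up to 14 reference images for maintaining character and scene consistency.
How to Use This Skill
1. Analyze the request: Is it a single image or a series? Does it need context continuity?
2. Select model: Use the decision tree above.
3. Enhance the prompt: Add style, lighting, and quality descriptors appropriate for the model.
4. Inform the user: Tell them which model you're using and why, and that generation has started.
5. Run the script: Use the `exec` tool with a sufficient timeout.
6. Deliver the result: Send image URL(s) to the user.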
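Calling the Generation Script
```bash
node {baseDir}/generate.js \
  --model <model-id> \
  --prompt "<enhanced prompt>" \
  [--aspect-ratio <ratio>] \
  [--num-images <1-4>] \
  [--negative-prompt "<negative prompt>"] \
  [--reference-images <comma-separated image URLs>]
```
Parameters:
- `--model`: One of `midjourney`, `flux-pro`, `flux-dev`, `flux-schnell`, `sdxl`, `nano-banana`, `ideogram`, `recraft`
- `--prompt`: The image generation prompt (required)
- `--aspect-ratio`: e.g. `16:9`, `1:1`, `9:16`, `4:3`, `3:4` (default: `1:1`)
- `--num-images`: 1-4 (default: `1`; Midjourney always returns 4)
- `--negative-prompt`: Things to avoid (not supported by Midjourney)
- `--reference-images`: Comma-separated image URLs for context/character consistency (Nano Banana only)
- `--mode`: Midjourney speed mode: `turbo` (default, ~20-40s), `fast` (~30-60s), `relax` (free but slow)
exec timeout: Set at least 120 seconds for Midjourney and Nano Banana; 30 seconds is sufficient for Flux Schnell.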
⚡ Midjourney Workflow (Sync Mode, No --async)
Always use sync mode (no `--async`). The script waits internally until generation is complete.
```bash
node {baseDir}/generate.js \
  --model midjourney \
  --prompt "<enhanced prompt>" \
  --aspect-ratio 16:9
```
Understanding Midjourney Output
```json
{
  "success": true,
  "model": "midjourney",
  "jobId": "xxxxxxxx-...",
  "imageUrl": "https://cdn.legnext.ai/temp/....png",
  "imageUrls": [
    "https://cdn.legnext.ai/mj/xxxx_0.png",
    "https://cdn.legnext.ai/mj/xxxx_1.png",
    "https://cdn.legnext.ai/mj/xxxx_2.png",
    "https://cdn.legnext.ai/mj/xxxx_3.png"
  ]
}
```
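CRITICAL: image field meanings:
| Field | What it is | When to use |
|---|---|---|
| `imageUrl` | A 2×2 grid composite of all 4 images | Send as preview so user can see all options |
| `imageUrls[0]` | Image 1 (top-left) | Send when user wants image 1 |
| `imageUrls[1]` | Image 2 (top-right) | Send when user wants image 2 |
| `imageUrls[2]` | Image 3 (bottom-left) | Send when user wants image 3 |
| `imageUrls[3]` | Image 4 (bottom-right) | Send when user wants image 4 |
"放大第N张" ("upscale image N") / "要第N张" ("I want image N") / "give me image N" = send `imageUrls[N-1]` directly. Do NOT call `generate.js` again.
Midjourney Interaction Flow
After generation:
🎨 Generation complete! Here is a preview of the 4 images:
(preview image)
Which one do you like? Reply 1, 2, 3, or 4 and I'll send you that image on its own in high resolution.
When user picks image N:
Here is image N on its own in high resolution:
(image N)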
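🤖 Nano Banana (Gemini) Workflow
Use for storyboards, character series, and any context-dependent multi-image generation.
Single image (no reference)
```bash
node {baseDir}/generate.js \
  --model nano-banana \
  --prompt "<detailed scene description>" \
  --aspect-ratio 16:9
```
With reference images (character/scene consistency)
```bash
node {baseDir}/generate.js \
  --model nano-banana \
  --prompt "<scene description, referencing the character/style in the reference images>" \
  --aspect-ratio 16:9 \
  --reference-images https://previous-image-url-1.png,https://previous-image-url-2.png
```
How to build a storyboard series:
1. Generate the first frame without reference images (establishes the character/scene)
2. Use the first frame's URL as `--reference-images` for the second frame
3. For subsequent frames, use the most recent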