Nanobanana Image Generation
Overview
This skill now supports two modes:
- Gemini or Nanobanana image generation and editing through the official generateContent flow
- Exact Python or matplotlib rendering of publication-style figures from numeric data
Use image mode for mechanism figures, graphical abstracts, device schematics, style-matched redraws, and diagram-first work.
Use plot mode for exact bar charts, trend curves, heatmaps, scatter plots, and multi-panel figures that must preserve numeric truth.
Runtime policy:
- Python is the required runtime for this skill and the canonical path for both image and plot workflows.
- scripts/generate_image.js is an optional parity CLI for environments that already use Node.js, not the required runtime baseline for registry gating.
When the user is working in Codex and describes a plot in natural language, do not require them to hand-write a JSON spec. Codex should translate the request into an internal plot request or spec and run the plotting scripts.
For image mode, follow Google's official examples and replace:
- API key with the provider key
- base URL with the chosen Google-compatible Gemini endpoint
Do not use OpenAI-style /images/generations or /images/edits routes for this skill.
Attachment-Only Inputs
If the image exists only as a chat attachment and the platform does not expose a local file path, do not claim the script can upload it directly.
Use this rule:
1. If the user needs an exact edit of the original uploaded pixels, ask for the local file path first.
2. If the user accepts a close recreation, analyze the attached image visually and generate a new image that preserves the original composition and style as closely as possible.
For requests like "replace the English text in this attached image with Chinese", the fallback recreation workflow is acceptable when exact pixel-preserving edit is impossible.
Quick Start
Preflight:
- plot mode is local-only and does not require API credentials or outbound network access.
- image mode sends prompt text, API credentials, and any --input-image files to the configured Gemini-compatible endpoint.
- Prefer the official Google endpoint unless you intentionally trust another provider.
- If you use a third-party endpoint, require --allow-third-party or NANOBANANA_ALLOW_THIRD_PARTY=1 and treat that as an explicit trust decision.
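The endpoint trust rule above can be sketched in Python. This is a minimal illustration, not the bundled script's actual logic: the helper name `check_endpoint_allowed` is hypothetical, and treating only the host generativelanguage.googleapis.com as official is an assumption based on the official endpoint named in this document.

```python
import os
from urllib.parse import urlparse

# Assumption: the only host treated as "official" is the Google endpoint
# named in this document.
OFFICIAL_HOST = "generativelanguage.googleapis.com"


def check_endpoint_allowed(base_url, allow_flag=False, env=None):
    """Return True if requests may be sent to base_url (hypothetical helper).

    The official host is always allowed. Any other host is allowed only with
    an explicit opt-in via the flag or NANOBANANA_ALLOW_THIRD_PARTY=1.
    """
    env = os.environ if env is None else env
    host = (urlparse(base_url).hostname or "").lower()
    if host == OFFICIAL_HOST:
        return True
    return bool(allow_flag) or env.get("NANOBANANA_ALLOW_THIRD_PARTY") == "1"
```

Failing closed when neither opt-in is present keeps the trust decision explicit rather than implied by whatever base URL happens to be configured.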
Set environment variables:
CODEBLOCK0
Optional third-party provider:
CODEBLOCK1
If you do not want the API key to appear in the command line, store it in a file and use:
CODEBLOCK2
Generate an image:
CODEBLOCK3
Edit an image:
CODEBLOCK4
Recreate an attached diagram with translated labels:
CODEBLOCK5
Safety note:
- scripts/build_materials_figure_prompt.py and --print-prompt are local-only and do not send data over the network.
- Prompt text, API keys, and user-provided input images are sent only when you run the generation scripts against the configured provider.
- Non-official Gemini-compatible endpoints require explicit confirmation via --allow-third-party or NANOBANANA_ALLOW_THIRD_PARTY=1.
- Prefer NANOBANANA_API_KEY_FILE over inline --api-key when you do not want the key to appear in shell history.
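One way to honor the key-file preference is sketched below. The helper name `resolve_api_key` is hypothetical, and the precedence order (explicit CLI value, then NANOBANANA_API_KEY_FILE, then NANOBANANA_API_KEY) is an assumption rather than documented behavior of the scripts.

```python
import os
from pathlib import Path


def resolve_api_key(cli_key=None, env=None):
    """Pick the API key from CLI, key file, or environment (hypothetical helper).

    Assumed precedence, not confirmed by the scripts: explicit CLI value,
    then NANOBANANA_API_KEY_FILE, then NANOBANANA_API_KEY.
    """
    env = os.environ if env is None else env
    if cli_key:
        return cli_key.strip()
    key_file = env.get("NANOBANANA_API_KEY_FILE")
    if key_file:
        # Reading from disk keeps the key out of argv and shell history.
        return Path(key_file).expanduser().read_text(encoding="utf-8").strip()
    key = env.get("NANOBANANA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set NANOBANANA_API_KEY or NANOBANANA_API_KEY_FILE")
    return key
```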
Workflow
Choose a mode first:
1. If the user supplied numeric data and needs exact plotting, use plot mode. Read references/publication-plot-api.md and run scripts/plot_publication_figure.py. For natural-language requests, also read references/natural-language-plot-workflow.md.
2. If the user needs a schematic, graphical abstract, or image editing workflow, use image mode. Follow the Gemini generateContent flow below.
For image mode:
1. Keep the official Gemini request shape. Use POST /v1beta/models/{model}:generateContent with X-goog-api-key.
2. Put prompt text and image inputs into contents[].parts. Text-only generation uses one text part. Image editing appends one or more inline image parts.
3. Put image options in generationConfig.imageConfig. Prefer --aspect-ratio and --image-size, matching the official docs.
4. For materials-science figures, prefer building the final prompt first. Use python3 scripts/build_materials_figure_prompt.py --materials-figure ... when you want to inspect or refine the prompt before sending any API request.
5. For publication-style research figures, load the bundled design guides as needed. Read references/publication-figure-design.md for house style, palette semantics, typography, and panel logic.
6. If the figure contains chart-like panels, read references/publication-chart-patterns.md. Use those patterns to specify grouped bars, heatmaps, trend layouts, dedicated legends, and wide comparison panels.
7. Save image outputs from candidates[0].content.parts[].inlineData. Save text parts too when returned.
8. If the source image is attachment-only, choose between exact edit and recreation. Ask for a local path for exact editing. Use recreation if the user wants the result and accepts a visually matched redraw.
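Steps 1 to 3 can be sketched as a small request builder. This is an illustration under stated assumptions, not the code in scripts/generate_image.py: the function name is hypothetical, and the camelCase field names `inlineData`, `mimeType`, `aspectRatio`, and `imageSize` are assumed from the public Gemini REST conventions, so check them against the official docs before relying on them.

```python
import base64
import json
import mimetypes
from pathlib import Path


def build_generate_content_request(base_url, model, api_key, prompt,
                                   input_images=(), aspect_ratio=None,
                                   image_size=None):
    """Return (url, headers, body_bytes) for a generateContent call."""
    url = f"{base_url.rstrip('/')}/v1beta/models/{model}:generateContent"
    headers = {"Content-Type": "application/json", "X-goog-api-key": api_key}

    # One text part, followed by one inline image part per input image.
    parts = [{"text": prompt}]
    for image_path in input_images:
        path = Path(image_path)
        mime_type = mimetypes.guess_type(path.name)[0] or "image/png"
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        parts.append({"inlineData": {"mimeType": mime_type, "data": encoded}})

    body = {"contents": [{"parts": parts}]}

    # Image options live under generationConfig.imageConfig.
    image_config = {}
    if aspect_ratio:
        image_config["aspectRatio"] = aspect_ratio  # assumed field name
    if image_size:
        image_config["imageSize"] = image_size  # assumed field name
    if image_config:
        body["generationConfig"] = {"imageConfig": image_config}

    return url, headers, json.dumps(body).encode("utf-8")
```

The returned tuple can be passed to `urllib.request.Request(url, data=body, headers=headers, method="POST")`. Keeping construction separate from sending lets you inspect the payload without any network traffic.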
For plot mode:
1. Read references/publication-plot-api.md.
2. If the user is speaking naturally, infer the plotting intent and data structure. Do not ask the user to author the internal spec unless they explicitly want low-level control.
3. For concise internal translation, optionally create a request JSON and expand it with scripts/build_plot_spec.py.
4. Build or generate a JSON spec with top-level style, layout, and panels.
5. Use bar, trend, heatmap, scatter, legend, or empty panels.
6. Render with:
CODEBLOCK6
7. Export exact PNG, PDF, or SVG outputs.
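A quick sanity check of a spec before rendering might look like the sketch below. It only encodes what the steps above state: the three top-level keys and the six panel kinds. The function name is hypothetical, and storing the panel kind under a `"type"` key is an assumption; references/publication-plot-api.md remains the authority on the real schema.

```python
REQUIRED_TOP_LEVEL_KEYS = ("style", "layout", "panels")
ALLOWED_PANEL_TYPES = {"bar", "trend", "heatmap", "scatter", "legend", "empty"}


def validate_plot_spec(spec):
    """Return a list of problems found in a plot spec dict (empty means OK).

    Assumptions: "panels" is a list of dicts, and each panel names its kind
    under a "type" key.
    """
    problems = [
        f"missing top-level key: {key}"
        for key in REQUIRED_TOP_LEVEL_KEYS
        if key not in spec
    ]
    panels = spec.get("panels", [])
    if not isinstance(panels, list):
        return problems + ["panels must be a list"]
    for index, panel in enumerate(panels):
        panel_type = panel.get("type") if isinstance(panel, dict) else None
        if panel_type not in ALLOWED_PANEL_TYPES:
            problems.append(f"panel {index}: unsupported type {panel_type!r}")
    return problems
```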
Environment
Required:
- NANOBANANA_API_KEY
- NANOBANANA_BASE_URL
Must be set explicitly. Official Google endpoint: https://generativelanguage.googleapis.com
Optional:
- NANOBANANA_MODEL
Default: gemini-3.1-flash-image-preview
- Default: 120
- NANOBANANA_API_KEY_FILE
Path to a file containing the API key. Prefer this when you do not want the key shown in command history or command logs.
- NANOBANANA_ALLOW_THIRD_PARTY
Set to 1 only when you intentionally want to send API keys and user-provided files to a non-official Gemini-compatible provider.
Scripts
- scripts/generate_image.py
Python CLI that follows the official Gemini generateContent request shape.
- scripts/generate_image.js
Node.js CLI with the same request format.
- scripts/plot_publication_figure.py
Python CLI for exact publication-style plotting from JSON specs.
- scripts/build_plot_spec.py
Python CLI that expands a concise request JSON into a full plotting spec.
Common options:
- --input-image ./source.png
- --prompt-file ./background.md
- --aspect-ratio 16:9
- --image-size 2K
- --text-only
- --thinking-level high
- --include-thoughts
- INLINECODE68
- INLINECODE69
- INLINECODE70
- INLINECODE71
- INLINECODE72
- INLINECODE73
Default output location:
- ./output/nanobanana/ relative to the current Codex working directory
- Override only when the user explicitly wants another folder
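The default-location rule can be sketched with pathlib. The helper name `resolve_output_dir` and its parameters are hypothetical; the bundled scripts may resolve the folder differently.

```python
from pathlib import Path

DEFAULT_OUTPUT_SUBDIR = Path("output") / "nanobanana"


def resolve_output_dir(user_dir=None, cwd=None):
    """Default to ./output/nanobanana/ under the working directory.

    A user-supplied folder overrides the default. Relative overrides are
    resolved against the working directory (an assumption of this sketch).
    """
    base = Path(cwd) if cwd is not None else Path.cwd()
    target = Path(user_dir).expanduser() if user_dir else base / DEFAULT_OUTPUT_SUBDIR
    if not target.is_absolute():
        target = base / target
    target.mkdir(parents=True, exist_ok=True)
    return target
```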
Deterministic plotting:
CODEBLOCK7
Natural-language-friendly internal workflow:
CODEBLOCK8
Official Mapping
Official Google examples:
- INLINECODE75
- INLINECODE76
Third-party provider replacements:
- INLINECODE77
- INLINECODE78
- INLINECODE79
Optional Zhizengzeng example:
- INLINECODE80
- INLINECODE81
- INLINECODE82
Everything else should stay aligned with the official Gemini documentation.
Prompting Rules
- For generation, describe the scene instead of dumping keywords.
- For editing, explicitly say what must stay unchanged.
- For multi-image workflows, describe the role of each reference image.
- Prefer English or zh-CN prompts when image fidelity matters.
- For attachment-only translation tasks, list each label that must be rewritten so the regenerated image does not miss text.
- If layout fidelity matters, explicitly say to preserve icon positions, arrows, spacing, hierarchy, and reading order.
- For publication figures, specify semantic color roles, panel order, arrow logic, and which elements should stay neutral.
- Keep figure text short. Prefer concise labels and legend entries over paragraph-like annotations baked into the image.
- If the figure resembles a plot, say whether it is a conceptual chart, a style-matched redraw, or an exact quantitative reproduction.
Materials Science Figure Shortcut
If the user asks for a materials-science paper figure, journal-style scientific schematic, graphical abstract, mechanism diagram, synthesis workflow figure, microstructure-property diagram, device architecture figure, or characterization-plan figure, use the bundled materials-science templates instead of writing the prompt from scratch.
Workflow:
1. Read references/materials-science-figure-template.md.
2. Pick the closest subtype:
   - graphical-abstract
   - mechanism-figure
   - device-architecture
   - processing-workflow
3. Choose the output language:
   - en
   - zh
4. Insert the user's scientific content into the Scientific Background slot, or use the script shortcut directly.
5. Preserve the template's constraints about causality, palette, typography, layout, and avoiding unsupported claims.
6. If the user did not provide exact numbers, keep labels qualitative or explicitly use placeholders rather than fabricating data.
7. If the user wants a specific journal style, append that preference after the template rather than rewriting the template.
8. If the scientific background is long, put it in a markdown file and use --prompt-file or scripts/build_materials_figure_prompt.py --background-file ... instead of squeezing it into one shell argument.
9. For prompt refinement, consult:
   - references/materials-science-figure-template.md
   - references/publication-figure-design.md
   - references/publication-chart-patterns.md
Research Figure Design Integration
This skill includes a distilled publication-figure playbook adapted from the figures4papers project. Use it to make Nanobanana outputs look like journal figures rather than generic AI art.
Read the reference files only as needed:
- references/publication-figure-design.md
Use for overall figure art direction: typography, palette semantics, panel hierarchy, white-background policy, legend handling, and print-safe simplification.
- references/publication-chart-patterns.md
Use when the figure contains bars, trend lines, heatmaps, comparison matrices, or dedicated legend panels.
Apply these rules when prompting:
- Keep the overall composition minimal, high-contrast, and panel-driven.
- Use blue for the primary mechanism or proposed method, green for improvements, red for contrasts, and neutral gray for scaffolds/background categories.
- Ask for short professional labels, frameless legends, and uncluttered white backgrounds.
- Preserve consistent visual encoding across panels so the same color always means the same phase, state, or method.
- For chart-like figures, ask the model to mimic publication layout and styling, but do not imply exact quantitative correctness unless the figure is being recreated from provided source data or reference images.
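The prompting rules above can be folded into a reusable style clause. The sketch below is only one possible wording: the helper name `build_style_clause` and its `exact_data_provided` flag are hypothetical, and the sentences are paraphrases of the rules, not text from the bundled references.

```python
# Semantic color roles taken from the rules above.
COLOR_ROLES = {
    "blue": "the primary mechanism or proposed method",
    "green": "improvements",
    "red": "contrasts",
    "neutral gray": "scaffolds and background categories",
}


def build_style_clause(exact_data_provided=False):
    """Compose a style paragraph to append to an image prompt."""
    lines = [
        "Keep the composition minimal, high-contrast, and panel-driven "
        "on an uncluttered white background.",
        "Use short professional labels and frameless legends.",
    ]
    lines += [f"Use {color} for {role}." for color, role in COLOR_ROLES.items()]
    lines.append("Keep the same color meaning across all panels.")
    if not exact_data_provided:
        # Without source data, chart-like panels must not claim numeric accuracy.
        lines.append(
            "Treat chart-like panels as conceptual; "
            "do not imply exact quantitative values."
        )
    return " ".join(lines)
```

Keeping the color roles in one mapping makes it harder for a later prompt edit to give the same color two different meanings.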
Quantitative Boundary
This skill is strong for:
- graphical abstracts
- mechanism figures
- device schematics
- processing workflows
- chart-like conceptual panels
- style-matched redraws of existing paper figures
This skill is not a guarantee of exact quantitative plotting. If the user needs exact bar heights, exact heatmap values, or faithful axis tick math from raw numbers, treat Nanobanana as a layout or visual-direction tool unless the request is explicitly a redraw from a trusted reference image.
For exact plotting, switch to plot mode and use references/publication-plot-api.md plus scripts/plot_publication_figure.py.
Python shortcut:
CODEBLOCK9
JavaScript shortcut:
CODEBLOCK10
Prompt-only preflight:
CODEBLOCK11
Failure Handling
- If the API returns 401 or 403, verify NANOBANANA_API_KEY.
- If the CLI says the base URL is missing, set NANOBANANA_BASE_URL or pass --base-url.
- If the CLI refuses a non-official endpoint, add --allow-third-party or set NANOBANANA_ALLOW_THIRD_PARTY=1 only if that provider is intentional.
- If the API returns 404, verify that the request is going to /v1beta/models/{model}:generateContent.
- If the provider says the model does not exist, verify the exact model name in the official docs and the provider's supported model list.
- If no image is returned, inspect candidates[0].content.parts and check whether the request asked for image output.
- If the user supplied only a chat attachment and no file path, do not describe the result as an exact edit unless the platform actually exposed the attachment bytes.
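For the "no image returned" case, it helps to walk candidates[0].content.parts explicitly. The sketch below saves whatever it finds and returns an empty list when nothing usable came back. The helper name and file naming are hypothetical, and accepting a snake_case `inline_data` variant alongside `inlineData` is an assumption about provider behavior.

```python
import base64
from pathlib import Path


def save_response_parts(response, output_dir, stem="image"):
    """Save inline images and text parts from a parsed generateContent response.

    Returns the list of written paths. An empty list means the response
    carried no image or text parts.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    candidates = response.get("candidates") or [{}]
    parts = candidates[0].get("content", {}).get("parts", [])
    written = []
    for index, part in enumerate(parts):
        # Accepting both key styles is an assumption, not documented behavior.
        inline = part.get("inlineData") or part.get("inline_data")
        if inline and inline.get("data"):
            mime = inline.get("mimeType") or inline.get("mime_type") or "image/png"
            path = out / f"{stem}-{index}.{mime.split('/')[-1]}"
            path.write_bytes(base64.b64decode(inline["data"]))
            written.append(path)
        elif part.get("text"):
            path = out / f"{stem}-{index}.txt"
            path.write_text(part["text"], encoding="utf-8")
            written.append(path)
    return written
```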
References
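Nanobanana Image Generation
Overview
This skill now supports two modes:
- Gemini or Nanobanana image generation and editing through the official generateContent flow
- Exact Python or matplotlib rendering of publication-style figures from numeric data
Use image mode for mechanism figures, graphical abstracts, device schematics, style-matched redraws, and diagram-first work.
Use plot mode for exact bar charts, trend curves, heatmaps, scatter plots, and multi-panel figures that must preserve numeric truth.
Runtime policy:
- Python is the required runtime for this skill and the canonical path for both image and plot workflows.
- scripts/generate_image.js is an optional parity CLI for environments that already use Node.js, not the required runtime baseline for registry gating.
When the user is working in Codex and describes a plot in natural language, do not require them to hand-write a JSON spec. Codex should translate the request into an internal plot request or spec and run the plotting scripts.
For image mode, follow Google's official examples and replace:
- API key with the provider key
- base URL with the chosen Google-compatible Gemini endpoint
Do not use OpenAI-style /images/generations or /images/edits routes for this skill.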
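Attachment-Only Inputs
If the image exists only as a chat attachment and the platform does not expose a local file path, do not claim the script can upload it directly.
Use this rule:
1. If the user needs an exact edit of the original uploaded pixels, ask for the local file path first.
2. If the user accepts a close recreation, analyze the attached image visually and generate a new image that preserves the original composition and style as closely as possible.
For requests like "replace the English text in this attached image with Chinese", the fallback recreation workflow is acceptable when an exact pixel-preserving edit is impossible.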
Quick Start
Preflight:
- plot mode is local-only and does not require API credentials or outbound network access.
- image mode sends prompt text, API credentials, and any --input-image files to the configured Gemini-compatible endpoint.
- Prefer the official Google endpoint unless you intentionally trust another provider.
- If you use a third-party endpoint, require --allow-third-party or NANOBANANA_ALLOW_THIRD_PARTY=1 and treat that as an explicit trust decision.
Set environment variables:
```bash
export NANOBANANA_API_KEY=your-provider-key
export NANOBANANA_BASE_URL=https://generativelanguage.googleapis.com
export NANOBANANA_MODEL=gemini-3.1-flash-image-preview
```
Optional third-party provider:
```bash
export NANOBANANA_BASE_URL=https://api.zhizengzeng.com/google
export NANOBANANA_ALLOW_THIRD_PARTY=1
```
If you do not want the API key to appear in the command line, store it in a file and use:
```bash
export NANOBANANA_API_KEY_FILE=$PWD/.secrets/nanobanana_api_key
```
Generate an image:
```bash
python3 scripts/generate_image.py "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"
```
Edit an image:
```bash
python3 scripts/generate_image.py "Using the provided image, change only the blue sofa to a vintage brown leather Chesterfield sofa. Keep everything else exactly the same." --input-image ./living-room.png
```
Recreate an attached diagram with translated labels:
```bash
python3 scripts/generate_image.py "Recreate the attached pastel technical diagram with the same layout, icons, arrows, and hand-drawn style. Replace all visible English labels with natural Simplified Chinese. Keep the composition unchanged." --aspect-ratio 16:9 --image-size 2K
```
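Safety note:
- scripts/build_materials_figure_prompt.py and --print-prompt are local-only and do not send data over the network.
- Prompt text, API keys, and user-provided input images are sent only when you run the generation scripts against the configured provider.
- Non-official Gemini-compatible endpoints require explicit confirmation via --allow-third-party or NANOBANANA_ALLOW_THIRD_PARTY=1.
- Prefer NANOBANANA_API_KEY_FILE over inline --api-key when you do not want the key to appear in shell history.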
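Workflow
Choose a mode first:
1. If the user supplied numeric data and needs exact plotting, use plot mode. Read references/publication-plot-api.md and run scripts/plot_publication_figure.py. For natural-language requests, also read references/natural-language-plot-workflow.md.
2. If the user needs a schematic, graphical abstract, or image editing workflow, use image mode. Follow the Gemini generateContent flow below.
For image mode:
1. Keep the official Gemini request shape. Use POST /v1beta/models/{model}:generateContent with X-goog-api-key.
2. Put prompt text and image inputs into contents[].parts. Text-only generation uses one text part. Image editing appends one or more inline image parts.
3. Put image options in generationConfig.imageConfig. Prefer --aspect-ratio and --image-size, matching the official docs.
4. For materials-science figures, prefer building the final prompt first. Use python3 scripts/build_materials_figure_prompt.py --materials-figure ... when you want to inspect or refine the prompt before sending any API request.
5. For publication-style research figures, load the bundled design guides as needed. Read references/publication-figure-design.md for house style, palette semantics, typography, and panel logic.
6. If the figure contains chart-like panels, read references/publication-chart-patterns.md. Use those patterns to specify grouped bars, heatmaps, trend layouts, dedicated legends, and wide comparison panels.
7. Save image outputs from candidates[0].content.parts[].inlineData. Save text parts too when returned.
8. If the source image is attachment-only, choose between exact edit and recreation. Ask for a local path for exact editing. Use recreation if the user wants the result and accepts a visually matched redraw.
For plot mode:
1. Read references/publication-plot-api.md.
2. If the user is speaking naturally, infer the plotting intent and data structure. Do not ask the user to author the internal spec unless they explicitly want low-level control.
3. For concise internal translation, optionally create a request JSON and expand it with scripts/build_plot_spec.py.
4. Build or generate a JSON spec with top-level style, layout, and panels.
5. Use bar, trend, heatmap, scatter, legend, or empty panels.
6. Render with:
```bash
python3 skills/nanobanana-image-generation/scripts/plot_publication_figure.py spec.json
```
7. Export exact PNG, PDF, or SVG outputs.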
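Environment
Required:
- NANOBANANA_API_KEY
- NANOBANANA_BASE_URL
Must be set explicitly. Official Google endpoint: https://generativelanguage.googleapis.com
Optional:
- NANOBANANA_MODEL
Default: gemini-3.1-flash-image-preview
- Default: 120
- NANOBANANA_API_KEY_FILE
Path to a file containing the API key. Prefer this when you do not want the key shown in command history or command logs.
- NANOBANANA_ALLOW_THIRD_PARTY
Set to 1 only when you intentionally want to send API keys and user-provided files to a non-official Gemini-compatible provider.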
Scripts
- scripts/generate_image.py
Python CLI that follows the official Gemini generateContent request shape.
- scripts/generate_image.js
Node.js CLI with the same request format.
- scripts/plot_publication_figure.py
Python CLI for exact publication-style plotting from JSON specs.
- scripts/build_plot_spec.py
Python CLI that expands a concise request JSON into a full plotting spec.
Common options:
- --input-image ./source.png
- --prompt-file ./background.md
- --aspect-ratio 16:9
- --image-size 2K
- --text-only
- --thinking-level high
- --include-thoughts
- --materials