PaperBanana

Generate publication-quality academic diagrams and pipeline figures from a paper's methodology section and figure caption. PaperBanana orchestrates a multi-agent pipeline (Retriever, Planner, Stylist, Visualizer, Critic) to produce camera-ready figures suitable for venues like NeurIPS, ICML, and ACL.

Environment Setup

bash
cd <repo-root>
uv pip install -r requirements.txt

Set your API key via environment variable or in configs/model_config.yaml.

Option 1 (Recommended): OpenRouter API key — one key for both text reasoning and image generation:
bash
export OPENROUTER_API_KEY=sk-or-v1-...

Option 2: Google API key — direct access to Gemini API:
bash
export GOOGLE_API_KEY=your-key-here

If both keys are configured, OpenRouter is used by default.
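The precedence rule can be illustrated with a minimal Python sketch. It is not PaperBanana's actual code: the function name `pick_provider` is hypothetical, and the variable names `OPENROUTER_API_KEY` and `GOOGLE_API_KEY` are assumed. The sketch only shows the "OpenRouter first, Google second" lookup order.

```python
import os
from typing import Mapping, Optional, Tuple

# Assumed environment variable names; adjust if your setup differs.
OPENROUTER_VAR = "OPENROUTER_API_KEY"
GOOGLE_VAR = "GOOGLE_API_KEY"


def pick_provider(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (provider, key), preferring OpenRouter when both keys are set."""
    env = os.environ if env is None else env
    # Order matters: OpenRouter is checked first, so it wins when both exist.
    for provider, var in (("openrouter", OPENROUTER_VAR), ("google", GOOGLE_VAR)):
        key = env.get(var, "").strip()
        if key:
            return provider, key
    raise RuntimeError(f"Set {OPENROUTER_VAR} or {GOOGLE_VAR} before running.")
```

Passing a plain dictionary instead of relying on `os.environ` makes the lookup easy to check without touching the real environment.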

Usage

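bash
python skill/run.py \
  --content "method text" \
  --caption "figure caption" \
  --task diagram \
  --output output.png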

Parameters

Parameter	Required	Default	Description
--content	Yes		Method section text to visualize
--content-file

*One of --content or --content-file is required.
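The "one of the two is required" rule could be enforced along the following lines. This is a sketch with `argparse`, not the script's real parser. The help text for `--content-file`, the choice to let `--content` win when both are given, and the helper names are all assumptions.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("--content", help="Method section text to visualize")
    # Assumed meaning: a path to a text file containing the method section.
    parser.add_argument("--content-file", help="File holding the method text")
    return parser


def resolve_content(argv: Optional[List[str]] = None) -> str:
    """Return the method text from --content or, failing that, --content-file."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.content:  # assumption: inline text takes priority over the file
        return args.content
    if args.content_file:
        return Path(args.content_file).read_text(encoding="utf-8")
    # parser.error prints a usage message and exits with status 2.
    parser.error("one of --content or --content-file is required")
```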

When --num-candidates > 1, output files are named <stem>_0.png, <stem>_1.png, etc.
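As a small illustration of this naming scheme, the helper below derives the candidate file names from a single output path. The function name `candidate_paths` is hypothetical, and placing the files in the same directory as the requested output path is an assumption.

```python
from pathlib import Path
from typing import List


def candidate_paths(output: str, num_candidates: int) -> List[Path]:
    """Expand one output path into <stem>_0, <stem>_1, ... when needed."""
    out = Path(output)
    if num_candidates <= 1:
        return [out]  # a single candidate keeps the name as given
    # Keep the directory and extension, append the candidate index to the stem.
    return [out.with_name(f"{out.stem}_{i}{out.suffix}") for i in range(num_candidates)]
```

For example, `candidate_paths("architecture.png", 3)` yields `architecture_0.png`, `architecture_1.png`, and `architecture_2.png`.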

Output

The absolute path of each saved image is printed to stdout, one per line.
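Because the paths arrive one per line, a calling script can collect them with a few lines of Python. The sketch below only parses text that has already been captured. The function name `parse_image_paths` is hypothetical, and it assumes stdout contains nothing besides the paths.

```python
from pathlib import Path
from typing import List


def parse_image_paths(stdout: str) -> List[Path]:
    """Turn captured stdout (one path per line) into a list of Path objects."""
    paths = []
    for line in stdout.splitlines():
        line = line.strip()
        if line:  # skip blank lines such as a trailing newline
            paths.append(Path(line))
    return paths
```

The input string would typically come from something like `subprocess.run(cmd, capture_output=True, text=True).stdout`.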

Examples

Diagram

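bash
python skill/run.py \
  --content "We propose a Transformer-based encoder-decoder architecture. The encoder consists of 12 self-attention layers with residual connections. The decoder uses cross-attention to attend to the encoder output and autoregressively generates the target sequence." \
  --caption "Figure 1: Overview of the proposed Transformer architecture" \
  --task diagram \
  --output architecture.png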

Important Notes

- Runtime: A single candidate typically takes 3-10 minutes depending on model and network conditions. With the default 10 candidates running in parallel, expect ~10-30 minutes total. Plan accordingly.
- API calls: Each candidate involves multiple LLM calls (Retriever + Planner + Stylist + Visualizer + up to 3 Critic rounds). Candidates run in parallel for efficiency.
- Image generation: The Visualizer agent calls an image generation model (Gemini Image) to render diagrams.
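The per-candidate flow described in these notes might look roughly like the following sketch. It is not the framework's implementation. The agent callables and their signatures are hypothetical stand-ins. It also assumes, without confirmation from the source, that each Critic revision triggers a new render and that a Critic returning `None` means "accept", and it uses a thread pool as just one plausible way to run candidates in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

MAX_CRITIC_ROUNDS = 3  # "up to 3 Critic rounds"


def run_candidate(
    content: str,
    caption: str,
    retrieve: Callable[[str], str],
    plan: Callable[[str, str, str], str],
    style: Callable[[str], str],
    render: Callable[[str], str],
    critique: Callable[[str, str], Optional[str]],
) -> str:
    """Run Retriever -> Planner -> Stylist -> Visualizer, then the Critic loop."""
    references = retrieve(content)
    description = style(plan(content, caption, references))
    image = render(description)
    for _ in range(MAX_CRITIC_ROUNDS):
        revised = critique(image, description)
        if revised is None:  # hypothetical convention: None means "accept"
            break
        description = revised
        image = render(description)  # assumption: re-render after each revision
    return image


def run_candidates(n: int, make_one: Callable[[int], str]) -> List[str]:
    """Run n candidates concurrently; results come back in index order."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(make_one, range(n)))
```

Under these assumptions a candidate makes at most one Retriever, one Planner, and one Stylist call, up to three Critic calls, and up to four Visualizer calls.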

About

PaperBanana is based on the PaperVizAgent framework, a reference-driven multi-agent system for automated academic illustration. It was developed as part of the research paper:

PaperBanana: Automating Academic Illustration for AI Scientists
Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, Jinsung Yoon
arXiv:2601.23265

The framework introduces a collaborative team of five specialized agents — Retriever, Planner, Stylist, Visualizer, and Critic — to transform raw scientific content into publication-quality diagrams. Evaluation is conducted on the PaperBananaBench benchmark.

