Heatmap Beautifier

A beautification tool for gene expression heatmaps. It automatically adds clustering trees (dendrograms) and color annotation tracks, and adjusts the label layout to avoid overlap.

Input Validation

This skill accepts: CSV files containing gene expression matrices (genes as rows, samples as columns) for heatmap generation and beautification.

If the user's request does not involve heatmap generation or gene expression visualization — for example, asking to perform differential expression analysis, run statistical tests, or generate other chart types — do not proceed. Instead respond:

"heatmap-beautifier is designed to generate and beautify gene expression heatmaps from expression matrix data. Your request appears to be outside this scope. Please provide a CSV expression matrix file, or use a more appropriate tool for your task."

Do not continue the workflow when the request is out of scope, missing the required input CSV, or would require unsupported assumptions. For missing inputs, state exactly which fields are missing.

Quick Check

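    python -m py_compile scripts/main.py
    python scripts/main.py --help

Demo mode (no CSV required):

    python scripts/main.py --demo --output demo_heatmap.pdf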

When to Use

- Beautify gene expression heatmaps with clustering trees and annotation tracks
- Generate publication-ready heatmap output (PDF, PNG, SVG) with optimized label layout
- Add row/column annotation color bars to expression matrices
- Standardize heatmap styling for manuscript figures

Workflow

1. Validate input: confirm the request is within scope before any processing.
2. Confirm the user objective, required inputs, and non-negotiable constraints.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.

Features

- Automatic Clustering: Adds row and column clustering trees computed by hierarchical clustering
- Annotation Tracks: Supports multiple color annotation tracks (sample grouping, gene classification, etc.)
- Smart Labels: Automatically calculates a font size that avoids row/column label overlap
- Flexible Color Schemes: Includes several built-in color schemes suited to scientific figures
- Export Options: Supports PDF, PNG, and SVG formats
- Demo Mode: Run with --demo to generate a synthetic 20×10 matrix without a real CSV
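The Smart Labels feature can be illustrated with a simple heuristic. The sketch below is an assumption about how such a calculation might work, not the tool's actual algorithm: `label_font_size`, its parameters, and the 20% spacing factor are all hypothetical. It divides the axis length among the labels and hides them once the resulting size falls below a legible minimum.

```python
def label_font_size(n_labels, axis_inches, max_pt=12.0, min_pt=4.0):
    """Return a font size in points, or None when labels should be hidden.

    Hypothetical heuristic: each label gets an equal slot along the axis,
    and the font uses 80% of that slot so neighbours do not touch.
    """
    if n_labels <= 0:
        raise ValueError("n_labels must be positive")
    # 72 points per inch; slot height (or width) available to one label
    points_per_label = axis_inches * 72.0 / n_labels
    size = min(max_pt, points_per_label * 0.8)
    return size if size >= min_pt else None


print(label_font_size(20, 10.0))    # 12.0 (capped at max_pt)
print(label_font_size(5000, 10.0))  # None (too dense, hide labels)
```

Returning `None` rather than a tiny font keeps the "hide labels" decision explicit for the caller.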

Dependencies

    pip install seaborn matplotlib scipy pandas numpy

Usage

Basic Usage

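    from skills.heatmap_beautifier.scripts.main import HeatmapBeautifier

    hb = HeatmapBeautifier()
    hb.create_heatmap(
        data_path="expression_matrix.csv",
        output_path="output/heatmap.pdf",
    )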

Command Line Usage

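    # Basic run
    python -m skills.heatmap_beautifier.scripts.main \
        --input expression_matrix.csv \
        --output heatmap.pdf

    # With clustering and annotation tracks
    python -m skills.heatmap_beautifier.scripts.main \
        --input expression_matrix.csv \
        --output heatmap.pdf \
        --row-cluster \
        --col-cluster \
        --row-annotations row_annot.json \
        --col-annotations col_annot.json \
        --title "Gene Expression"

    # Demo mode (no CSV required)
    python -m skills.heatmap_beautifier.scripts.main --demo --output demo_heatmap.pdf

    # Save clustering metadata as JSON for agent use
    python -m skills.heatmap_beautifier.scripts.main \
        --input expression_matrix.csv \
        --output heatmap.pdf \
        --output-json heatmap_metadata.json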

Parameters

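| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| --data-path, -d | string | - | Yes* | Path to input data file (CSV) |
| --demo | flag | - | No | Generate a synthetic 20×10 demo matrix |
| --output-path, -o | string | heatmap.png | No | Output file path |
| --title | string | Gene Expression Heatmap | No | Heatmap title |
| --cmap | string | RdBu_r | No | Color map |
| --center | float | 0 | No | Center value of the color scale |
| --vmin | float | -2 | No | Color scale minimum |
| --vmax | float | 2 | No | Color scale maximum |
| --row-cluster | boolean | true | No | Enable row clustering |
| --col-cluster | boolean | true | No | Enable column clustering |
| --standard-scale | string | None | No | Standardization: row, col, None |
| --z-score | integer | None | No | Z-score: 0 (rows), 1 (columns), None |
| --figsize | tuple | (12, 10) | No | Figure size (width, height) |
| --dpi | integer | 300 | No | Resolution (dots per inch) |
| --format | string | pdf | No | Output format (pdf, png, svg) |
| --output-json | string | - | No | Save clustering metadata (gene_order, sample_order, annotation_colors) as JSON |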

*One of --data-path or --demo is required.
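The "one of --data-path or --demo" rule can be expressed directly in `argparse`. This is a minimal sketch, not the script's actual parser: `build_parser` is a hypothetical helper, and using a mutually exclusive group is an assumption that also rejects passing both flags at once, which is stricter than the footnote states.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the input-source rule."""
    parser = argparse.ArgumentParser(prog="heatmap-beautifier")
    # required=True makes argparse exit with an error if neither flag is given
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--data-path", "-d", help="Path to input data file (CSV)")
    source.add_argument("--demo", action="store_true",
                        help="Generate a synthetic demo matrix")
    parser.add_argument("--output-path", "-o", default="heatmap.png",
                        help="Output file path")
    return parser


args = build_parser().parse_args(["--demo"])
print(args.demo, args.data_path, args.output_path)  # True None heatmap.png
```

Running the parser with no arguments prints a usage error to stderr and exits with status 2, which is standard `argparse` behavior.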

Input Data Format

Expression Matrix (CSV)

    ,sample1,sample2,sample3,sample4
    Gene_A,2.5,-1.2,0.8,-0.5
    Gene_B,-0.8,1.5,-2.1,0.3
    Gene_C,1.2,0.5,-0.7,1.8

- First column: Gene names (row index)
- First row: Sample names (column names)
- Data: Expression values (e.g., log2 fold change, TPM, FPKM)
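The layout rules above can be checked before any rendering. The following is a minimal sketch using only the standard library; `load_expression_matrix` is a hypothetical helper, and the packaged script may validate its input differently (for example with pandas).

```python
import csv
import io


def load_expression_matrix(handle):
    """Parse a genes-by-samples CSV into (samples, genes, values)."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None or len(header) < 2:
        raise ValueError("CSV needs a header row with at least one sample name")
    samples = header[1:]  # first header cell is the (usually empty) index label
    genes, values = [], []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        genes.append(row[0])
        # float() raises ValueError on non-numeric cells
        values.append([float(cell) for cell in row[1:]])
    if not genes:
        raise ValueError("CSV contains no gene rows")
    return samples, genes, values


example = io.StringIO(
    ",sample1,sample2,sample3,sample4\n"
    "Gene_A,2.5,-1.2,0.8,-0.5\n"
    "Gene_B,-0.8,1.5,-2.1,0.3\n"
    "Gene_C,1.2,0.5,-0.7,1.8\n"
)
samples, genes, values = load_expression_matrix(example)
print(samples, genes, len(values))
```

Raising `ValueError` for every format problem means a caller can report all of them through a single handler.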

Color Schemes

- "RdBu_r": Red-Blue (classic differential expression)
- "viridis": Yellow-Purple (continuous data)
- "RdYlBu_r": Red-Yellow-Blue
- "coolwarm": Cool-Warm
- "seismic": Seismic
- "bwr": Blue-White-Red

Error Handling

- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If scripts/main.py fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
- Exception handling: The script uses except (pd.errors.ParserError, UnicodeDecodeError, ValueError) for CSV parsing errors, not a bare except. If you see a bare except in an older version, report it.
- Error propagation: FileNotFoundError and ValueError are caught in main() with try/except (FileNotFoundError, ValueError) as e: print(f'Error: {e}', file=sys.stderr); sys.exit(1) and reported to stderr with exit code 1.
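The error-propagation pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of scripts/main.py: `load_matrix` is a hypothetical stand-in loader, and `main` returns the exit code instead of calling `sys.exit` so that it is easy to test.

```python
import sys


def load_matrix(path):
    """Hypothetical stand-in loader that raises the two handled exception types."""
    if not path:
        raise ValueError("no input path given")
    # open() raises FileNotFoundError when the path does not exist
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def main(argv):
    """Return 0 on success, 1 after reporting a handled error to stderr."""
    try:
        load_matrix(argv[0] if argv else "")
    except (FileNotFoundError, ValueError) as e:
        print(f"Error: {e}", file=sys.stderr)
        return 1
    return 0


exit_code = main([])  # no path supplied, so the ValueError branch runs
print(exit_code)      # 1
```

A command-line entry point would pass the result to `sys.exit`, giving the exit code 1 described above.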

Fallback Behavior

If scripts/main.py fails or required inputs are incomplete:

1. Report the exact failure point and error message.
2. State what can still be completed (e.g., data validation without rendering).
3. Manual fallback: verify that the CSV has gene rows and sample columns, then re-run with minimal options: python -m skills.heatmap_beautifier.scripts.main --input data.csv --output out.png.
4. Use --demo to verify the environment works without a real CSV.
5. Do not fabricate execution outcomes or file contents.

Output Requirements

Every final response must make these items explicit when relevant:

- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks

Response Template

Use the following fixed structure for non-trivial requests:

1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks

For stress/multi-constraint requests, also include:

- Constraints checklist (compliance, performance, error paths)
- Unresolved items with explicit blocking reasons

If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.

Notes

1. Apply a log2 transformation or standardization to the data before plotting.
2. Large datasets (>5000 rows) may take longer to process.
3. When there are too many rows or columns, some labels are hidden automatically.
4. Default clustering uses Euclidean distance with Ward linkage.
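The preprocessing recommended in the first note can be sketched in plain Python. The helpers `log2_transform` and `row_zscore` are hypothetical, and the pseudocount of 1 and the use of the sample standard deviation are assumptions; the tool's own --z-score and --standard-scale options may compute these differently.

```python
import math
import statistics


def log2_transform(row, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount keeps zero counts finite."""
    return [math.log2(v + pseudocount) for v in row]


def row_zscore(row):
    """Center one gene's values on 0 and scale them to unit sample SD."""
    mean = statistics.fmean(row)
    sd = statistics.stdev(row)  # sample SD; needs at least two values
    if sd == 0:
        return [0.0 for _ in row]  # constant row: nothing to scale
    return [(v - mean) / sd for v in row]


tpm = [0.0, 1.0, 3.0, 7.0]
logged = log2_transform(tpm)
print(logged)  # [0.0, 1.0, 2.0, 3.0]
print(row_zscore(logged))
```

Z-scoring each gene across samples puts all rows on a comparable scale, which suits a diverging color map centered on 0.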
