Scientific Graph Interpreter
Interpret and explain scientific graphs, charts, and data visualizations for research publications, clinical presentations, and academic communications with precision and clarity.
Quick Start
CODEBLOCK0
Core Capabilities
1. Multi-Type Graph Analysis
CODEBLOCK1
Supported Graph Types:
| Graph Type | Common Use | Key Elements to Extract |
|---|---|---|
| Kaplan-Meier | Survival analysis | Median survival, HR, 95% CI, log-rank p |
| Forest Plot | Meta-analysis | Effect size, CI, heterogeneity (I²), weights |
| ROC Curve | Diagnostic accuracy | AUC, sensitivity, specificity, optimal cutoff |
| Box Plot | Distribution comparison | Median, IQR, outliers, whiskers |
| Scatter Plot | Correlation | R², p-value, trend line, outliers |
| Bar Chart | Group comparisons | Means, SEM/SD, significance indicators |
| Heatmap | Expression/omics | Scale, clustering, row/column annotations |
| Volcano Plot | Differential analysis | Fold change, p-value, FDR threshold |
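To illustrate the "key elements to extract" column, the box-plot entries (median, IQR, whiskers, outliers) can be computed from raw values. This is a minimal sketch, assuming the common Tukey convention of whiskers at 1.5 × IQR. The function name `box_plot_summary` is hypothetical and not part of the `GraphInterpreter` API. Plotting libraries may use a different quartile method, so values read off a published figure can differ slightly.

```python
import statistics

def box_plot_summary(values):
    """Return median, quartiles, whisker ends, and outliers (Tukey 1.5*IQR rule)."""
    data = sorted(values)
    # Quartiles with the "inclusive" method; other methods give slightly different cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers extend to the most extreme points that are still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Toy values chosen only to show one obvious outlier.
summary = box_plot_summary([2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

In this toy data set the value 30 falls above the upper fence, so it is reported as an outlier and the upper whisker stops at 9.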
2. Statistical Interpretation
CODEBLOCK2
Statistical Reporting Standards:
CODEBLOCK3
3. Audience-Specific Explanations
CODEBLOCK4
Explanation Templates:
For Researchers:
"The Kaplan-Meier analysis demonstrates a statistically significant
survival advantage for the experimental arm (HR 0.72, 95% CI 0.58-0.89,
p=0.003). Median survival improved from 14.2 to 19.6 months.
A test of the proportional hazards assumption showed no evidence of violation (p=0.42)."
For Clinicians:
"This trial shows that median survival was about 5 months longer on the
new treatment than on standard care (19.6 vs 14.2 months). The 28%
reduction in the hazard of death (HR 0.72) is statistically significant
and clinically meaningful. Consider this option for eligible patients."
For Patients:
"The study found that people taking the new treatment lived longer
than those on standard treatment. At any point during the study, the
risk of death was about 28% lower with the new treatment. Side effects
were manageable."
4. Figure Caption Generation
CODEBLOCK5
Caption Structure:
CODEBLOCK6
5. Critical Appraisal
CODEBLOCK7
Common Graph Pitfalls:
| Issue | Problem | Better Approach |
|---|---|---|
| Truncated y-axis | Exaggerates differences | Start at 0 or clearly indicate break |
| No error bars | Hides variability | Include SD, SEM, or 95% CI |
| 3D effects | Distorts perception | Use 2D with clear labels |
| Dual y-axes | Confusing comparison | Separate graphs or normalized scale |
| Unadjusted multiple comparisons (possible p-hacking) | Inflated false-positive rate | Adjusted p-values (e.g., Bonferroni) |
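The Bonferroni adjustment named in the last row can be sketched in a few lines. The helper `bonferroni_adjust` is a hypothetical name and the p-values are made-up toy inputs. Bonferroni is only one option, and it is conservative when there are many tests.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Toy p-values from four comparisons shown in one figure.
raw = [0.003, 0.02, 0.04, 0.30]
adjusted = bonferroni_adjust(raw)

for p, p_adj in zip(raw, adjusted):
    verdict = "significant" if p_adj < 0.05 else "not significant"
    print(f"raw p={p:.3f} -> adjusted p={p_adj:.3f} ({verdict})")
```

Three of the four raw p-values are below 0.05, but only the first remains below 0.05 after adjustment. A graph that marks all three with asterisks should be read with this in mind.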
CLI Usage
CODEBLOCK8
Common Patterns
Pattern 1: Clinical Trial Primary Endpoint
CODEBLOCK9
Pattern 2: Meta-Analysis Forest Plot
CODEBLOCK10
Pattern 3: Diagnostic Accuracy ROC
CODEBLOCK11
Quality Checklist
Before Interpretation:
- [ ] Graph type appropriate for data
- [ ] Axes clearly labeled with units
- [ ] Sample sizes indicated
- [ ] Statistical tests specified
- [ ] Confidence intervals present
During Interpretation:
- [ ] Effect size calculated
- [ ] Clinical significance assessed
- [ ] Confidence intervals interpreted
- [ ] Limitations noted
- [ ] Generalizability considered
After Interpretation:
- [ ] Explanation appropriate for audience
- [ ] Statistical terms explained
- [ ] Uncertainty communicated
- [ ] Actionable insights highlighted
Best Practices
Statistical Communication:
- Always report confidence intervals with point estimates
- Distinguish statistical from clinical significance
- Note limitations and generalizability
- Avoid causal language in observational studies
Visual Analysis:
- Check axis scales for distortion
- Note truncated axes or breaks
- Identify outliers and their impact
- Verify error bar representation (SD vs SEM)
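The SD versus SEM check matters because the two differ by a factor of √n, so the same data can look tight or noisy depending on which is drawn. This is a minimal sketch with made-up measurements. `describe_error_bars` is a hypothetical helper, and the 95% CI uses a normal approximation (1.96 × SEM), which is too narrow for small samples where a t-based interval is more appropriate.

```python
import math
import statistics

def describe_error_bars(values):
    """Return the mean with SD, SEM, and an approximate 95% CI half-width."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1 denominator)
    sem = sd / math.sqrt(n)         # standard error of the mean
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "sem": sem,
        "ci95_half_width": 1.96 * sem,  # normal approximation, an assumption
    }

# Toy measurements for one group in a bar chart.
stats = describe_error_bars([4.1, 5.0, 5.6, 4.7, 5.3, 4.9, 5.2, 4.8, 5.1])
print({k: round(v, 3) for k, v in stats.items()})
```

With nine observations the SEM is one third of the SD, so error bars labeled only "±" can understate or overstate variability by that factor.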
Common Pitfalls
❌ Correlation = Causation: "X causes Y because they're correlated"
✅ Cautious Interpretation: "X is associated with Y; other factors may explain this"
❌ Overstating Significance: Reading "highly significant (p<0.001)" as meaning a large effect
✅ Proper Framing: "Statistically significant but small effect size (d=0.2)"
❌ Ignoring Confidence Intervals: Reporting point estimate only
✅ Interval Reporting: "Effect: 1.5 (95% CI: 0.9-2.4), suggesting uncertainty"
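One way to make interval reporting concrete is to check whether the interval contains the "no effect" value. The sketch below assumes the effect of 1.5 in the example is a ratio measure (such as a hazard ratio or odds ratio), for which the null value is 1. For a difference in means the null value would be 0. `interval_excludes_null` is a hypothetical helper name.

```python
def interval_excludes_null(ci_low, ci_high, null_value):
    """True when the confidence interval does not contain the null value."""
    return not (ci_low <= null_value <= ci_high)

# Effect 1.5 (95% CI 0.9-2.4): the interval spans 1, so "no effect" is not ruled out.
print(interval_excludes_null(0.9, 2.4, null_value=1.0))    # False

# HR 0.72 (95% CI 0.58-0.89): the whole interval lies below 1.
print(interval_excludes_null(0.58, 0.89, null_value=1.0))  # True
```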
Skill ID: 209 | Version: 1.0 | License: MIT
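Scientific Graph Interpreter
Interpret and explain scientific graphs, charts, and data visualizations in research publications, clinical presentations, and academic communications with precision and clarity.
Quick Start
```python
from scripts.graph_interpreter import GraphInterpreter

interpreter = GraphInterpreter()

# Comprehensive graph analysis
analysis = interpreter.interpret(
    image_path="figure1.png",
    graph_type="kaplan_meier",
    context="oncology_phase3_trial",
    audience="clinicians"
)

print(analysis.statistical_summary)
print(analysis.clinical_significance)
print(analysis.suggested_caption)
```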
Core Capabilities
1. Multi-Type Graph Analysis
```python
analysis = interpreter.analyze(
    graph_type="forest_plot",
    data={
        "studies": ["Study A", "Study B", "Study C"],
        "effect_sizes": [1.2, 0.8, 1.5],
        "confidence_intervals": [[1.0, 1.4], [0.6, 1.0], [1.2, 1.8]],
        "overall_effect": 1.15,
        "heterogeneity_p": 0.04
    }
)
```
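To show what a forest plot's pooled estimate and heterogeneity statistic represent, here is a minimal sketch of inverse-variance fixed-effect pooling with Cochran's Q and I². It makes several assumptions: the effect sizes are ratio measures with 95% confidence intervals, pooling is done on the log scale, and the function name `pool_fixed_effect` is hypothetical and not part of the `GraphInterpreter` API. The result need not match the `overall_effect` value in the example above, which may come from a different model (for instance random effects).

```python
import math

def pool_fixed_effect(effects, intervals, z=1.96):
    """Inverse-variance fixed-effect pooling of ratio measures on the log scale."""
    log_effects = [math.log(e) for e in effects]
    # Recover each standard error from the width of its 95% CI on the log scale.
    ses = [(math.log(hi) - math.log(lo)) / (2 * z) for lo, hi in intervals]
    weights = [1 / se ** 2 for se in ses]
    pooled_log = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    # Cochran's Q measures how far the studies scatter around the pooled value.
    q = sum(w * (y - pooled_log) ** 2 for w, y in zip(weights, log_effects))
    df = len(effects) - 1
    # I² is the share of that scatter beyond what chance alone would give.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return math.exp(pooled_log), q, i_squared

pooled, q, i2 = pool_fixed_effect(
    effects=[1.2, 0.8, 1.5],
    intervals=[(1.0, 1.4), (0.6, 1.0), (1.2, 1.8)],
)
print(f"pooled effect={pooled:.2f}, Q={q:.2f}, I²={i2:.0f}%")
```

A high I² for these three studies signals that they disagree more than sampling error would explain, which is the situation where a single pooled diamond on a forest plot deserves a cautious reading.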
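Supported Graph Types:
| Graph Type | Common Use | Key Elements to Extract |
|---|---|---|
| Kaplan-Meier curve | Survival analysis | Median survival, HR, 95% CI, log-rank p-value |
| Forest plot | Meta-analysis | Effect size, CI, heterogeneity (I²), weights |
| ROC curve | Diagnostic accuracy | AUC, sensitivity, specificity, optimal cutoff |
| Box plot | Distribution comparison | Median, IQR, outliers, whiskers |
| Scatter plot | Correlation | R², p-value, trend line, outliers |
| Bar chart | Group comparisons | Means, SEM/SD, significance markers |
| Heatmap | Expression/omics | Scale, clustering, row/column annotations |
| Volcano plot | Differential analysis | Fold change, p-value, FDR threshold |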
2. Statistical Interpretation
```python
stats = interpreter.extract_statistics(
    graph_data,
    extract=[
        "p_values",
        "confidence_intervals",
        "effect_sizes",
        "sample_sizes",
        "statistical_tests"
    ]
)
```
Statistical Reporting Standards:
```python
# Example output structure
{
    "primary_outcome": {
        "measure": "Hazard ratio",
        "value": 0.72,
        "ci_95": [0.58, 0.89],
        "p_value": 0.003,
        "interpretation": "28% reduction in hazard"
    },
    "secondary_outcomes": [...],
    "significance_level": 0.05,
    "multiple_comparison_adjusted": True
}
```
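The `interpretation` field follows from the hazard ratio by simple arithmetic: the relative reduction is (1 − HR) × 100, so an HR of 0.72 corresponds to a 28% reduction in hazard. The sketch below applies the same conversion to the confidence limits. The function names are hypothetical and are not part of the `GraphInterpreter` API.

```python
def relative_reduction_percent(hazard_ratio):
    """Convert a hazard ratio below 1 into a percentage reduction in hazard."""
    return (1 - hazard_ratio) * 100

def summarize_hr(hr, ci_low, ci_high):
    """Describe an HR and its CI as a relative reduction with its own interval."""
    best = relative_reduction_percent(hr)
    # The lower HR limit gives the largest reduction, the upper limit the smallest.
    largest = relative_reduction_percent(ci_low)
    smallest = relative_reduction_percent(ci_high)
    return f"{best:.0f}% reduction in hazard (95% CI {smallest:.0f}% to {largest:.0f}%)"

print(summarize_hr(0.72, 0.58, 0.89))
# prints: 28% reduction in hazard (95% CI 11% to 42%)
```

Reporting the interval on the reduction scale makes it clear that the data are compatible with anything from a modest to a substantial benefit.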
3. Audience-Specific Explanations
```python
explanations = interpreter.generate_multi_audience(
    analysis,
    audiences=["researchers", "clinicians", "patients", "policy_makers"]
)
```
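Explanation Templates:
For Researchers:
"The Kaplan-Meier analysis demonstrates a statistically significant
survival advantage for the experimental arm (HR 0.72, 95% CI 0.58-0.89,
p=0.003). Median survival improved from 14.2 to 19.6 months.
A test of the proportional hazards assumption showed no evidence of violation (p=0.42)."
For Clinicians:
"This trial shows that median survival was about 5 months longer on the
new treatment than on standard care. The 28% reduction in the hazard of
death is statistically significant and clinically meaningful. Consider
this option for eligible patients."
For Patients:
"The study found that people taking the new treatment lived longer
than those on standard treatment. At any point during the study, the
risk of death was about 28% lower with the new treatment. Side effects
were manageable."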
4. Figure Caption Generation
```python
caption = interpreter.generate_caption(
    analysis,
    style="journal",  # or "presentation", "poster"
    word_limit=250,
    include_statistics=True
)
```
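Caption Structure:
Figure X. [Brief title]. [What is shown: the x-axis shows..., the y-axis shows...,
lines/bars represent...]. [Key finding: Group A showed... compared with Group B...].
[Statistics: HR 0.72 (95% CI 0.58-0.89), p=0.003].
[Conclusion: This suggests...].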
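The caption structure above can be assembled mechanically once each part is written. This is a minimal sketch under stated assumptions: `build_caption` and its parameters are hypothetical and do not describe how `generate_caption` is implemented, and the word limit is enforced by a simple whitespace word count.

```python
def build_caption(number, title, shows, key_finding, statistics_text,
                  conclusion, word_limit=250):
    """Join the caption parts in order and enforce a word limit."""
    parts = [
        f"Figure {number}. {title}.",
        shows,
        key_finding,
        statistics_text,
        conclusion,
    ]
    caption = " ".join(part.strip() for part in parts if part)
    n_words = len(caption.split())
    if n_words > word_limit:
        raise ValueError(f"Caption has {n_words} words; limit is {word_limit}")
    return caption

caption = build_caption(
    number=1,
    title="Overall survival by treatment arm",
    shows="The x-axis shows months since randomization and the y-axis shows "
          "the proportion of patients alive.",
    key_finding="Median survival was 19.6 months in the experimental arm "
                "and 14.2 months in the control arm.",
    statistics_text="HR 0.72 (95% CI 0.58-0.89), p=0.003.",
    conclusion="This suggests a survival benefit for the experimental treatment.",
)
print(caption)
```

Keeping the statistics in their own part makes it easy to drop them for a presentation slide while retaining them for a journal submission.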
5. Critical Appraisal
```python
appraisal = interpreter.critical_appraisal(
    graph_data,
    check=[
        "appropriate_graph_type",
        "axis_scaling",
        "error_bars_present",
        "sample_size_adequate",
        "confounding_controlled",
        "generalizability"
    ]
)
```
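Common Graph Pitfalls:
| Issue | Impact | Better Approach |
|---|---|---|
| Truncated y-axis | Exaggerates differences | Start at 0 or clearly mark the break |
| No error bars | Hides variability | Include SD, SEM, or 95% CI |
| 3D effects | Distorts perception | Use 2D with clear labels |
| Dual y-axes | Confuses comparison | Separate graphs or a normalized scale |
| Unadjusted multiple comparisons (possible p-hacking) | Inflated false-positive rate | Adjusted p-values (e.g., Bonferroni correction) |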
CLI Usage
```bash
# Comprehensive analysis
python scripts/graph_interpreter.py \
  --image survival_curve.png \
  --type kaplan_meier \
  --context phase_3_oncology \
  --audience clinicians \
  --output analysis.json

# Generate a publication caption
python scripts/graph_interpreter.py \
  --image forest_plot.png \
  --type forest_plot \
  --generate caption \
  --journal-style nature \
  --word-limit 200

# Batch-process figures
python scripts/graph_interpreter.py \
  --batch figures/ \
  --output report.html \
  --template comprehensive
```
Common Patterns
Pattern 1: Clinical Trial Primary Endpoint
```python
# Analyze the survival curve
analysis = interpreter.interpret(
    graph_type="kaplan_meier",
    primary_endpoint="overall_survival",
    treatment_arms=["Experimental", "Control"],
    key_metrics=["median_os", "hr", "ci", "p_value"]
)

# Generate a regulatory-ready summary
regulatory_summary = interpreter.generate_regulatory_summary(
    analysis,
    guideline="ICH_E3"
)
```
Pattern 2: Meta-Analysis Forest Plot
```python
# Interpret the meta-analysis
analysis = interpreter.interpret_forest_plot(
    studies=included_studies,
    check_heterogeneity=True,
    assess_publication_bias=True
)

# Generate a GRADE assessment
grade_rating = interpreter.generate_grade_rating(analysis)
```
Pattern 3: Diagnostic Accuracy ROC
```python
# Analyze the diagnostic tests
analysis = interpreter.interpret_roc(
    curves=["Test A", "Test B", "Combined"],
    optimal_cutoffs=True,
    clinical_utility=True
)

# Clinical decision support
decision_aid = interpreter.generate_decision_aid(analysis)
```
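The source does not say how `optimal_cutoffs` is determined. One widely used criterion is Youden's J (sensitivity + specificity − 1), sketched below as an assumption rather than a description of `interpret_roc`. The function name `youden_optimal_cutoff` and the scores and labels are hypothetical toy values.

```python
def youden_optimal_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1.

    A case is called positive when its score is at or above the cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        # Keep the first cutoff that attains the highest J.
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]

# Toy test scores with true disease status (1 = diseased, 0 = healthy).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1]

cutoff, sens, spec = youden_optimal_cutoff(scores, labels)
print(f"cutoff={cutoff}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Youden's J weights false positives and false negatives equally. When missing a case is costlier than a false alarm, a cutoff favoring sensitivity may be the better clinical choice.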
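Quality Checklist
Before Interpretation:
- [ ] Graph type appropriate for data
- [ ] Axes clearly labeled with units
- [ ] Sample sizes indicated
- [ ] Statistical tests specified
- [ ] Confidence intervals present
During Interpretation:
- [ ] Effect size calculated
- [ ] Clinical significance assessed
- [ ] Confidence intervals interpreted
- [ ] Limitations noted
- [ ] Generalizability considered
After Interpretation:
- [ ] Explanation appropriate for audience
- [ ] Statistical terms explained
- [ ] Uncertainty communicated
- [ ] Actionable insights highlighted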
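Best Practices
Statistical Communication:
- Always report confidence intervals with point estimates
- Distinguish statistical from clinical significance
- Note limitations and generalizability
- Avoid causal language in observational studies
Visual Analysis:
- Check axis scales for distortion
- Note truncated axes or breaks
- Identify outliers and their impact
- Verify error bar representation (SD vs SEM)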
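Common Pitfalls
❌ Correlation = Causation: "X causes Y because they're correlated"
✅ Cautious Interpretation: "X is associated with Y; other factors may explain this"
❌ Overstating Significance: Reading "highly significant (p<0.001)" as meaning a large effect
✅ Proper Framing: "Statistically significant but small effect size (d=0.2)"
❌ Ignoring Confidence Intervals: Reporting the point estimate only
✅ Interval Reporting: "Effect: 1.5 (95% CI: 0.9-2.4), suggesting uncertainty"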
Skill ID: 209 | Version: 1.0 | License: MIT