Document Workflow
Academic paper research: Search → Download LaTeX → Read & Summarize
Quick Start
1. Search Papers
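```bash
python -m skills.document-workflow.scripts.search_papers --query "world model" --max_results 5 --year_from 2024
```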
2. Download LaTeX Source
```bash
python -m skills.document-workflow.scripts.latex_reader 2301.07088 --keep
```
3. Read & Summarize
Read the LaTeX source files and summarize them following the Reading Guide below.
Reading Guide
After downloading LaTeX source to arxiv_{id}/, read the .tex files in this order:
Step 1: Get Metadata
Read the main .tex file (usually main.tex, root.tex, or {paper-id}.tex) for:
- \title{} - Paper title
- \author{} - Authors
- \begin{abstract}...\end{abstract} - Abstract
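The metadata step can be sketched in Python. This is a minimal illustration, not part of the skill's scripts: `find_main_tex`, `read_braced`, and `read_abstract` are hypothetical helpers. It assumes the root file is the one that declares `\documentclass`, and that the title and author are given in plain `\title{...}` and `\author{...}` form.

```python
import re
from pathlib import Path
from typing import Optional


def find_main_tex(source_dir: Path) -> Optional[Path]:
    """Return the first .tex file that declares \\documentclass, or None."""
    for path in sorted(source_dir.rglob("*.tex")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if "\\documentclass" in text:
            return path
    return None


def read_braced(text: str, command: str) -> Optional[str]:
    """Return the brace-balanced argument of the first \\command{...}."""
    match = re.search(r"\\" + command + r"\s*\{", text)
    if match is None:
        return None
    depth, start = 1, match.end()
    # Walk forward, tracking nesting so that \textbf{...} inside a title
    # does not end the argument early. Escaped braces (\{) are not handled.
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start:i].strip()
    return None  # unbalanced braces


def read_abstract(text: str) -> Optional[str]:
    """Return the body of the abstract environment, or None if absent."""
    match = re.search(r"\\begin\{abstract\}(.*?)\\end\{abstract\}", text, re.DOTALL)
    return match.group(1).strip() if match else None
```

Brace counting is used instead of a single regex because titles often contain nested commands. Variants such as `\title[short]{long}` would need extra handling.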
Step 2: Understand the Problem
Read the Introduction section (usually intro.tex, 1-introduction.tex, or the first \section):
- What problem does this paper solve?
- What are the key contributions?
- How does it relate to prior work?
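When the introduction is not in its own file, one way to locate it is to split the source on `\section` commands. The sketch below is illustrative only: `split_sections` and `find_introduction` are hypothetical names, and it assumes section titles contain no nested braces.

```python
import re
from typing import List, Optional, Tuple

# Matches \section{Title} and \section*{Title}; does not match \subsection.
SECTION_RE = re.compile(r"\\section\*?\s*\{([^{}]*)\}")


def split_sections(tex: str) -> List[Tuple[str, str]]:
    """Return (title, body) pairs for each top-level \\section."""
    matches = list(SECTION_RE.finditer(tex))
    sections = []
    for i, m in enumerate(matches):
        # A section body runs until the next \section or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(tex)
        sections.append((m.group(1).strip(), tex[m.end():end].strip()))
    return sections


def find_introduction(tex: str) -> Optional[str]:
    """Return the body of the section titled 'Intro...', else the first section."""
    sections = split_sections(tex)
    for title, body in sections:
        if title.lower().startswith("intro"):
            return body
    return sections[0][1] if sections else None
```

Falling back to the first section mirrors the guide's "or the first \section" rule.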
Step 3: Understand the Method
Read the Method/Approach section:
- What is the proposed approach?
- Key equations in \begin{equation}...\end{equation} or \begin{align}...\end{align}
- Algorithm pseudocode in \begin{algorithm}...\end{algorithm}
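Equations and pseudocode can be pulled out of the method section by environment name. The following is a rough sketch under simple assumptions: `extract_environments` is a hypothetical helper, starred variants such as `align*` are grouped with their unstarred name, and same-name environments are not nested.

```python
import re
from typing import Dict, List


def extract_environments(tex: str, names: List[str]) -> Dict[str, List[str]]:
    """Map each environment name to the list of bodies found in tex."""
    found: Dict[str, List[str]] = {}
    for name in names:
        escaped = re.escape(name)
        # Non-greedy match so consecutive environments stay separate.
        pattern = re.compile(
            r"\\begin\{" + escaped + r"\*?\}(.*?)\\end\{" + escaped + r"\*?\}",
            re.DOTALL,
        )
        found[name] = [body.strip() for body in pattern.findall(tex)]
    return found
```

A call such as `extract_environments(method_tex, ["equation", "align", "algorithm"])` returns the three groups the guide asks for, with an empty list for any environment the paper does not use.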
Step 4: Check Experiments
Read the Experiments section:
- Datasets used
- Baselines compared
- Metrics in \begin{table}...\end{table} with results
- Key findings
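Result tables are easier to summarize once their rows are split into cells. The sketch below assumes the table body is a standard `tabular` environment with a simple column spec (no nested braces such as `p{2cm}`) and no `\multicolumn` cells. `parse_tabular_rows` is a hypothetical helper, not one of the skill's scripts.

```python
import re
from typing import List


def parse_tabular_rows(table_tex: str) -> List[List[str]]:
    """Split the first tabular environment into rows of stripped cell strings."""
    match = re.search(
        r"\\begin\{tabular\}\{[^{}]*\}(.*?)\\end\{tabular\}", table_tex, re.DOTALL
    )
    if match is None:
        return []
    # Drop horizontal rule commands; they carry no data.
    body = re.sub(r"\\(hline|toprule|midrule|bottomrule)\b", "", match.group(1))
    rows = []
    for raw in body.split("\\\\"):  # rows end with a double backslash
        if raw.strip():
            rows.append([cell.strip() for cell in raw.split("&")])
    return rows
```

The first returned row is usually the header, which gives the metric names for the `metrics` and `results` fields of the summary.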
Step 5: Check References
Read the .bib or .bbl file for:
- Related work citations
- Key papers in the field
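A rough way to spot key papers is to list the bibliography keys and count how often each is cited in the body. This sketch assumes classic BibTeX output, with `@type{key, ...}` entries in .bib and `\bibitem{key}` lines in .bbl. The function names are hypothetical, and citation frequency is only a heuristic for importance.

```python
import re
from collections import Counter
from typing import List


def bib_keys(bib_text: str) -> List[str]:
    """Entry keys from BibTeX source, e.g. @article{key, ...}."""
    return re.findall(r"@\w+\s*\{\s*([^,\s]+)\s*,", bib_text)


def bbl_keys(bbl_text: str) -> List[str]:
    """Keys from \\bibitem{key} or \\bibitem[label]{key} lines."""
    return re.findall(r"\\bibitem(?:\[[^\]]*\])?\s*\{([^}]+)\}", bbl_text)


def cite_counts(tex: str) -> Counter:
    """Count how often each key appears in \\cite-style commands."""
    counts: Counter = Counter()
    # Covers \cite, \citep, \citet and similar, with optional [..] arguments.
    pattern = r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}"
    for group in re.findall(pattern, tex):
        counts.update(key.strip() for key in group.split(",") if key.strip())
    return counts
```

`cite_counts(body).most_common(5)` then gives a short list of candidates to look up in the .bib or .bbl file.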
Output Schema
Summarize the paper in this JSON format (see ./references/output_schema.json for details):
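```json
{
  "paper_title": "Full title",
  "authors": ["Author 1", "Author 2"],
  "source": "arXiv:XXXX.XXXXX",
  "task_definition": {
    "domain": "Research domain",
    "task": "Specific task",
    "problem_statement": "The problem this paper addresses",
    "key_contributions": ["Contribution 1", "Contribution 2"]
  },
  "experiments": {
    "datasets": ["Dataset 1", "Dataset 2"],
    "baselines": ["Baseline 1", "Baseline 2"],
    "metrics": [
      {"name": "Metric name", "description": "What it measures", "definition": "Mathematical definition or formula of the metric"}
    ],
    "results": [
      {"setting": "Dataset", "metric": "Metric", "proposed_method": "Score", "best_baseline": "Score"}
    ],
    "key_findings": ["Finding 1", "Finding 2"]
  }
}
```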
Scripts
| Script | Function |
|---|---|
| search_papers.py | Search papers (Tavily + Semantic Scholar) |
| download_paper.py | Download PDF (for human reading) |
| latex_reader.py | Download LaTeX source (for AI reading) |
Tips for Reading LaTeX
| LaTeX Command | Meaning |
|---|---|
| \section{Title} | Section heading |
| \subsection{Title} | Subsection heading |
| \textbf{text} | Bold text (often important) |
| \cite{key} | Citation reference |
| \begin{equation}...\end{equation} | Numbered equation |
| \begin{table}...\end{table} | Table |
| \begin{figure}...\end{figure} | Figure |
| \input{file} or \subfile{file} | Include another .tex file |
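Because papers are often split across files, it can help to inline every `\input{file}` and `\subfile{file}` before reading. The sketch below is one possible approach, with `flatten_tex` as a hypothetical helper. It assumes include paths are relative to the root file's directory and that a missing `.tex` extension should be added. Commented-out includes are not detected and will be expanded too.

```python
import re
from pathlib import Path
from typing import Optional, Set

INPUT_RE = re.compile(r"\\(?:input|subfile)\s*\{([^{}]+)\}")


def flatten_tex(path: Path, root: Optional[Path] = None,
                _seen: Optional[Set[Path]] = None) -> str:
    """Return the text of path with \\input and \\subfile targets inlined."""
    root = root if root is not None else path.parent
    seen = _seen if _seen is not None else set()
    resolved = path.resolve()
    if resolved in seen:
        # Guards against include cycles; a file is inlined at most once.
        return ""
    seen.add(resolved)
    text = path.read_text(encoding="utf-8", errors="ignore")

    def replace(match: "re.Match[str]") -> str:
        target = root / match.group(1).strip()
        if target.suffix != ".tex":
            target = target.with_name(target.name + ".tex")
        if not target.exists():
            return match.group(0)  # leave unresolved includes untouched
        return flatten_tex(target, root, seen)

    return INPUT_RE.sub(replace, text)
```

Calling `flatten_tex(Path("arxiv_2301.07088/main.tex"))` would return one string that the section and environment steps can work on directly. The path here is only an example of the `arxiv_{id}/` layout.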
Config
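```bash
# Optional: Semantic Scholar API key
export SEMANTIC_SCHOLAR_API_KEY=your-key

# Default download path
# C:\Users\Lenovo\Desktop\papers
```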
Document Workflow
Academic paper research: Search → Download LaTeX → Read & Summarize
Quick Start
1. Search Papers
```bash
python -m skills.document-workflow.scripts.search_papers --query "world model" --max_results 5 --year_from 2024
```
2. Download LaTeX Source
```bash
python -m skills.document-workflow.scripts.latex_reader 2301.07088 --keep
```
3. Read & Summarize
Read the LaTeX source files and summarize them following the Reading Guide below.
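Reading Guide
After downloading the LaTeX source to arxiv_{id}/, read the .tex files in this order:
Step 1: Get Metadata
Read the main .tex file (usually main.tex, root.tex, or {paper-id}.tex) for:
- \title{} - Paper title
- \author{} - Authors
- \begin{abstract}...\end{abstract} - Abstract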
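Step 2: Understand the Problem
Read the Introduction section (usually intro.tex, 1-introduction.tex, or the first \section):
- What problem does this paper solve?
- What are the key contributions?
- How does it relate to prior work?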
Step 3: Understand the Method
Read the Method/Approach section:
- What is the proposed approach?
- Key equations in \begin{equation}...\end{equation} or \begin{align}...\end{align}
- Algorithm pseudocode in \begin{algorithm}...\end{algorithm}
Step 4: Check Experiments
Read the Experiments section:
- Datasets used
- Baselines compared
- Metrics in \begin{table}...\end{table} with results
- Key findings
Step 5: Check References
Read the .bib or .bbl file for:
- Related work citations
- Key papers in the field
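Output Schema
Summarize the paper in this JSON format (see ./references/output_schema.json for details):
```json
{
  "paper_title": "Full title",
  "authors": ["Author 1", "Author 2"],
  "source": "arXiv:XXXX.XXXXX",
  "task_definition": {
    "domain": "Research domain",
    "task": "Specific task",
    "problem_statement": "The problem this paper addresses",
    "key_contributions": ["Contribution 1", "Contribution 2"]
  },
  "experiments": {
    "datasets": ["Dataset 1", "Dataset 2"],
    "baselines": ["Baseline 1", "Baseline 2"],
    "metrics": [
      {"name": "Metric name", "description": "What it measures", "definition": "Mathematical definition or formula of the metric"}
    ],
    "results": [
      {"setting": "Dataset", "metric": "Metric", "proposed_method": "Score", "best_baseline": "Score"}
    ],
    "key_findings": ["Finding 1", "Finding 2"]
  }
}
```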
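Before saving a summary, it may be worth checking that every field in the template is present. The sketch below treats all template keys as required, which is an assumption: the authoritative rules live in ./references/output_schema.json, which is not reproduced here. `missing_fields` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List

# Keys taken from the summary template; "" denotes the top level.
REQUIRED: Dict[str, List[str]] = {
    "": ["paper_title", "authors", "source", "task_definition", "experiments"],
    "task_definition": ["domain", "task", "problem_statement", "key_contributions"],
    "experiments": ["datasets", "baselines", "metrics", "results", "key_findings"],
}


def missing_fields(summary: Dict[str, Any]) -> List[str]:
    """Return dotted names of template keys absent from summary."""
    missing = []
    for section, keys in REQUIRED.items():
        node = summary if section == "" else summary.get(section, {})
        if not isinstance(node, dict):
            node = {}  # a non-object section counts as entirely missing
        for key in keys:
            if key not in node:
                missing.append(f"{section}.{key}" if section else key)
    return missing


def dump_summary(summary: Dict[str, Any]) -> str:
    """Serialize a complete summary; raise ValueError if fields are missing."""
    missing = missing_fields(summary)
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    # ensure_ascii=False keeps non-ASCII titles and author names readable.
    return json.dumps(summary, ensure_ascii=False, indent=2)
```

Reporting every missing field at once, rather than failing on the first, makes it easier to see which reading step was skipped.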
Scripts
| Script | Function |
|---|---|
| search_papers.py | Search papers (Tavily + Semantic Scholar) |
| download_paper.py | Download PDF (for human reading) |
| latex_reader.py | Download LaTeX source (for AI reading) |
Tips for Reading LaTeX
| LaTeX Command | Meaning |
|---|---|
| \section{Title} | Section heading |
| \subsection{Title} | Subsection heading |
| \textbf{text} | Bold text (often important) |
| \cite{key} | Citation reference |
| \begin{equation}...\end{equation} | Numbered equation |
| \begin{table}...\end{table} | Table |
| \begin{figure}...\end{figure} | Figure |
| \input{file} or \subfile{file} | Include another .tex file |
Config
```bash
# Optional: Semantic Scholar API key
export SEMANTIC_SCHOLAR_API_KEY=your-key

# Default download path
# C:\Users\Lenovo\Desktop\papers
```
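Since the API key is optional, code that calls Semantic Scholar should work with or without it. The sketch below shows one way to do that and does not describe how the skill's scripts actually read the key. It assumes the variable is named `SEMANTIC_SCHOLAR_API_KEY` and relies on the `x-api-key` request header that the Semantic Scholar API accepts. `semantic_scholar_headers` is a hypothetical helper.

```python
import os
from typing import Dict, Mapping


def semantic_scholar_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build request headers, adding the API key only when one is configured."""
    key = env.get("SEMANTIC_SCHOLAR_API_KEY", "").strip()
    # Without a key, send no auth header and fall back to unauthenticated access.
    return {"x-api-key": key} if key else {}
```

Passing the environment as a parameter keeps the function easy to test without modifying the real process environment.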