返回顶部
a

arxiv-paper-processor

Tool-only paper processing skill with a manual language parameter: supports batch artifact download for many papers or single-paper download, then the model manually reads source/PDF and writes summary.md in the selected language. Use when per-paper comprehension should be model-driven instead of script-generated."

作者: admin | 来源: ClawHub
源自
ClawHub
版本
V 0.1.1
安全检测
已通过
1,675
下载量
1
收藏
概述
安装方式
版本历史

arxiv-paper-processor

# ArXiv Paper Processor Use this skill for per-paper manual summarization, with optional batch artifact download. - Single-paper mode: process one paper directory (e.g. `<run_dir>/<arxiv_id>/`). - Batch predownload mode: process many paper directories under one run dir before writing summaries. ## Language Parameter - Use a workflow language parameter (for example `English` or `Chinese`) and apply it manually. - The per-paper `summary.md` must be written in the selected language. - If download scripts are called directly, pass `--language <LANG>` for traceability. ## Core Principle Scripts only fetch artifacts. The model performs reading and writing. ## Non-negotiable Constraint - Do not generate `summary.md` by script-based snippet extraction, regex harvesting, or template autofill. - Do not use Python/shell scripts to auto-compose section text from abstract/introduction fragments. - Scripts in this skill are only for artifact download (`source`/`pdf`) and trace logs. - The final `summary.md` must come from model-side reading and synthesis of the paper content. ## Optional Batch Artifact Download (Many Papers) Use this first when Stage B has many papers: ```bash python3 scripts/download_papers_batch.py \ --run-dir /path/to/run \ --artifact source_then_pdf \ --max-workers 3 \ --min-interval-sec 5 \ --language English ``` Key behavior: - Supports `--artifact source`, `--artifact pdf`, or `--artifact source_then_pdf` (default). - Supports concurrency (`--max-workers`) and safe throttling/retry (`--min-interval-sec`, retry args). - Uses run-local throttle state by default (`<run_dir>/.runtime/arxiv_download_state.json`) to reduce 429 risk. - Skips papers that already have usable `source/source_extract/*.tex` or existing `source/paper.pdf` (unless `--force`). - Resume-friendly: if a paper already has a completed `summary.md`, you can skip that paper's summary-writing step. - Writes batch log to `<run_dir>/download_batch_log.json` by default. ## Step 1: Download Source (Preferred) ```bash python3 scripts/download_arxiv_source.py \ --paper-dir /path/to/run/2602.00528 \ --language English ``` This writes: - `source/source_bundle.bin` - `source/source_extract/` - `source/download_source_log.json` If usable source already exists and `--force` is not set, the script reuses local artifacts. ## Step 2: If Needed, Download PDF ```bash python3 scripts/download_arxiv_pdf.py \ --paper-dir /path/to/run/2602.00528 \ --language English ``` This writes: - `source/paper.pdf` - `source/download_pdf_log.json` If PDF already exists and `--force` is not set, the script reuses local artifacts. ## Step 3: Model Reads and Summarizes 1. If `summary.md` already exists and follows the required format, skip this paper and mark it complete. 2. Read `metadata.md` first. 3. If `source/source_extract/` already exists with readable `.tex` files, use it directly. 4. Otherwise, if `source/paper.pdf` already exists, use PDF directly. 5. If neither exists, run download scripts (single-paper scripts or batch script) first. 6. Manually write `summary.md` in the same paper directory, in the selected language. Do not rely on rule-based auto summarization. Do not rely on auto-extracted snippets as the primary writing basis. ## Quality Requirement - Every section should include paper-specific details that are traceable to full-text reading. - Section 4/5/10 should reflect concrete method and evaluation details, not generic wording. - If key details are unclear in the source, explicitly note uncertainty instead of guessing. - Match the detail level shown in `references/summary-example-en.md` and `references/summary-example-zh.md`. - If your draft is clearly shorter or less specific than the examples, expand it before finishing. ## Required Output - `<paper_dir>/summary.md` in fixed section format. - Pay special attention to section `## 10. Brief Conclusion`: write a 3-4 sentence mini-conclusion that covers contribution, method, evaluation setup, and results with paper-specific details. - In section `## 1. Paper Snapshot`, use exact keys: `ArXiv ID`, `Title`, `Authors`, `Publish date`, `Primary category`, `Reading basis`. - Do not use key variants such as `Reading source`, `Author list`, `Published on`, or lowercase key names. See `references/summary-format.md` for exact section requirements. ## Related Skills This skill is a sub-skill of `arxiv-summarizer-orchestrator`. Pipeline position: 1. Step 1 (upstream): `arxiv-search-collector` produces the selected paper directories and metadata. 2. Step 2 (this skill): `arxiv-paper-processor` downloads artifacts and writes one `summary.md` per paper. 3. Step 3 (downstream): `arxiv-batch-reporter` uses these per-paper summaries to generate the final collection report. Use this skill together with Step 1 and Step 3 for full end-to-end execution.

标签

skill ai

通过对话安装

该技能支持在以下平台通过对话安装:

OpenClaw WorkBuddy QClaw Kimi Claude

方式一:安装 SkillHub 和技能

帮我安装 SkillHub 和 arxiv-paper-processor-1776419948 技能

方式二:设置 SkillHub 为优先技能安装源

设置 SkillHub 为我的优先技能安装源,然后帮我安装 arxiv-paper-processor-1776419948 技能

通过命令行安装

skillhub install arxiv-paper-processor-1776419948

下载 Zip 包

⬇ 下载 arxiv-paper-processor v0.1.1

文件大小: 23.15 KB | 发布时间: 2026-4-17 19:30

v0.1.1 最新 2026-4-17 19:30
Document cross-skill relationships in all SKILL.md files

Archiver·手机版·闲社网·闲社论坛·羊毛社区· 多链控股集团有限公司 · 苏ICP备2025199260号-1

Powered by Discuz! X5.0   © 2024-2025 闲社网·线报更新论坛·羊毛分享社区·http://xianshe.com

p2p_official_large
返回顶部