
agent-survey-corpus

Download a small corpus of open-access arXiv survey/review PDFs about LLM agents and extract text for style reference.

Author: admin | Source: ClawHub
Version: v1.0.0 | Security check: passed | Downloads: 114 | Favorites: 0

# Agent Survey Corpus (arXiv PDFs → text extracts)

Goal: create a small, local reference library so you can **learn from real agent surveys** when refining:

- C2 outline structure (paper-like sectioning)
- C4 tables/claims organization
- C5 writing style and density

This is intentionally *not* part of the pipeline; it is an optional, repo-level toolkit.

## Inputs

- `ref/agent-surveys/arxiv_ids.txt`

## Outputs

- `ref/agent-surveys/pdfs/`
- `ref/agent-surveys/text/`
- `ref/agent-surveys/STYLE_REPORT.md` (tracked; auto-generated summary)

## Workflow

1) Edit `ref/agent-surveys/arxiv_ids.txt` (one arXiv ID per line).
2) Run the downloader to fetch the PDFs and extract the first N pages (`--max-pages`) to text.
3) Skim the extracted text under `ref/agent-surveys/text/`:
   - Look at section counts (H2), subsection granularity (H3), and how the surveys transition between chapters.
   - Identify repeated rhetorical patterns you want the pipeline writer to imitate.

## Script

### Quick Start

- `python scripts/run.py --help`
- `python scripts/run.py --workspace . --max-pages 20`

### All Options

- `--workspace <dir>` (use `.` to write into the repo root)
- `--inputs <semicolon-separated>` (default: `ref/agent-surveys/arxiv_ids.txt`)
- `--max-pages <N>` (default: 20)
- `--sleep <seconds>` (default: 1.0)
- `--overwrite` (re-download and re-extract)

### Examples

- Download/extract into the repo root `ref/`:
  - `python scripts/run.py --workspace . --max-pages 20`
- Download/extract into a specific folder (treated as the workspace root):
  - `python scripts/run.py --workspace /tmp/surveys --max-pages 30`

## Troubleshooting

- **Download fails / timeout**: rerun with a larger `--sleep`, or try fewer IDs.
- **Text extract is empty**: the PDF may be a scan with no text layer; try another survey or increase `--max-pages`.
- **Files showing up in git status**: PDFs and extracted text should be ignored via `.gitignore` (`ref/**/pdfs/`, `ref/**/text/`); check that those patterns are present.
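The download step (workflow steps 1 and 2) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the contents of `scripts/run.py`: the helper names (`parse_id_lines`, `read_arxiv_ids`, `pdf_url`, `local_paths`, `extract_first_pages`, `fetch_corpus`), support for `#` comments in the ID file, the `<id>.pdf` / `<id>.txt` file naming, the `https://arxiv.org/pdf/<id>` URL form, and the use of the third-party `pypdf` package for text extraction are all hypothetical.

```python
import re
import time
import urllib.request
from pathlib import Path

# New-style IDs (2308.11432, optional v2) and old-style IDs (cs/0112017, math.GT/0309136).
ARXIV_ID = re.compile(r"^(\d{4}\.\d{4,5}|[a-z-]+(\.[A-Z]{2})?/\d{7})(v\d+)?$")
CORPUS_DIR = Path("ref/agent-surveys")


def parse_id_lines(text: str) -> list[str]:
    """One arXiv ID per line; blank lines and '#' comments are skipped (assumption)."""
    ids: list[str] = []
    for line in text.splitlines():
        token = line.split("#", 1)[0].strip()
        if not token:
            continue
        if not ARXIV_ID.match(token):
            # Guardrail: refuse URLs or anything else that is not an arXiv ID.
            raise ValueError(f"not an arXiv id: {token!r}")
        if token not in ids:  # keep the first occurrence, drop duplicates
            ids.append(token)
    return ids


def read_arxiv_ids(workspace: Path, inputs: str = "ref/agent-surveys/arxiv_ids.txt") -> list[str]:
    """Mirror --inputs: semicolon-separated ID files, resolved against the workspace."""
    ids: list[str] = []
    for rel in filter(None, (part.strip() for part in inputs.split(";"))):
        for arxiv_id in parse_id_lines((workspace / rel).read_text(encoding="utf-8")):
            if arxiv_id not in ids:
                ids.append(arxiv_id)
    return ids


def pdf_url(arxiv_id: str) -> str:
    """Build an arxiv.org PDF URL; non-arXiv IDs are rejected so only arXiv is contacted."""
    if not ARXIV_ID.match(arxiv_id):
        raise ValueError(f"not an arXiv id: {arxiv_id!r}")
    return f"https://arxiv.org/pdf/{arxiv_id}"


def local_paths(workspace: Path, arxiv_id: str) -> tuple[Path, Path]:
    """Hypothetical naming: old-style IDs contain '/', so it is replaced with '_'."""
    stem = arxiv_id.replace("/", "_")
    root = workspace / CORPUS_DIR
    return root / "pdfs" / f"{stem}.pdf", root / "text" / f"{stem}.txt"


def extract_first_pages(pdf_path: Path, max_pages: int = 20) -> str:
    from pypdf import PdfReader  # third-party (pip install pypdf), imported lazily

    reader = PdfReader(pdf_path)
    n = min(max_pages, len(reader.pages))
    # extract_text() can yield an empty string for scanned, image-only pages.
    return "\n".join(reader.pages[i].extract_text() or "" for i in range(n))


def fetch_corpus(workspace: Path, ids: list[str], max_pages: int = 20,
                 sleep: float = 1.0, overwrite: bool = False) -> None:
    for arxiv_id in ids:
        pdf_path, txt_path = local_paths(workspace, arxiv_id)
        pdf_path.parent.mkdir(parents=True, exist_ok=True)
        txt_path.parent.mkdir(parents=True, exist_ok=True)
        if overwrite or not pdf_path.exists():
            req = urllib.request.Request(
                pdf_url(arxiv_id), headers={"User-Agent": "agent-survey-corpus-sketch"}
            )
            with urllib.request.urlopen(req, timeout=60) as resp:
                pdf_path.write_bytes(resp.read())
            time.sleep(sleep)  # pause between downloads to go easy on arxiv.org
        if overwrite or not txt_path.exists():
            txt_path.write_text(extract_first_pages(pdf_path, max_pages), encoding="utf-8")
```

For example, `fetch_corpus(Path("."), read_arxiv_ids(Path(".")), max_pages=20)` would roughly play the role of `python scripts/run.py --workspace . --max-pages 20` in this sketch. Validating every ID against a strict pattern before any URL is built is one simple way to keep downloads arXiv-only.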
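When skimming the extracted text (workflow step 3), counting sections and subsections can be partly automated. The sketch below is a hypothetical helper, not part of the toolkit and not necessarily how `STYLE_REPORT.md` is produced. It assumes that numbered headings such as `3 Agent Architectures` and `3.2 Memory` survive PDF extraction on their own lines, and treats them as H2 and H3 respectively.

```python
import re

# Heuristic (assumption): numbered headings survive extraction on their own lines,
# e.g. "3 Agent Architectures" (treated as H2) and "3.2 Memory" (treated as H3).
SECTION = re.compile(r"^(\d{1,2})\.?\s+([A-Z].{2,80})$")
SUBSECTION = re.compile(r"^(\d{1,2})\.(\d{1,2})\.?\s+([A-Z].{2,80})$")


def heading_stats(text: str) -> dict:
    """Count H2-like sections and H3-like subsections in extracted survey text."""
    titles: dict[str, str] = {}  # section number -> first title seen
    subsections: set[tuple[str, str]] = set()  # unique (section, subsection) numbers
    for raw in text.splitlines():
        line = raw.strip()
        if m := SUBSECTION.match(line):
            subsections.add((m.group(1), m.group(2)))
        elif m := SECTION.match(line):
            # setdefault keeps the first title, so a table of contents and the
            # body heading for the same number are not double-counted.
            titles.setdefault(m.group(1), m.group(2))
    per_section = {num: sum(1 for sec, _ in subsections if sec == num) for num in titles}
    counts = list(per_section.values())
    return {
        "h2_count": len(titles),
        "h2_titles": list(titles.values()),
        "h3_per_h2": per_section,
        "mean_h3_per_h2": sum(counts) / len(counts) if counts else 0.0,
    }
```

Running it over each file in `ref/agent-surveys/text/` gives a rough H2 count and H3-per-H2 ratio to compare against a C2 outline. Running headers, unnumbered headings, or numbered lines inside tables will skew the numbers, so treat them as a starting point for skimming rather than a replacement for it.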

Tags

skill ai

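Install via conversation

This skill can be installed through conversation on the following platforms:

OpenClaw WorkBuddy QClaw Kimi Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the agent-survey-corpus-1776113954 skill

Option 2: Set SkillHub as your preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the agent-survey-corpus-1776113954 skill

Install via command line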

skillhub install agent-survey-corpus-1776113954

Download Zip package

⬇ Download agent-survey-corpus v1.0.0

File size: 118.92 KB | Released: 2026-4-17 13:57

v1.0.0 (latest) 2026-4-17 13:57
- Initial release of agent-survey-corpus skill for downloading and extracting text from arXiv survey/review PDFs about LLM agents.
- Provides a toolkit to build a local reference library for analyzing real survey structures and writing styles.
- Supports customizable workspace, page limits, and safe download (arXiv-only) with guardrails to keep large files outside git.
- Includes clear workflow and CLI script for managing PDFs and extracting text for study and style learning.
