Web Fetcher

Smart web content fetcher for Claude Code. It detects the platform from the URL and picks the best strategy to fetch articles or download videos.

Quick Start

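```bash
# Fetch an article
python3 {SKILL_DIR}/fetcher.py URL -o ~/docs/

# Download a video
python3 {SKILL_DIR}/fetcher.py https://b23.tv/xxx -o ~/videos/

# Batch fetch from a file
python3 {SKILL_DIR}/fetcher.py --urls-file urls.txt -o ~/docs/
```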

Install Dependencies

Install only what you need — dependencies are checked at runtime:

Dependency	Purpose	Install
scrapling	Article fetching (HTTP + browser)	`pip install scrapling`
yt-dlp
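A runtime dependency check can be done by probing for a module without importing it. The following is a minimal sketch of that idea, not the fetcher's own code: `is_installed`, `missing_dependencies`, and the `INSTALL_HINTS` table are hypothetical names introduced for illustration.

```python
import importlib.util

# Hypothetical mapping: importable module name -> install command to suggest.
INSTALL_HINTS = {
    "scrapling": "pip install scrapling",
}


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def missing_dependencies(required: list[str]) -> dict[str, str]:
    """Map each missing module to the command that would install it."""
    return {
        name: INSTALL_HINTS.get(name, f"pip install {name}")
        for name in required
        if not is_installed(name)
    }


if __name__ == "__main__":
    for name, hint in missing_dependencies(["scrapling"]).items():
        print(f"{name} not found, install it with: {hint}")
```

Checking only when a feature is first used means a user who fetches articles never needs the video tooling installed.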

Smart Routing

The fetcher automatically detects the platform from the URL:

Platform	Method	Notes
mp.weixin.qq.com	scrapling	Extracts `data-src` images, handles SVG placeholders
*.feishu.cn
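Host-based routing of this kind can be sketched with glob patterns matched against the URL's hostname. The example below is illustrative only: `route_sketch`, `ROUTES`, and `DEFAULT_ROUTE` are hypothetical names, and only the `mp.weixin.qq.com` entry comes from the table above. The other entries and the fallback are assumptions based on the method names the CLI accepts.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

# Hypothetical routing table: (host pattern, route config).
# Only the mp.weixin.qq.com entry is taken from the text; the rest are assumptions.
ROUTES = [
    ("mp.weixin.qq.com", {"type": "article", "method": "scrapling"}),
    ("*.feishu.cn", {"type": "article", "method": "feishu"}),
    ("b23.tv", {"type": "video", "method": "ytdlp"}),
    ("*.bilibili.com", {"type": "video", "method": "ytdlp"}),
]

# Fallback for hosts that match no pattern (assumed default).
DEFAULT_ROUTE = {"type": "article", "method": "scrapling"}


def route_sketch(url: str) -> dict:
    """Pick a route config by matching the URL's host against glob patterns."""
    host = (urlparse(url).hostname or "").lower()
    for pattern, config in ROUTES:
        if fnmatch(host, pattern):
            return dict(config)  # copy so callers cannot mutate the table
    return dict(DEFAULT_ROUTE)
```

Keeping the patterns in an ordered list makes precedence explicit: the first matching entry wins.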

CLI Reference

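```
python3 {SKILL_DIR}/fetcher.py [URL] [OPTIONS]

Arguments:
  url                         URL to fetch

Options:
  -o, --output DIR            Output directory (default: current directory)
  -q, --quality QUALITY       Video quality, e.g. 1080, 720 (default: 1080)
  --method METHOD             Force a method: scrapling, camoufox, ytdlp, feishu
  --selector SELECTOR         Force a CSS selector for content extraction
  --urls-file FILE            File containing URLs (one per line, # for comments)
  --audio-only                Extract audio only (video download)
  --no-images                 Skip image download (articles)
  --cookies-browser BROWSER   Browser to read cookies from (e.g. chrome, firefox)
```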
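The `--urls-file` option takes a file with one URL per line, with `#` marking comments. A reader for that format might look like the sketch below. `read_urls_file` is a hypothetical helper, and treating only whole-line `#` comments as comments is an assumption, chosen because `#` also introduces URL fragments.

```python
from pathlib import Path


def read_urls_file(path: str) -> list[str]:
    """Return the URLs listed in a file, skipping blank lines and # comment lines."""
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls
```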

Platform Notes

WeChat (mp.weixin.qq.com)

- Images use the `data-src` attribute with mmbiz.qpic.cn URLs
- Visible <img> tags contain SVG placeholders (lazy loading)
- Image download requires a `Referer: https://mp.weixin.qq.com/` header
- A Scrapling GET usually works; no browser is needed
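The image handling described above can be sketched with the standard library alone. This is an illustration under assumptions, not the fetcher's implementation: `DataSrcCollector`, `extract_image_urls`, and `build_image_request` are hypothetical names, and the real fetcher uses scrapling rather than `html.parser`.

```python
from html.parser import HTMLParser
from urllib.request import Request

WECHAT_REFERER = "https://mp.weixin.qq.com/"


class DataSrcCollector(HTMLParser):
    """Collect the data-src URL of every <img>, ignoring the placeholder src."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        data_src = dict(attrs).get("data-src")
        if data_src:
            self.urls.append(data_src)


def extract_image_urls(html: str) -> list[str]:
    """Return the real image URLs found in data-src attributes, in document order."""
    parser = DataSrcCollector()
    parser.feed(html)
    return parser.urls


def build_image_request(url: str) -> Request:
    """Build a request carrying the Referer header the image host expects."""
    return Request(url, headers={"Referer": WECHAT_REFERER})
```

Passing the result of `build_image_request(url)` to `urllib.request.urlopen` would then download the image with the required header.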

Feishu (*.feishu.cn)

- Uses virtual scroll: content blocks are rendered on demand
- The fetcher scrolls through the entire document, collecting `[data-block-id]` elements
- Images require an authenticated fetch (cookies) and are downloaded via the browser's fetch API
- May show "Unable to print" artifacts, which are auto-cleaned
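Because virtual scroll shows only part of the document at a time, successive viewports overlap and the same block can be seen more than once. The sketch below shows only the merging step, assuming each scroll position yields a list of `(block_id, text)` pairs. The function names are hypothetical, and matching artifacts by substring is an assumption about how cleanup could work.

```python
def merge_snapshots(snapshots):
    """Merge per-scroll snapshots of (block_id, text) pairs, keeping first-seen order."""
    blocks = {}  # dicts preserve insertion order
    for snapshot in snapshots:
        for block_id, text in snapshot:
            # Overlapping viewports repeat blocks; keep one copy per id.
            blocks.setdefault(block_id, text)
    return list(blocks.values())


def clean_artifacts(lines, marker="Unable to print"):
    """Drop lines that contain the artifact marker."""
    return [line for line in lines if marker not in line]
```

Keying on the block id rather than on the text avoids dropping legitimately repeated paragraphs.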

Bilibili

- Short links (b23.tv) are auto-resolved
- For premium/member content, use `--cookies-browser chrome`
- Default quality is 1080p, adjustable with `-q`
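The quality and cookie settings above map naturally onto yt-dlp's Python options. The sketch below builds such an options dict. `build_ytdlp_options` is a hypothetical helper, and the format selector and output template are assumptions about reasonable settings, not the fetcher's actual configuration.

```python
def build_ytdlp_options(output_dir, quality=1080, cookies_browser=None):
    """Build an options dict for yt_dlp.YoutubeDL (a sketch, not the fetcher's own settings)."""
    options = {
        # Best video no taller than `quality`, merged with the best audio;
        # fall back to the best single-file format under the same cap.
        "format": f"bestvideo[height<={quality}]+bestaudio/best[height<={quality}]",
        "outtmpl": f"{output_dir}/%(title)s.%(ext)s",
    }
    if cookies_browser:
        # Equivalent of --cookies-from-browser on the yt-dlp command line.
        options["cookiesfrombrowser"] = (cookies_browser,)
    return options
```

With yt-dlp installed, the dict could be used as `yt_dlp.YoutubeDL(build_ytdlp_options("/tmp/out", 720, "chrome")).download([url])`.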

Troubleshooting

Problem	Solution
`scrapling not found`	`pip install scrapling`
`yt-dlp not found`

Manual Usage

When the CLI doesn't fit your needs, use the modules directly:

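```python
from lib.router import route, check_dependency
from lib.article import fetch_article
from lib.video import fetch_video
from lib.feishu import fetch_feishu

# Route a URL
r = route("https://mp.weixin.qq.com/s/xxx")
# {'type': 'article', 'method': 'scrapling', 'selector': '#js_content', 'post': 'wx_images'}

# Fetch an article (`url` is the page to fetch)
fetch_article(url, output_dir="/tmp/out", route_config=r)

# Download a video
fetch_video(url, output_dir="/tmp/out", quality=720)

# Fetch a Feishu document
fetch_feishu(url, output_dir="/tmp/out")
```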

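Skill name: web-fetcher

Detailed description:

Web Fetcher

Smart web content fetcher for Claude Code. It detects the platform automatically and uses the best strategy to fetch articles or download videos.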

Quick Start

```bash
# Fetch an article
python3 {SKILL_DIR}/fetcher.py URL -o ~/docs/

# Download a video
python3 {SKILL_DIR}/fetcher.py https://b23.tv/xxx -o ~/videos/

# Batch fetch from a file
python3 {SKILL_DIR}/fetcher.py --urls-file urls.txt -o ~/docs/
```

Install Dependencies

Install only what you need; dependencies are checked at runtime:

Dependency	Purpose	Install
scrapling	Article fetching (HTTP + browser)	`pip install scrapling`
yt-dlp

Smart Routing

The fetcher automatically detects the platform from the URL:

Platform	Method	Notes
mp.weixin.qq.com	scrapling	Extracts `data-src` images, handles SVG placeholders
*.feishu.cn

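CLI Reference

```
python3 {SKILL_DIR}/fetcher.py [URL] [OPTIONS]

Arguments:
  url                         URL to fetch

Options:
  -o, --output DIR            Output directory (default: current directory)
  -q, --quality QUALITY       Video quality, e.g. 1080, 720 (default: 1080)
  --method METHOD             Force a method: scrapling, camoufox, ytdlp, feishu
  --selector SELECTOR         Force a CSS selector for content extraction
  --urls-file FILE            File containing URLs (one per line, # for comments)
  --audio-only                Extract audio only (video download)
  --no-images                 Skip image download (articles)
  --cookies-browser BROWSER   Browser to read cookies from (e.g. chrome, firefox)
```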

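Platform Notes

WeChat (mp.weixin.qq.com)

- Images use the `data-src` attribute with mmbiz.qpic.cn URLs
- Visible <img> tags contain SVG placeholders (lazy loading)
- Image download requires a `Referer: https://mp.weixin.qq.com/` header
- A Scrapling GET usually works; no browser is needed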

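Feishu (*.feishu.cn)

- Uses virtual scroll: content blocks are rendered on demand
- The fetcher scrolls through the entire document, collecting `[data-block-id]` elements
- Images require an authenticated fetch (cookies) and are downloaded via the browser's fetch API
- May show "Unable to print" artifacts, which are auto-cleaned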

Bilibili

- Short links (b23.tv) are auto-resolved
- For premium/member content, use `--cookies-browser chrome`
- Default quality is 1080p, adjustable with `-q`

Troubleshooting

Problem	Solution
`scrapling not found`	`pip install scrapling`
`yt-dlp not found`

Manual Usage

When the CLI doesn't fit your needs, use the modules directly:

```python
from lib.router import route, check_dependency
from lib.article import fetch_article
from lib.video import fetch_video
from lib.feishu import fetch_feishu

# Route a URL
r = route("https://mp.weixin.qq.com/s/xxx")
# {'type': 'article', 'method': 'scrapling', 'selector': '#js_content', 'post': 'wx_images'}

# Fetch an article (`url` is the page to fetch)
fetch_article(url, output_dir="/tmp/out", route_config=r)

# Download a video
fetch_video(url, output_dir="/tmp/out", quality=720)

# Fetch a Feishu document
fetch_feishu(url, output_dir="/tmp/out")
```
