Scrapling Skill

Use the scrapling command-line tool to scrape websites with adaptive parsing and anti-bot bypass techniques.

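When to Use

✅ Use this skill when you need to:

- Scrape static or dynamic websites
- Get past Cloudflare, CAPTCHAs, or bot detection
- Extract structured data (HTML/JSON) from web pages
- Handle JavaScript-rendered content
- Get clean HTML without extraneous scripts/CSS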

When NOT to Use

❌ Do not use this skill for:

- Simple HTTP requests → use web_fetch
- Full browser automation → use the browser tool
- API-based data → call the API directly
- Local file processing → use the file tools

Setup

bash

# Install the CLI
pipx install scrapling
scrapling --version

Common Commands

Basic Scrape

bash

# Get clean HTML
scrapling https://example.com -o html

# Get the JSON structure
scrapling https://example.com -o json

# Save to a file
scrapling https://example.com -o output.html

With Headers/Timeouts

bash

# Custom request header (quoted so the shell passes it as a single argument)
scrapling https://example.com --headers "User-Agent: Mozilla/5.0"

# Timeout in seconds
scrapling https://slow-site.com --timeout 30

Extract Specific Elements

bash

# XPath extraction (quoted so the shell does not interpret the brackets)
scrapling https://example.com -e '//div[@class="content"]' -o html

# CSS selector
scrapling https://example.com -e 'div.content' -o html

JSON Output with Fields

bash

# Extract the title and meta description
scrapling https://example.com \
  --fields title,meta_description \
  -o json

MCP Integration

Scrapling supports MCP (Model Context Protocol) for AI agents:

bash

# Start the MCP server
scrapling mcp start

Then configure your agent to use the scrape tool over MCP.

Examples

Scrape News Article

bash
scrapling https://example.com/news/article-123 \
  --fields title,author,publish_date,content \
  -o json

Extract Product Data

bash
scrapling https://shop.example.com/products \
  -e '//div[@class="product"]' \
  -o html

Handle Cloudflare

bash

# Scrapling bypasses most protections automatically
scrapling https://protected-site.com -o html

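Notes

- Default timeout: 10 seconds
- Automatically detects the best output format (html/json/text)
- Falls back to a headless browser for dynamic content when needed
- Be considerate of rate limits; add a delay between requests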
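
The last note can be sketched as a small wrapper that pauses between requests. This is only an illustration: `scrape_politely`, `SCRAPE_CMD`, and `DELAY_SECONDS` are hypothetical names introduced here, and the `scrapling URL -o html` invocation is copied from the basic scrape in this document rather than checked against any particular installed version.

```bash
#!/usr/bin/env bash
# Sketch: scrape several URLs one at a time, pausing between requests.
# SCRAPE_CMD and DELAY_SECONDS are hypothetical knobs for this example only.
SCRAPE_CMD="${SCRAPE_CMD:-scrapling}"
DELAY_SECONDS="${DELAY_SECONDS:-2}"

scrape_politely() {
  local first=1 url
  for url in "$@"; do
    # Sleep before every request except the first one.
    if [ "$first" -eq 0 ]; then
      sleep "$DELAY_SECONDS"
    fi
    first=0
    # Same invocation as the basic scrape in this document (assumed, not verified).
    "$SCRAPE_CMD" "$url" -o html || echo "failed: $url" >&2
  done
}

# Usage:
#   scrape_politely https://example.com/a https://example.com/b
```

A fixed delay is the simplest option; a site that publishes its own crawl limits may call for a longer pause.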

JSON Output Format

json
{
  "title": "Page title",
  "meta_description": "Description text",
  "content": "<clean HTML>",
  "links": ["http://...", ...],
  "images": [{"src": "...", "alt": "..."}]
}
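
Once output in this shape has been saved or piped, individual fields can be pulled out with jq. The sketch below assumes jq is installed and that the output really matches the format above; `sample_json`, `page_title`, `page_links`, and `page_images` are hypothetical names, and the sample values are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: read individual fields from JSON shaped like the format above.
# Requires jq. The sample document below is illustrative, not real output.
sample_json='{
  "title": "Page title",
  "meta_description": "Description text",
  "content": "<p>clean HTML</p>",
  "links": ["http://example.com/a", "http://example.com/b"],
  "images": [{"src": "http://example.com/logo.png", "alt": "Logo"}]
}'

# Print the title as raw text (no surrounding quotes).
page_title() { jq -r '.title'; }

# Print one link per line.
page_links() { jq -r '.links[]'; }

# Print each image as "src<TAB>alt".
page_images() { jq -r '.images[] | [.src, .alt] | @tsv'; }

# Usage (each helper reads JSON on standard input):
#   printf '%s' "$sample_json" | page_title
#   printf '%s' "$sample_json" | page_links
```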
