Advanced web scraping with Scrapling — MCP-native guidance for extraction, crawling, and anti-bot handling. Use via mcporter (MCP) for execution; this skill provides strategy, recipes, and best practices.
Guidance layer + MCP integration
Use this skill for strategy and pattern design. For execution, call Scrapling's MCP server through mcporter.
mcporter call scrapling fetch_page --url https://example.com
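When the call is issued from a script rather than typed by hand, it helps to assemble the argument list programmatically. The sketch below is a minimal illustration: `build_mcporter_call` is a hypothetical helper (not part of mcporter or Scrapling), and it only mirrors the `mcporter call <server> <tool> --flag value` shape shown above.

```python
import shlex


def build_mcporter_call(server: str, tool: str, **options: str) -> list[str]:
    """Build the argv for `mcporter call <server> <tool> --key value ...`."""
    argv = ["mcporter", "call", server, tool]
    for key, value in options.items():
        # Python keyword names use underscores; the flags in this document use hyphens.
        argv += [f"--{key.replace('_', '-')}", str(value)]
    return argv


argv = build_mcporter_call("scrapling", "fetch_page", url="https://example.com")
command = shlex.join(argv)
print(command)
```

To actually run the command, the list could be passed to `subprocess.run(argv, check=True)` on a machine where mcporter is installed and the Scrapling MCP server is configured.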
| Task | Tool | Example |
|---|---|---|
| Fetch a page | mcporter | mcporter call scrapling fetch_page --url URL |
| Extract with CSS |
┌─────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│     Fetcher     │────▶│  DynamicFetcher  │────▶│ StealthyFetcher  │
│     (HTTP)      │     │   (Browser/JS)   │     │    (Anti-bot)    │
└─────────────────┘     └──────────────────┘     └──────────────────┘
      Fastest               JS rendering             Cloudflare,
   static pages            SPA, React/Vue          Turnstile, etc.
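The diagram reads as an escalation path: start with the cheapest fetcher and move right only when a page cannot be handled. A minimal sketch of that strategy follows. It assumes each tier is wrapped in a plain callable that returns HTML or `None`; `fetch_with_escalation`, `plain_http`, and `browser` are hypothetical names, and the stand-in functions simulate results so the example runs offline.

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetch function returns the page HTML, or None if this tier cannot handle the page.
FetchFn = Callable[[str], Optional[str]]


def fetch_with_escalation(url: str, tiers: Sequence[Tuple[str, FetchFn]]) -> Tuple[str, str]:
    """Try each tier in order and return (tier_name, html) from the first that succeeds."""
    for name, fetch in tiers:
        html = fetch(url)
        if html is not None:
            return name, html
    raise RuntimeError(f"all fetcher tiers failed for {url}")


# Hypothetical stand-ins so the sketch runs without network access.
def plain_http(url: str) -> Optional[str]:
    # Pretend that pages under /app need JavaScript and come back empty over plain HTTP.
    return None if "/app" in url else "<html>static</html>"


def browser(url: str) -> Optional[str]:
    # Pretend a browser-based tier can always render the page.
    return "<html>rendered</html>"


tiers = [("Fetcher", plain_http), ("DynamicFetcher", browser)]
static_result = fetch_with_escalation("https://example.com/about", tiers)
spa_result = fetch_with_escalation("https://example.com/app/dashboard", tiers)
print(static_result[0], spa_result[0])
```

In real use the callables would wrap the actual fetchers, with a third tier for the stealth fetcher. Ordering the list from cheapest to most expensive keeps browser launches limited to pages that need them.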
Scrapling can use adaptive selectors to cope with site redesigns:
python
MCP usage:
mcporter call scrapling css_select \
  --selector .product \
  --adaptive true \
  --auto-save true
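The idea behind saving an element and relocating it later can be illustrated with a toy similarity search. This is a conceptual sketch only and not Scrapling's actual algorithm: it assumes elements are plain dictionaries, and `fingerprint` and `relocate` are hypothetical helpers that compare a saved description against candidates on the redesigned page.

```python
from difflib import SequenceMatcher


def fingerprint(element: dict) -> str:
    """Flatten the properties we want to remember about an element."""
    classes = " ".join(sorted(element.get("classes", [])))
    return f"{element['tag']}|{classes}|{element.get('text', '')}"


def relocate(saved: dict, candidates: list[dict]) -> dict:
    """Return the candidate whose fingerprint is most similar to the saved one."""
    target = fingerprint(saved)
    return max(
        candidates,
        key=lambda el: SequenceMatcher(None, target, fingerprint(el)).ratio(),
    )


# What was saved while the selector `.product` still matched.
saved = {"tag": "div", "classes": ["product"], "text": "Blue Widget $9.99"}

# After a redesign, the class name and tag changed, so `.product` matches nothing.
redesigned_page = [
    {"tag": "nav", "classes": ["menu"], "text": "Home About"},
    {"tag": "article", "classes": ["product-card"], "text": "Blue Widget $9.99"},
    {"tag": "footer", "classes": ["site-footer"], "text": "Copyright"},
]

match = relocate(saved, redesigned_page)
print(match["tag"], match["classes"])
```

The sketch shows why the save step matters: relocation has nothing to compare against unless the element's properties were stored while the original selector still worked.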
When to use a spider vs. direct fetching:
from scrapling.spiders import Spider, Response


class ProductSpider(Spider):
    name = "products"
    start_urls = ["https://example.com/products"]
    concurrent_requests = 10
    download_delay = 1.0

    async def parse(self, response: Response):
        # Emit one item per product card on the page
        for product in response.css(".product"):
            yield {
                "name": product.css("h2::text").get(),
                "price": product.css(".price::text").get(),
                "url": response.url,
            }

        # Follow pagination
        next_page = response.css(".next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page)
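from scrapling.fetchers import FetcherSession, AsyncStealthySession
from scrapling.spiders import Spider, Request, Response


class MultiSessionSpider(Spider):
    name = "multi"
    start_urls = ["https://example.com/"]

    def configure_sessions(self, manager):
        # Register a fast HTTP session and a lazily started stealth session
        manager.add("fast", FetcherSession(impersonate="chrome"))
        manager.add("stealth", AsyncStealthySession(headless=True), lazy=True)

    async def parse(self, response: Response):
        for link in response.css("a::attr(href)").getall():
            # Route protected pages through the stealth session
            if "/protected/" in link:
                yield Request(link, sid="stealth")
            else:
                yield Request(link, sid="fast")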
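The routing decision in `parse` can be pulled out into a pure function so it is testable without running a spider. The sketch below is one possible variant: `pick_session` is a hypothetical helper, and it checks path segments rather than the substring test used above, so a link ending in `/protected` without a trailing slash is also routed to the stealth session.

```python
from urllib.parse import urlparse


def pick_session(link: str) -> str:
    """Return the session id to use for a link: "stealth" for protected paths, else "fast"."""
    segments = [segment for segment in urlparse(link).path.split("/") if segment]
    return "stealth" if "protected" in segments else "fast"


routes = {
    link: pick_session(link)
    for link in (
        "https://example.com/protected/report",
        "https://example.com/protected",
        "https://example.com/blog/unprotected-apis",
    )
}
print(routes)
```

Inside the spider, the branch would then collapse to a single `yield Request(link, sid=pick_session(link))`.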
python
from scrapling.spiders import ProxyRotator
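The source does not show how `ProxyRotator` is configured, so the following is only a generic illustration of round-robin proxy rotation using the standard library. `RoundRobinProxies` is a hypothetical class, not Scrapling's implementation, and the proxy URLs are placeholders.

```python
from itertools import cycle
from typing import Iterable


class RoundRobinProxies:
    """Hand out proxies in a fixed repeating order."""

    def __init__(self, proxies: Iterable[str]):
        proxy_list = list(proxies)
        if not proxy_list:
            raise ValueError("at least one proxy is required")
        self._pool = cycle(proxy_list)

    def next_proxy(self) -> str:
        """Return the next proxy, wrapping around at the end of the list."""
        return next(self._pool)


# Placeholder proxy addresses for illustration only.
rotator = RoundRobinProxies(["http://proxy-a:8080", "http://proxy-b:8080"])
sequence = [rotator.next_proxy() for _ in range(3)]
print(sequence)
```

Round-robin spreads requests evenly across the pool. Other strategies, such as random choice or dropping proxies that fail, would follow the same interface.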
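This skill can be installed via chat on the following platforms:
Help me install SkillHub and the scrapling-yoo-1776292393 skill
Set SkillHub as my preferred skill installation source, then help me install the scrapling-yoo-1776292393 skill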
skillhub install scrapling-yoo-1776292393
File size: 14.87 KB | Published: 2026-4-16 15:31