XCrawl Scrape

Overview

This skill handles single-page extraction with XCrawl Scrape APIs.
Default behavior is raw passthrough: return upstream API response bodies as-is.

Required Local Config

Before using this skill, the user must create a local config file and write XCRAWL_API_KEY into it.

Path: `~/.xcrawl/config.json`

```json
{
  "XCRAWL_API_KEY": "<your_api_key>"
}
```

Read API key from local config file only. Do not require global environment variables.
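As a rough illustration of the file-only rule above, the following Node sketch reads the key from `~/.xcrawl/config.json` and never falls back to environment variables. The helper names `parseApiKey` and `loadApiKey` are hypothetical and not part of any XCrawl tooling.

```javascript
// Sketch: load XCRAWL_API_KEY from the local config file only.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: extract and validate the key from the config file text.
function parseApiKey(configText) {
  const config = JSON.parse(configText);
  const key = config.XCRAWL_API_KEY;
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error("XCRAWL_API_KEY is missing or empty in the config file");
  }
  return key.trim();
}

// Hypothetical helper: read the config file from the documented default path.
function loadApiKey(configPath = path.join(os.homedir(), ".xcrawl", "config.json")) {
  // Deliberately no fallback to process.env: the key comes from the file only.
  return parseApiKey(fs.readFileSync(configPath, "utf8"));
}
```

Splitting parsing from file access keeps the validation easy to exercise without touching the real config file.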

Credits and Account Setup

Using XCrawl APIs consumes credits.
If the user does not have an account or available credits, guide them to register at https://dash.xcrawl.com/.
After registering, they can activate the free 1,000-credit plan before running requests.

Tool Permission Policy

Request runtime permissions for curl and node only.
Do not request Python, shell helper scripts, or other runtime permissions.

API Surface

- Start scrape: `POST /v1/scrape`
- Read async result: `GET /v1/scrape/{scrape_id}`
- Base URL: `https://run.xcrawl.com`
- Required header: `Authorization: Bearer <XCRAWL_API_KEY>`
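The two endpoints above can be wrapped in small request builders. This is only a sketch: the function names `startScrapeRequest` and `readResultRequest` are hypothetical, and the base URL `https://run.xcrawl.com` with paths `POST /v1/scrape` and `GET /v1/scrape/{scrape_id}` is taken from this document. The builders return plain objects and send nothing.

```javascript
// Sketch: turn the API surface into fetch-ready request descriptions.
const XCRAWL_BASE_URL = "https://run.xcrawl.com";

// Hypothetical helper: describe the POST /v1/scrape call.
function startScrapeRequest(apiKey, body) {
  return {
    url: `${XCRAWL_BASE_URL}/v1/scrape`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    },
  };
}

// Hypothetical helper: describe the GET /v1/scrape/{scrape_id} call.
function readResultRequest(apiKey, scrapeId) {
  return {
    // encodeURIComponent guards against stray characters in the ID.
    url: `${XCRAWL_BASE_URL}/v1/scrape/${encodeURIComponent(scrapeId)}`,
    options: { method: "GET", headers: { Authorization: `Bearer ${apiKey}` } },
  };
}
```

A caller would pass the result straight to `fetch(url, options)`; keeping construction separate from sending makes it simple to report which endpoint was used.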

Usage Examples

cURL (sync)

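```bash
# Read the API key from the local config file (no environment variables).
API_KEY="$(node -e 'const fs=require("fs");const p=process.env.HOME+"/.xcrawl/config.json";const k=JSON.parse(fs.readFileSync(p,"utf8")).XCRAWL_API_KEY||"";process.stdout.write(k)')"

# Synchronous scrape returning markdown and links.
curl -sS -X POST "https://run.xcrawl.com/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com","mode":"sync","output":{"formats":["markdown","links"]}}'
```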

cURL (async create + result)

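```bash
# Read the API key from the local config file (no environment variables).
API_KEY="$(node -e 'const fs=require("fs");const p=process.env.HOME+"/.xcrawl/config.json";const k=JSON.parse(fs.readFileSync(p,"utf8")).XCRAWL_API_KEY||"";process.stdout.write(k)')"

# Create the async scrape task.
CREATE_RESP="$(curl -sS -X POST "https://run.xcrawl.com/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com/product/1","mode":"async","output":{"formats":["json"]},"json":{"prompt":"Extract the title and price."}}')"

echo "$CREATE_RESP"

# Pull scrape_id out of the create response.
SCRAPE_ID="$(node -e 'const s=process.argv[1];const j=JSON.parse(s);process.stdout.write(j.scrape_id||"")' "$CREATE_RESP")"

# Read the async result.
curl -sS -X GET "https://run.xcrawl.com/v1/scrape/${SCRAPE_ID}" \
  -H "Authorization: Bearer ${API_KEY}"
```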

Node

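```bash
node -e '
// Read the API key from the local config file.
const fs=require("fs");
const apiKey=JSON.parse(fs.readFileSync(process.env.HOME+"/.xcrawl/config.json","utf8")).XCRAWL_API_KEY;
const body={url:"https://example.com",mode:"sync",output:{formats:["markdown","json"]},json:{prompt:"Extract the title and publish date."}};
// Print the raw response body as-is.
fetch("https://run.xcrawl.com/v1/scrape",{
  method:"POST",
  headers:{"Content-Type":"application/json",Authorization:`Bearer ${apiKey}`},
  body:JSON.stringify(body)
}).then(async r=>{console.log(await r.text());});
'
```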

Request Parameters

Request endpoint and headers

- Endpoint: `POST https://run.xcrawl.com/v1/scrape`
- Headers:
  - `Content-Type: application/json`
  - `Authorization: Bearer <XCRAWL_API_KEY>`

Request body: top-level fields

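| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | string | Yes | - | Target URL |
| `mode` | string | No | `sync` | `sync` or `async` |
| `proxy` | object | No | - | Proxy config |
| `request` | object | No | - | Request config |
| `js_render` | object | No | - | JS rendering config |
| `output` | object | No | - | Output config |
| `webhook` | object | No | - | Async webhook config (`mode=async`) |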

`proxy`

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `location` | string | No | `US` | ISO-3166-1 alpha-2 country code, e.g. `US` / `JP` / `SG` |
| `sticky_session` | string | No | Auto-generated | Sticky session ID; requests with the same ID attempt to reuse the same proxy exit |

`request`

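| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `locale` | string | No | `en-US,en;q=0.9` | Affects `Accept-Language` |
| `device` | string | No | `desktop` | `desktop` / `mobile`; affects UA and viewport |
| `cookies` | object map | No | - | Cookie key/value pairs |
| `headers` | object map | No | - | Header key/value pairs |
| `only_main_content` | boolean | No | `true` | Return main content only |
| `block_ads` | boolean | No | `true` | Attempt to block ad resources |
| `skip_tls_verification` | boolean | No | `true` | Skip TLS verification |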

`js_render`

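| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `enabled` | boolean | No | `true` | Enable browser rendering |
| `wait_until` | string | No | `load` | `load` / `domcontentloaded` / `networkidle` |
| `viewport.width` | integer | No | - | Viewport width (desktop `1920`, mobile `402`) |
| `viewport.height` | integer | No | - | Viewport height (desktop `1080`, mobile `874`) |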
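The per-device viewport sizes described for `js_render` (desktop 1920x1080, mobile 402x874) can be mirrored locally so a caller can predict or override them. This is a minimal sketch: `resolveViewport` is a hypothetical helper, and nesting the values as a `viewport` object is an assumption based on the dotted names `viewport.width` and `viewport.height`.

```javascript
// Per-device viewport sizes as given in the js_render description.
const DEFAULT_VIEWPORTS = {
  desktop: { width: 1920, height: 1080 },
  mobile: { width: 402, height: 874 },
};

// Hypothetical helper: start from the per-device size and apply overrides.
function resolveViewport(device = "desktop", override = {}) {
  const base = DEFAULT_VIEWPORTS[device];
  if (!base) throw new Error(`Unknown device: ${device}`);
  return { ...base, ...override };
}

// Assumed request shape: js_render.viewport as a nested object.
const jsRender = {
  enabled: true,
  wait_until: "networkidle",
  viewport: resolveViewport("mobile"),
};
```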

`output`

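| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `formats` | string[] | No | `["markdown"]` | Output formats |
| `screenshot` | string | No | `viewport` | `full_page` / `viewport` (only when `formats` includes `screenshot`) |
| `json.prompt` | string | No | - | Extraction prompt |
| `json.json_schema` | object | No | - | JSON Schema |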

`output.formats` enum:

- `html`
- `raw_html`
- `markdown`
- `links`
- `summary`
- `screenshot`
- `json`
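Because the set of output formats is closed, a request can be checked locally before it is sent. The sketch below assumes the seven values listed for `output.formats` (`html`, `raw_html`, `markdown`, `links`, `summary`, `screenshot`, `json`) and the documented default of `markdown`. `checkFormats` is a hypothetical helper name.

```javascript
// The seven values listed for output.formats.
const OUTPUT_FORMATS = ["html", "raw_html", "markdown", "links", "summary", "screenshot", "json"];

// Hypothetical helper: reject unknown formats and drop duplicates.
function checkFormats(formats = ["markdown"]) {
  if (!Array.isArray(formats)) {
    throw new TypeError("formats must be an array of strings");
  }
  const unknown = formats.filter((f) => !OUTPUT_FORMATS.includes(f));
  if (unknown.length > 0) {
    throw new RangeError(`Unsupported output format(s): ${unknown.join(", ")}`);
  }
  // A Set preserves first-seen order while removing duplicates.
  return [...new Set(formats)];
}
```

Failing early on an unknown format avoids spending credits on a request the API would presumably reject.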

`webhook`

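| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | string | No | - | Callback URL |
| `headers` | object map | No | - | Custom callback headers |
| `events` | string[] | No | `["started","completed","failed"]` | Events: `started` / `completed` / `failed` |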
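Putting the request parameters together, a body builder might look like the sketch below. It assumes the top-level fields `url`, `mode`, `output`, and `webhook` as described in this section. Rejecting a `webhook` outside `mode=async` is a local design choice based on the webhook being described as an async option, not documented API behavior. `buildScrapeBody` is a hypothetical name.

```javascript
// Hypothetical helper: assemble and sanity-check a scrape request body.
function buildScrapeBody({ url, mode = "sync", output, webhook, ...rest } = {}) {
  if (typeof url !== "string" || url === "") {
    throw new TypeError("url is required");
  }
  new URL(url); // throws a TypeError if the URL is malformed
  if (mode !== "sync" && mode !== "async") {
    throw new RangeError(`mode must be "sync" or "async", got "${mode}"`);
  }
  if (webhook && mode !== "async") {
    throw new Error("webhook is only meaningful with mode=async");
  }
  // Keep only the options the caller actually supplied.
  const body = { url, mode, ...rest };
  if (output) body.output = output;
  if (webhook) body.webhook = webhook;
  return body;
}
```

For example, `buildScrapeBody({ url: "https://example.com", output: { formats: ["markdown"] } })` yields a sync request with explicit formats and no other options.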

Response Parameters

Sync create response (`mode=sync`)

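| Field | Type | Description |
| --- | --- | --- |
| `scrape_id` | string | Task ID |
| `endpoint` | string | Always `scrape` |
| `version` | string | Version |
| `status` | string | `completed` / `failed` |
| `url` | string | Target URL |
| `data` | object | Result data |
| `started_at` | string | Start time (ISO 8601) |
| `ended_at` | string | End time (ISO 8601) |
| `total_credits_used` | integer | Total credits used |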

`data` fields (based on `output.formats`):

- `html`, `raw_html`, `markdown`, `links`, `summary`, `screenshot`, `json`
- `metadata` (page metadata)
INLINECODE111
INLINECODE112
INLINECODE113

INLINECODE114 fields:

Field	Type	Description
INLINECODE115	integer	Base scrape cost
INLINECODE116

Async create response (`mode=async`)

Field	Type	Description
`scrape_id`	string	Task ID
INLINECODE120

Async result response (`GET /v1/scrape/{scrape_id}`)

Field	Type	Description
`scrape_id`	string	Task ID
INLINECODE127

Workflow

1. Restate the user goal as an extraction contract.

- URL scope, required fields, accepted nulls, and precision expectations.

2. Build the scrape request body.

- Keep only necessary options.
- Prefer explicit `output.formats`.

3. Execute scrape and capture task metadata.

- Track `scrape_id`, `status`, and timestamps.
- If async, poll until `completed` or `failed`.

4. Return raw API responses directly.

- Do not synthesize or compress fields by default.
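The async polling step in the workflow above could be sketched as follows. It assumes the result endpoint `GET https://run.xcrawl.com/v1/scrape/{scrape_id}` returns a JSON body with a `status` field that eventually becomes `completed` or `failed`; any other value is treated as still pending. The function name, the interval, and the attempt limit are illustrative choices, not documented values.

```javascript
// Hypothetical helper: poll the async result until it reaches a terminal status.
async function pollScrape(scrapeId, apiKey, { fetchImpl = fetch, intervalMs = 2000, maxAttempts = 30 } = {}) {
  const url = `https://run.xcrawl.com/v1/scrape/${encodeURIComponent(scrapeId)}`;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetchImpl(url, { headers: { Authorization: `Bearer ${apiKey}` } });
    const text = await res.text(); // keep the raw body for passthrough
    let status;
    try {
      status = JSON.parse(text).status;
    } catch {
      status = undefined; // non-JSON body: treat as not finished yet
    }
    if (status === "completed" || status === "failed") {
      return { attempts: attempt, status, raw: text };
    }
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Scrape ${scrapeId} did not finish after ${maxAttempts} attempts`);
}
```

The raw response text is returned untouched so the caller can pass it through without synthesizing fields. Injecting `fetchImpl` lets the loop be exercised without network access.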

Output Contract

Return:

- Endpoint(s) used and mode (`sync` or `async`)
- INLINECODE147 used for the request
- Raw response body from each API call
- Error details when request fails

Do not generate summaries unless the user explicitly requests a summary.
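One possible shape for the returned report is sketched below, covering the endpoint, mode, raw body, and error items of the contract. The field names of the report object and the `calls` record format are hypothetical. The key point is that response bodies stay as unparsed strings.

```javascript
// Hypothetical helper: assemble a passthrough report from recorded API calls.
// Each call record is assumed to look like { method, url, rawBody?, error? }.
function buildReport({ mode, calls }) {
  return {
    mode,
    endpoints: calls.map((c) => `${c.method} ${c.url}`),
    // Raw bodies are passed through untouched: no parsing, no summarizing.
    responses: calls.filter((c) => c.error === undefined).map((c) => c.rawBody),
    errors: calls
      .filter((c) => c.error !== undefined)
      .map((c) => ({ endpoint: `${c.method} ${c.url}`, message: String(c.error.message || c.error) })),
  };
}
```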

Guardrails

- Do not invent unsupported output fields.
- Do not hardcode provider-specific tool schemas in core logic.
- Call out uncertainty when page structure is unstable.
