Extract Skill

Extract clean content from specific URLs. This is ideal when you already know which pages you want content from.

Authentication

The script uses OAuth via the Tavily MCP server. No manual setup is required; on first run, it will:

1. Check for existing tokens in ~/.mcp-auth/
2. If none are found, automatically open your browser for OAuth authentication

Note: You must have an existing Tavily account. The OAuth flow supports login only; you cannot create an account through it. Sign up at tavily.com first if you don't have an account.
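Step 1 above can be approximated in a few lines of shell. This is a minimal sketch, not the script's actual code: the helper name has_cached_tokens is hypothetical, and it assumes that any regular file under ~/.mcp-auth/ counts as a cached token.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring step 1: succeeds if the token directory
# exists and contains at least one regular file. Defaults to ~/.mcp-auth
# (assumed layout); pass another directory as $1 to override it.
has_cached_tokens() {
  local dir=${1:-"$HOME/.mcp-auth"}
  [[ -d $dir ]] || return 1
  # find prints the first regular file it meets; -quit stops the search there.
  [[ -n $(find "$dir" -type f -print -quit 2>/dev/null) ]]
}

if has_cached_tokens; then
  echo "Cached OAuth tokens found; no browser login needed."
else
  echo "No cached tokens; the first run will open a browser for OAuth."
fi
```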

Alternative: API Key

If you prefer to use an API key, get one at https://tavily.com and add it to ~/.claude/settings.json:
```json
{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}
```
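If you use an API key, a quick pre-flight check can catch a missing or mistyped key before any request is sent. This is a sketch, assuming the key reaches the shell as the TAVILY_API_KEY environment variable and uses the tvly- prefix shown above; check_tavily_key is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail early if TAVILY_API_KEY is missing or
# does not start with the "tvly-" prefix used in the settings example.
check_tavily_key() {
  local key=${TAVILY_API_KEY:-}
  if [[ -z $key ]]; then
    echo "TAVILY_API_KEY is not set" >&2
    return 1
  fi
  if [[ $key != tvly-* ]]; then
    echo "TAVILY_API_KEY does not look like a Tavily key (expected tvly- prefix)" >&2
    return 1
  fi
}
```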

Quick Start

Using the Script

```bash
./scripts/extract.sh '<json>'
```

Examples:
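```bash
# Single URL
./scripts/extract.sh '{"urls": ["https://example.com/article"]}'

# Multiple URLs
./scripts/extract.sh '{"urls": ["https://example.com/page1", "https://example.com/page2"]}'

# With query focus and chunks
./scripts/extract.sh '{"urls": ["https://example.com/docs"], "query": "authentication API", "chunks_per_source": 3}'

# Advanced extraction for JS pages
./scripts/extract.sh '{"urls": ["https://app.example.com"], "extract_depth": "advanced", "timeout": 60}'
```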
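Hand-quoting the JSON argument is error-prone. A small hypothetical helper, urls_to_json, can build the single-argument form used in the examples from plain URLs. It is a sketch that assumes well-formed URLs, which cannot contain raw double quotes or backslashes, so it skips JSON escaping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the JSON argument for ./scripts/extract.sh
# from plain URL arguments. Valid URLs (RFC 3986) cannot contain raw
# double quotes or backslashes, so no JSON escaping is attempted.
urls_to_json() {
  local out='{"urls": [' sep='' url
  for url in "$@"; do
    out+="$sep\"$url\""
    sep=', '
  done
  printf '%s]}\n' "$out"
}
```

For example, ./scripts/extract.sh "$(urls_to_json https://example.com/page1 https://example.com/page2)" produces the same call as the "Multiple URLs" example.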

Basic Extraction
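A basic request needs only the urls field. The sketch below wraps the endpoint and headers from the API Reference in a shell function; tavily_extract is a hypothetical name, not part of the skill's scripts, and the --silent, --show-error and --fail flags are an added choice so that HTTP errors surface as a non-zero exit status.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for POST https://api.tavily.com/extract.
# Takes the JSON request body as $1; the key comes from TAVILY_API_KEY.
tavily_extract() {
  local body=$1
  if [[ -z ${TAVILY_API_KEY:-} ]]; then
    echo "TAVILY_API_KEY must be set" >&2
    return 1
  fi
  # --fail turns HTTP 4xx/5xx into a non-zero exit status;
  # --silent hides the progress meter while --show-error keeps error text.
  curl --silent --show-error --fail \
    --request POST \
    --url https://api.tavily.com/extract \
    --header "Authorization: Bearer $TAVILY_API_KEY" \
    --header 'Content-Type: application/json' \
    --data "$body"
}
```

For example, tavily_extract '{"urls": ["https://example.com/article"]}' fetches a single page with default settings.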


Multiple URLs with Query Focus

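```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/ml-healthcare",
      "https://example.com/ai-diagnostics"
    ],
    "query": "AI diagnostic tools accuracy",
    "chunks_per_source": 3
  }'
```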

API Reference

Endpoint

```
POST https://api.tavily.com/extract
```

Headers

Header	Value
Authorization	Bearer <TAVILY_API_KEY>
Content-Type	application/json

Request Body

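Field	Type	Default	Description
urls	array	Required	URLs to extract (max 20)
query	string	Optional	Focus extraction on content relevant to this query
chunks_per_source	integer	Optional	Number of relevant chunks to return per URL when query is set
extract_depth	string	Optional	basic or advanced (see Extract Depth)
timeout	number	Optional	Seconds to wait for slow pages (up to 60)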

Response Format

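```json
{
  "results": [
    {
      "url": "https://example.com/article",
      "raw_content": "# Article Title\n\nContent..."
    }
  ],
  "failed_results": [],
  "response_time": 2.3
}
```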
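To inspect results and failed_results from a script, the response can be summarized line by line. This is a sketch under two assumptions: python3 is available (used only to parse JSON), and each failed_results entry is either a URL string or an object with a url field, since the example above only shows an empty list. summarize_extract is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize an extract response read from stdin.
# Prints "ok <url> <chars>" per result and "failed <url>" per failure.
# Assumes python3 is installed; it is used only for JSON parsing.
summarize_extract() {
  python3 -c '
import json, sys
data = json.load(sys.stdin)
for r in data.get("results") or []:
    print("ok", r.get("url"), len(r.get("raw_content") or ""))
for f in data.get("failed_results") or []:
    print("failed", f.get("url") if isinstance(f, dict) else f)
'
}
```

Piping any of the curl commands in this document into summarize_extract gives a quick per-URL status report.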

Extract Depth

Depth	When to Use
basic	Simple text extraction, faster
advanced	Dynamic/JS-rendered pages, tables, structured data
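The advice to try basic first and fall back to advanced can be made mechanical with a check on the basic response. This hypothetical needs_advanced helper treats any failed URL, or any result with empty raw_content, as a signal to retry; that rule is an assumption, not documented API behavior, and the helper relies on python3 for JSON parsing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit status 0 means "retry with advanced".
# Reads an extract response on stdin. The retry rule (any failed URL or
# any result with empty raw_content) is an assumption, not API behavior.
needs_advanced() {
  python3 -c '
import json, sys
data = json.load(sys.stdin)
failed = data.get("failed_results") or []
empty = [r for r in data.get("results") or [] if not r.get("raw_content")]
sys.exit(0 if failed or empty else 1)
'
}
```

In practice, pipe the basic response into needs_advanced and, if it succeeds, resend the same urls with "extract_depth": "advanced" and, for slow pages, a longer timeout.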

Examples

Single URL Extraction

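```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://docs.python.org/3/tutorial/classes.html"],
    "extract_depth": "basic"
  }'
```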

Targeted Extraction with Query

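```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/react-hooks",
      "https://example.com/react-state"
    ],
    "query": "useState and useEffect patterns",
    "chunks_per_source": 2
  }'
```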

JavaScript-Heavy Pages

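```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://app.example.com/dashboard"],
    "extract_depth": "advanced",
    "timeout": 60
  }'
```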

Batch Extraction

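```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3",
      "https://example.com/page4",
      "https://example.com/page5"
    ],
    "extract_depth": "basic"
  }'
```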
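Lists longer than the 20-URL per-request limit have to be split into several requests. One way to do it is sketched below: the hypothetical batch_payloads function prints one request body per line, each holding at most 20 URLs, with extract_depth set to basic as in the example above. Like the other helpers here, it assumes well-formed URLs and does no JSON escaping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split any number of URLs into request bodies of at
# most 20 URLs each, printing one JSON body per output line.
batch_payloads() {
  local -r max=20
  local urls=("$@") i j body sep
  for ((i = 0; i < ${#urls[@]}; i += max)); do
    body='{"urls": ['
    sep=''
    for ((j = i; j < i + max && j < ${#urls[@]}; j++)); do
      body+="$sep\"${urls[j]}\""
      sep=', '
    done
    printf '%s], "extract_depth": "basic"}\n' "$body"
  done
}
```

Each output line can be passed as the argument to ./scripts/extract.sh or as the --data value of the curl command above, for example by reading the output with a while IFS= read -r body loop.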

Tips

- Max 20 URLs per request; batch larger lists.
- Use query + chunks_per_source to get only the relevant content.
- Try basic first; fall back to advanced if content is missing.
- Set a longer timeout for slow pages (up to 60s).
- Check failed_results for URLs that couldn't be extracted.
