When to Use
- User provides a URL and wants to extract/read its content
- Another skill needs to parse source material from a URL before generation
- User says "parse this URL", "extract content from this link"
- User says "解析链接", "提取内容"
When NOT to Use
- User already has text content and doesn't need URL parsing
- User wants to generate audio/video content (not content extraction)
- User wants to read a local file (use standard file reading tools)
Purpose
Extract and normalize content from URLs across supported platforms. Returns structured data including content body, metadata, and references. Useful as a preprocessing step for content generation skills or standalone content extraction.
Hard Constraints
- No shell scripts. Construct curl commands from the API reference files listed in Resources
- Always read shared/authentication.md for API key and headers
- Follow shared/common-patterns.md for polling, errors, and interaction patterns
- URL must be a valid HTTP(S) URL
- Always read config following shared/config-pattern.md before any interaction
- Never save files to ~/Downloads/ or .listenhub/; save to the current working directory
Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After collecting URL and options, confirm with the user before calling the extraction API.
Step -1: API Key Check
Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.
Step 0: Config Setup
Follow shared/config-pattern.md Step 0.
If file doesn't exist — ask location, then create immediately:
mkdir -p ".listenhub/content-parser"
echo '{"autoDownload":true}' > ".listenhub/content-parser/config.json"
CONFIG_PATH=".listenhub/content-parser/config.json"
# (or $HOME/.listenhub/content-parser/config.json for global)
Then run the Setup Flow below.
If file exists — read config, display summary, and confirm:
当前配置 (content-parser):
自动下载:{是 / 否}
Ask: "使用已保存的配置?" → 确认,直接继续 / 重新配置
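As a rough illustration of the "read config, display summary" step, the sketch below reads the saved autoDownload flag with jq. The function name `read_auto_download` is hypothetical, and falling back to `true` when the key is absent is an assumption based on the value written at file creation, not something shared/config-pattern.md is known to specify.

```shell
# Hypothetical helper (not part of the skill): print the saved autoDownload
# value as "true" or "false". Assumes jq is installed and the file is a JSON object.
read_auto_download() {
  # `.autoDownload // true` would turn an explicit false into true,
  # because jq's // operator treats false like a missing value.
  # Testing for the key avoids that.
  jq -r 'if has("autoDownload") then .autoDownload else true end' "$1"
}
```

Called as `read_auto_download "$CONFIG_PATH"`, the result can be mapped to 是 / 否 in the summary shown to the user.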
Setup Flow (first run or reconfigure)
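1. autoDownload: "自动保存提取的内容到当前目录?"
   - "是(推荐)" → autoDownload: true
   - "否" → autoDownload: false
Save immediately:
NEW_CONFIG=$(echo "$CONFIG" | jq --argjson dl {true/false} '. + {"autoDownload": $dl}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")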
Interaction Flow
Step 1: URL Input
Free text input. Ask the user:
What URL would you like to extract content from?
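The HTTP(S) constraint can be checked on the user's answer before moving on. The following is a minimal sketch in POSIX shell; `is_http_url` is a hypothetical name, and the check is deliberately shallow (scheme plus a non-empty host, no whitespace). It does not replace the platform-specific normalization described in references/supported-platforms.md.

```shell
# Hypothetical helper (not part of the skill): succeed only for
# http:// or https:// URLs that have a host and contain no whitespace.
is_http_url() {
  case "$1" in
    *[[:space:]]*) return 1 ;;   # embedded whitespace is never valid
    [Hh][Tt][Tt][Pp]://[!/]* | [Hh][Tt][Tt][Pp][Ss]://[!/]*) return 0 ;;
    *) return 1 ;;               # other schemes, bare hosts, local paths
  esac
}
```

For example, `is_http_url "https://en.wikipedia.org/wiki/Topology"` succeeds, while `is_http_url "/tmp/file.txt"` fails, which matches the rule that local files belong to standard file reading tools.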
Step 2: Options (optional)
Ask if the user wants to configure extraction options:
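Question: "Do you want to configure extraction options?"
Options:
  - "No, use defaults": Extract with default settings
  - "Yes, configure options": Set summarize, max length, or Twitter count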
If "Yes", ask follow-up questions:
- Summarize: "Generate a summary of the content?" (Yes/No)
- Max Length: "Set maximum content length?" (Free text, e.g., "5000")
- Twitter count (only if URL is Twitter/X profile): "How many tweets to fetch?" (1-100, default 20)
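The Twitter count answer has a stated range (1-100) and default (20), so it can be normalized before it goes into the request. A minimal sketch, assuming the answer arrives as a free-text string; `normalize_tweet_count` is a hypothetical name, and rejecting out-of-range input (rather than clamping it) is a design choice of this sketch, not documented behavior.

```shell
# Hypothetical helper (not part of the skill): print a usable twitter.count.
# Empty input falls back to the documented default of 20.
normalize_tweet_count() {
  tweet_count=${1:-20}
  case "$tweet_count" in
    0* | *[!0-9]* | ????*) return 1 ;;  # zero, leading zero, non-digit, or 4+ digits
  esac
  [ "$tweet_count" -le 100 ] || return 1
  printf '%s\n' "$tweet_count"
}
```

A non-zero exit status signals that the question should be asked again.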
Step 3: Confirm & Extract
Summarize:
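Ready to extract content:
  URL: {url}
  Options: {summarize: true, maxLength: 5000, twitter.count: 50} / default
Proceed?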
Wait for explicit confirmation before calling the API.
Workflow
1. Validate URL: Must be HTTP(S). Normalize if needed (see references/supported-platforms.md)
2. Build request body:
{
  "source": {
    "type": "url",
    "uri": "{url}"
  },
  "options": {
    "summarize": true/false,
    "maxLength": 5000,
    "twitter": {
      "count": 50
    }
  }
}
Omit options if user chose defaults.
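3. Submit (foreground): POST /v1/content/extract → extract taskId
4. Tell the user extraction is in progress
5. Poll (background): Run the following exact bash command with run_in_background: true and timeout: 300000. Note: status field is .data.status (not processStatus), interval is 5s, values are processing/completed/failed:
TASK_ID="<id-from-step-3>"
for i in $(seq 1 60); do
  RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/content/extract/$TASK_ID" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
  STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.status // "processing"')
  case "$STATUS" in
    completed) echo "$RESULT"; exit 0 ;;
    failed) echo "Failed: $RESULT" >&2; exit 1 ;;
    *) sleep 5 ;;
  esac
done
echo "Timeout" >&2; exit 2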
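6. When notified, download and present result:
If autoDownload is true:
- Write {taskId}-extracted.md to the current directory: full extracted content in markdown
- Write {taskId}-extracted.json to the current directory: full raw API response data
echo "$CONTENT_MD" > "${TASK_ID}-extracted.md"
echo "$RESULT" > "${TASK_ID}-extracted.json"
Present:
Content extraction complete!
Source: {url}
Title: {metadata.title}
Length: ~{character count} characters
Credits used: {credits}
Saved to current directory:
  {taskId}-extracted.md
  {taskId}-extracted.json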
7. Show a preview of the extracted content (first ~500 chars)
8. Offer to use content in another skill (e.g. /podcast, /tts)
Estimated time: 10-30 seconds depending on content size and platform.
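The polling step of the workflow maps three status values to three outcomes. The sketch below restates that logic as a function so it can be exercised without network access. `poll_task` and its arguments are hypothetical; the fetch command is supplied by the caller and is assumed to print the task JSON with the status at `.data.status`, as the workflow notes. It illustrates the control flow only and is not a replacement for the exact command the skill prescribes.

```shell
# Hypothetical sketch: poll_task <fetch_command> <max_attempts> <interval_seconds>
# Returns 0 on completed (printing the JSON), 1 on failed, 2 on timeout.
poll_task() {
  poll_fetch=$1
  poll_max=$2
  poll_interval=$3
  poll_i=0
  while [ "$poll_i" -lt "$poll_max" ]; do
    poll_result=$("$poll_fetch") || poll_result=''
    # Strip control characters before parsing; an empty or unparseable
    # response yields an empty status and is treated as "keep waiting".
    poll_status=$(printf '%s' "$poll_result" | tr -d '\000-\037\177' \
      | jq -r '.data.status // "processing"' 2>/dev/null)
    case "$poll_status" in
      completed) printf '%s\n' "$poll_result"; return 0 ;;
      failed)    printf 'Failed: %s\n' "$poll_result" >&2; return 1 ;;
      *)         sleep "$poll_interval" ;;
    esac
    poll_i=$((poll_i + 1))
  done
  echo "Timeout" >&2
  return 2
}

# Example wiring (commented out so the sketch makes no network calls):
# fetch_task() { curl -sS "https://api.marswave.ai/openapi/v1/content/extract/$TASK_ID" \
#   -H "Authorization: Bearer $LISTENHUB_API_KEY"; }
# poll_task fetch_task 60 5
```

With 60 attempts at 5 seconds each, the loop gives up after about 300 seconds, which lines up with the 300000 ms timeout given for the background command.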
API Reference
- Content extract: shared/api-content-extract.md
- Supported platforms: references/supported-platforms.md
- Polling: shared/common-patterns.md § Async Polling
- Error handling: shared/common-patterns.md § Error Handling
- Config pattern: shared/config-pattern.md
Example
User: "Parse this article: https://en.wikipedia.org/wiki/Topology"
Agent workflow:
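1. URL: https://en.wikipedia.org/wiki/Topology
2. Options: defaults (omit options)
3. Submit extraction
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": {
      "type": "url",
      "uri": "https://en.wikipedia.org/wiki/Topology"
    }
  }'
4. Poll until complete:
curl -sS "https://api.marswave.ai/openapi/v1/content/extract/69a7dac700cf95938f86d9bb" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY"
5. Present extracted content preview and offer next actions.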
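For the preview in the last step, a small helper can cut the extracted text down to roughly the first 500 characters. This is a sketch only: `preview_content` is a hypothetical name, and it counts bytes rather than characters, so multi-byte text (for example Chinese) yields a shorter preview and may be cut inside a character.

```shell
# Hypothetical helper (not part of the skill): print the first N bytes
# of the given text (default 500). Shorter text is printed unchanged.
preview_content() {
  printf '%s' "$1" | head -c "${2:-500}"
}
```

How the extracted text is obtained from the API response is left out here, since the response field names are defined in shared/api-content-extract.md.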
User: "Extract recent tweets from @elonmusk, get 50 tweets"
Agent workflow:
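1. URL: https://x.com/elonmusk
2. Options: {"twitter": {"count": 50}}
3. Submit extraction
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": {
      "type": "url",
      "uri": "https://x.com/elonmusk"
    },
    "options": {
      "twitter": {
        "count": 50
      }
    }
  }'
4. Poll until complete, present results.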
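The two examples differ only in whether an options object is sent. A hedged sketch of building either body with jq, so the URL is escaped properly and options is left out entirely when the user chose defaults; `build_extract_body` is a hypothetical name, and the second argument is assumed to already be a valid JSON object.

```shell
# Hypothetical helper (not part of the skill):
# build_extract_body <url> [options_json]
# Prints the request body; the options key is omitted when no options are given.
build_extract_body() {
  if [ -n "${2:-}" ]; then
    jq -n --arg uri "$1" --argjson options "$2" \
      '{source: {type: "url", uri: $uri}, options: $options}'
  else
    jq -n --arg uri "$1" '{source: {type: "url", uri: $uri}}'
  fi
}
```

The output can be passed to curl with `-d "$(build_extract_body "$URL" "$OPTIONS")"`, which avoids hand-quoting URLs that contain characters such as `"` or `&`.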
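When to Use
- User provides a URL and wants to extract/read its content
- Another skill needs to parse source material from a URL before generation
- User says "parse this URL", "extract content from this link"
- User says "解析链接", "提取内容"
When NOT to Use
- User already has text content and doesn't need URL parsing
- User wants to generate audio/video content (not content extraction)
- User wants to read a local file (use standard file reading tools)
Purpose
Extract and normalize content from URLs on supported platforms. Returns structured data including content body, metadata, and references. Usable as a preprocessing step for content generation skills or as a standalone content extraction tool.
Hard Constraints
- No shell scripts. Construct curl commands from the API reference files listed in Resources
- Always read shared/authentication.md for API key and headers
- Follow shared/common-patterns.md for polling, error handling, and interaction patterns
- URL must be a valid HTTP(S) URL
- Always read config following shared/config-pattern.md before any interaction
- Never save files to ~/Downloads/ or .listenhub/; save to the current working directory
Use the AskUserQuestion tool for every multiple-choice step; do not print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After collecting URL and options, confirm with the user before calling the extraction API.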
Step -1: API Key Check
Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.
Step 0: Config Setup
Follow shared/config-pattern.md Step 0.
If the file doesn't exist, ask for the location, then create it immediately:
mkdir -p ".listenhub/content-parser"
echo '{"autoDownload":true}' > ".listenhub/content-parser/config.json"
CONFIG_PATH=".listenhub/content-parser/config.json"
# (or $HOME/.listenhub/content-parser/config.json for global)
Then run the Setup Flow below.
If the file exists, read the config, display a summary, and confirm:
当前配置 (content-parser):
自动下载:{是 / 否}
Ask: "使用已保存的配置?" → 确认,直接继续 / 重新配置
Setup Flow (first run or reconfigure)
1. autoDownload: "自动保存提取的内容到当前目录?"
   - "是(推荐)" → autoDownload: true
   - "否" → autoDownload: false
Save immediately:
NEW_CONFIG=$(echo "$CONFIG" | jq --argjson dl {true/false} '. + {"autoDownload": $dl}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
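Interaction Flow
Step 1: URL Input
Free text input. Ask the user:
What URL would you like to extract content from?
Step 2: Options (optional)
Ask if the user wants to configure extraction options:
Question: "Do you want to configure extraction options?"
Options:
  - "No, use defaults": Extract with default settings
  - "Yes, configure options": Set summarize, max length, or Twitter count
If "Yes", ask follow-up questions:
- Summarize: "Generate a summary of the content?" (Yes/No)
- Max Length: "Set maximum content length?" (Free text, e.g., "5000")
- Twitter count (only if URL is a Twitter/X profile): "How many tweets to fetch?" (1-100, default 20)
Step 3: Confirm & Extract
Summarize:
Ready to extract content:
  URL: {url}
  Options: {summarize: true, maxLength: 5000, twitter.count: 50} / default
Proceed?
Wait for explicit confirmation before calling the API.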
Workflow
1. Validate URL: Must be HTTP(S). Normalize if needed (see references/supported-platforms.md)
2. Build request body:
{
  "source": {
    "type": "url",
    "uri": "{url}"
  },
  "options": {
    "summarize": true/false,
    "maxLength": 5000,
    "twitter": {
      "count": 50
    }
  }
}
Omit options if user chose defaults.
3. Submit (foreground): POST /v1/content/extract → extract taskId
4. Tell the user extraction is in progress
5. Poll (background): Run the following exact bash command with run_in_background: true and timeout: 300000. Note: status field is .data.status (not processStatus), interval is 5s, values are processing/completed/failed:
TASK_ID="<id-from-step-3>"
for i in $(seq 1 60); do
  RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/content/extract/$TASK_ID" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
  STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.status // "processing"')
  case "$STATUS" in
    completed) echo "$RESULT"; exit 0 ;;
    failed) echo "Failed: $RESULT" >&2; exit 1 ;;
    *) sleep 5 ;;
  esac
done
echo "Timeout" >&2; exit 2
6. When notified, download and present result:
If autoDownload is true:
- Write {taskId}-extracted.md to the current directory: full extracted content in markdown
- Write {taskId}-extracted.json to the current directory: full raw API response data
echo "$CONTENT_MD" > "${TASK_ID}-extracted.md"
echo "$RESULT" > "${TASK_ID}-extracted.json"
Present:
Content extraction complete!
Source: {url}
Title: {metadata.title}
Length: ~{character count} characters
Credits used: {credits}
Saved to current directory:
  {taskId}-extracted.md
  {taskId}-extracted.json
7. Show a preview of the extracted content (first ~500 chars)
8. Offer to use content in another skill (e.g. /podcast, /tts)
Estimated time: 10-30 seconds depending on content size and platform.
API Reference
- Content extract: shared/api-content-extract.md
- Supported platforms: references/supported-platforms.md
- Polling: shared/common-patterns.md § Async Polling
- Error handling: shared/common-patterns.md § Error Handling
- Config pattern: shared/config-pattern.md
Example
User: "Parse this article: https://en.wikipedia.org/wiki/Topology"
Agent workflow:
1. URL: https://en.wikipedia.org/wiki/Topology
2. Options: defaults (omit options)
3. Submit extraction
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": {
      "type": "url",
      "uri": "https://en.wikipedia.org/wiki/Topology"
    }
  }'
4. Poll until complete:
curl -sS "https://api.marswave.ai/openapi/v1/content/extract/69a7dac700cf95938f86d9bb" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY"
5. Present extracted content preview and offer next actions.
User: "Extract recent tweets from @elonmusk, get 50 tweets"
Agent workflow:
1. URL: https://x.com/elonmusk
2. Options: {"twitter": {"count": 50}}
3. Submit extraction
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": {
      "type": "url",
      "uri": "https://x.com/elonmusk"
    },
    "options": {
      "twitter": {
        "count": 50
      }
    }
  }'
4. Poll until complete, present results.