Zhihu Search API Automated Extraction Skill
📖 Brief
This skill uses BrowserAct's Zhihu Search API template to provide a one-stop article extraction service. It extracts structured article details and full content from Zhihu article search results based on keywords and publication date filters.
✨ Features
- No hallucinations: pre-set workflows avoid generative AI hallucinations, keeping data extraction stable and precise.
- No CAPTCHA issues: there is no need to handle reCAPTCHA or other verification challenges.
- No IP access restrictions or geo-fencing: there is no need to deal with regional IP limits.
- Faster execution: tasks run much faster than with purely AI-driven browser automation.
- High cost-effectiveness: data acquisition costs far less than with AI solutions that consume large numbers of tokens.
🔑 API Key Guide
Before running, check the `BROWSERACT_API_KEY` environment variable. If it is not set, take no other action and wait for the user to provide it.
In that case, the Agent must tell the user:
"Since you have not configured the BrowserAct API Key yet, please go to the BrowserAct Console first to get your Key."
🛠️ Input Parameters
The Agent should flexibly configure the following parameters according to user needs when calling the script:
1. `keyword` (Search Keywords)
   - Type: `string`
   - Description: Search keywords used to find Zhihu articles, such as a company name or an industry term.
   - Example: `AI agent`, `openclaw`
2. `Publish_date` (Publication Date Range)
   - Type: `string`
   - Description: Filter articles by publication date.
   - Options:
     - `7d`: Past 7 days
     - `30d`: Past 30 days
     - `90d`: Past 90 days
     - `1y`: Past year
     - `all`: Any time
   - Default: `7d`
3. `Date_limit` (Extraction Limit)
   - Type: `number`
   - Description: Maximum number of articles to extract.
   - Default: `10`
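One way to validate these parameters before calling the script is sketched below. The helper `build_command` is hypothetical. The script path, the positional argument order (keyword, date range, limit), and the defaults `7d` and `10` are assumptions made for illustration.

```python
import sys

# Date filters listed in the parameter description.
ALLOWED_DATES = ("7d", "30d", "90d", "1y", "all")


def build_command(keyword, publish_date="7d", date_limit=10,
                  script="./scripts/zhihu_search_api.py"):
    """Validate the three inputs and return an argv list for subprocess.

    The script path and the positional argument order are assumptions.
    """
    keyword = keyword.strip()
    if not keyword:
        raise ValueError("keyword must not be empty")
    if publish_date not in ALLOWED_DATES:
        raise ValueError(f"Publish_date must be one of {ALLOWED_DATES}")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(date_limit, bool) or not isinstance(date_limit, int) or date_limit < 1:
        raise ValueError("Date_limit must be a positive integer")
    return [sys.executable, "-u", script, keyword, publish_date, str(date_limit)]
```

Passing the arguments as a list avoids shell quoting problems with keywords that contain spaces, such as `AI agent`.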
🚀 Recommended Usage
The Agent should run the following standalone script to get results with a single command:
```bash
# Example invocation
python -u ./scripts/zhihu_search_api.py keyword Publish_date limit
```
⏳ Execution Status Monitoring
Because this task involves automated browser operations, it may take several minutes. While running, the script continuously prints timestamped status logs (e.g., `[14:30:05] Task Status: running`).
Agent Must Know:
- While waiting for the script to return a result, keep monitoring the terminal output.
- As long as the terminal keeps printing new status logs, the task is running normally; do not mistake it for a deadlock or an unresponsive process.
- Only if the status stays unchanged for a long time, or the script stops printing without returning a result, consider triggering the retry mechanism.
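The monitoring rule can be approximated with a small watchdog. This is a sketch under assumptions: `run_with_stall_watch` is a hypothetical helper, the 300-second threshold is an arbitrary stand-in for "a long time", and it only detects silence. It does not detect a status line that keeps repeating unchanged.

```python
import queue
import subprocess
import sys
import threading


def run_with_stall_watch(cmd, stall_timeout=300.0):
    """Run cmd, echo each output line, and kill it after stall_timeout seconds of silence.

    Returns (status, lines) where status is "finished" or "stalled".
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            text=True, bufsize=1)
    lines_queue = queue.Queue()

    def pump():
        # Forward every line from the child process to the queue.
        for line in proc.stdout:
            lines_queue.put(line.rstrip("\n"))
        lines_queue.put(None)  # sentinel: the output stream was closed

    threading.Thread(target=pump, daemon=True).start()

    lines = []
    while True:
        try:
            item = lines_queue.get(timeout=stall_timeout)
        except queue.Empty:
            # No new log line within the threshold: treat the task as stalled.
            proc.kill()
            proc.wait()
            return "stalled", lines
        if item is None:
            proc.wait()
            return "finished", lines
        sys.stdout.write(item + "\n")
        sys.stdout.flush()
        lines.append(item)
```

A `"stalled"` result is the point at which the Agent would consider the retry mechanism. A `"finished"` result still needs its output checked for errors.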
📊 Data Output
Upon successful execution, the script will directly parse and print the result from the API response. The result includes:
- `title`: Full article title
- `bodycontent`: Full body content of the article
- `imageurl`: Main image URL or article cover image URL
- `author`: Article author or publishing account name
- `publicationdate`: Article publication date
- `urllink`: Original article URL
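For downstream processing, the result fields could be mapped onto a small record type. The sketch below assumes the printed result can be loaded as a list of dictionaries, which this guide does not confirm. The `Article` class and `to_articles` helper are hypothetical, and the key spellings used here (`bodycontent`, `imageurl`, `publicationdate`, `urllink`) are assumptions; the actual keys in the API response may be spelled differently, for example with underscores.

```python
from dataclasses import dataclass, fields


@dataclass
class Article:
    # Field names are assumptions and may not match the real response keys.
    title: str = ""
    bodycontent: str = ""
    imageurl: str = ""
    author: str = ""
    publicationdate: str = ""
    urllink: str = ""


def to_articles(records):
    """Convert a list of dicts into Article objects, ignoring unknown or null keys."""
    known = {f.name for f in fields(Article)}
    articles = []
    for record in records:
        data = {k: str(v) for k, v in record.items() if k in known and v is not None}
        articles.append(Article(**data))
    return articles
```

Missing fields fall back to empty strings, so a partially extracted article does not raise an error.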
⚠️ Error Handling & Retry
During the execution of the script, if an error is encountered (such as network fluctuations or task failure), the Agent should follow this logic:
1. Check the output content:
   - If the output contains "Invalid authorization", the API Key is invalid or expired. Do not retry; guide the user to check and provide the correct API Key.
   - If the output does not contain "Invalid authorization" but the task fails (for example, the output starts with `Error:` or the returned result is empty), the Agent should automatically run the script once more.
2. Retry limits:
   - Automatic retry is limited to one attempt. If the second attempt also fails, stop retrying and report the specific error information to the user.
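The decision logic above can be sketched as a pair of small functions. The names `classify_output` and `run_with_retry` are hypothetical, and `run_once` stands for any zero-argument callable that runs the script and returns its output as text. The failure test follows the wording here literally (output starting with `Error:` or empty output).

```python
def classify_output(output):
    """Map script output to "auth_error", "failed", or "ok"."""
    if "Invalid authorization" in output:
        return "auth_error"  # invalid or expired API Key: never retry
    stripped = output.strip()
    if not stripped or stripped.startswith("Error:"):
        return "failed"
    return "ok"


def run_with_retry(run_once, max_retries=1):
    """Call run_once, retrying at most max_retries times on a plain failure.

    Returns (status, output, attempts) for the last attempt made.
    """
    attempts = 0
    while True:
        output = run_once()
        attempts += 1
        status = classify_output(output)
        # Stop on success, on an auth error, or when the retry budget is spent.
        if status != "failed" or attempts > max_retries:
            return status, output, attempts
```

On `"auth_error"` the Agent would ask the user for a valid key. On `"failed"` after two attempts it would report the last output to the user.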
🌟 Typical Use Cases
- Industry Trend Tracking: Find the latest industry dynamics on specific topics like "low-altitude economy" or "generative AI" on Zhihu.
- Public Relations Monitoring: Monitor the media exposure of a specific brand or company on Zhihu over the past 30 days.
- Competitor Intelligence Gathering: Collect recent product information or market activities published by competitors on Zhihu.
- Market Hotspot Research: Get popular Zhihu reports on specific keywords across different time dimensions.
- Key Figure Tracking: Retrieve the latest Zhihu articles and interviews from industry leaders or public figures.
- Daily Briefing Summary: Automatically extract and summarize daily industry news briefings from Zhihu.
- Global Event Monitoring: Real-time access to major breaking news and discussions on Zhihu.
- Structured Data Extraction: Extract structured information such as article titles, authors, and links from Zhihu for market research analysis.
- Media Exposure Analysis: Evaluate the spread and popularity of a specific project or event on Zhihu.
- Long-term Thematic Research: Retrieve in-depth reports and discussions on a specific technical topic from the past year.