Apify Skill
Run any Apify Actor through a standardized workflow: search → validate → execute → collect results.
Prerequisites
- `APIFY_TOKEN` env var, or a `config.json` with tokens (copy `config.json.example`)
- Python 3 with `requests` installed
Workflow
Step 1: Parse User Intent
Extract from the user's request:
- Platform/target (Instagram, TikTok, Reddit, etc.)
- What to scrape (posts, profiles, hashtags, comments, etc.)
- Targets (URLs, usernames, keywords)
- Quantity/filters (how many, time range, min likes, etc.)
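One way to hold the extracted fields is a small record type. The sketch below is a hypothetical `ParsedIntent` dataclass that is not part of the skill's scripts. The field names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsedIntent:
    """Hypothetical container for the fields extracted in Step 1."""
    platform: str                                        # e.g. "instagram", "tiktok", "reddit"
    content_type: str                                    # e.g. "posts", "profiles", "hashtags"
    targets: list[str] = field(default_factory=list)     # URLs, usernames, keywords
    max_items: Optional[int] = None                      # quantity limit, if the user gave one
    filters: dict[str, object] = field(default_factory=dict)  # time range, min likes, ...


# "Get the top 30 posts from r/cooking this month"
intent = ParsedIntent(
    platform="reddit",
    content_type="posts",
    targets=["https://reddit.com/r/cooking/top/?t=month"],
    max_items=30,
    filters={"time_range": "month"},
)
```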
Step 2: Select Token
If the user specifies a token name, or the task maps to a specific account, use that token. Otherwise, use `default`.
The token is resolved in this order:
1. `--token` flag (highest priority)
2. `config.json` tokens map (selected with `--token-name`)
3. `APIFY_TOKEN` env var (fallback)
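The precedence above can be sketched as a single resolver function. This is a minimal sketch and not the skill's actual code. `resolve_token` is a hypothetical name. The `config.json` layout, a top-level `"tokens"` object keyed by token name, is also an assumption, so check `config.json.example` for the real structure.

```python
import json
import os
from typing import Optional


def resolve_token(
    cli_token: Optional[str] = None,
    config_path: Optional[str] = None,
    token_name: str = "default",
) -> str:
    """Resolve an Apify token: --token, then config.json, then APIFY_TOKEN.

    Assumed config layout (hypothetical): {"tokens": {"default": "<token>", ...}}
    """
    # 1. An explicit --token always wins.
    if cli_token:
        return cli_token
    # 2. Look up --token-name in the config file's tokens map.
    if config_path and os.path.exists(config_path):
        with open(config_path, encoding="utf-8") as f:
            tokens = json.load(f).get("tokens", {})
        if token_name in tokens:
            return tokens[token_name]
    # 3. Fall back to the APIFY_TOKEN environment variable.
    env_token = os.environ.get("APIFY_TOKEN")
    if env_token:
        return env_token
    raise RuntimeError("No Apify token: pass --token, use --config/--token-name, or set APIFY_TOKEN")
```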
Step 3: Search & Select Actor
Run the search script:
```bash
python3 scripts/search_actor.py instagram scraper --top 3
```
Output: a ranked list of candidates with score, success rate, rating, and pricing model.
Quality filters (built into script):
- `notice` = `NONE` (not deprecated)
- 30-day success rate ≥ 95%
- 30-day runs ≥ 1,000
- User rating ≥ 4.0
Pick the top-ranked candidate. If the user has a preference for, or prior experience with, a specific Actor, skip the search.
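The quality bar and the selection rule can be expressed as a filter followed by a sort. The dictionary keys (`notice`, `success_rate_30d`, `runs_30d`, `rating`, `score`) are hypothetical stand-ins for whatever `search_actor.py` actually reports. Ranking by a single `score` field is also an assumption.

```python
from typing import Any


def passes_quality_filters(actor: dict[str, Any]) -> bool:
    """Apply the Step 3 quality bar to one candidate (hypothetical keys)."""
    return (
        actor.get("notice", "NONE") == "NONE"            # not deprecated
        and actor.get("success_rate_30d", 0.0) >= 0.95   # 30-day success rate >= 95%
        and actor.get("runs_30d", 0) >= 1_000            # 30-day runs >= 1,000
        and actor.get("rating", 0.0) >= 4.0              # user rating >= 4.0
    )


def pick_candidates(actors: list[dict[str, Any]], top: int = 3) -> list[dict[str, Any]]:
    """Keep qualifying candidates, highest score first, capped at `top`."""
    qualified = [a for a in actors if passes_quality_filters(a)]
    return sorted(qualified, key=lambda a: a.get("score", 0.0), reverse=True)[:top]
```

The first element is the Actor to use. The remaining ones serve as the Step 5 fallbacks.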
Step 4: Get Actor Schema & Build run_input
Fetch the Actor's documentation:
```bash
webfetch https://apify.com/{actorid}.md
```
Read the input schema section. Construct run_input JSON based on:
- The Actor's required/optional fields
- The user's targets and filters
- Sensible defaults from the documentation
Do NOT ask the user to write JSON. Build it from their natural language request.
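Building `run_input` amounts to starting from the schema's defaults and overlaying values taken from the user's request. The sketch assumes the schema is available as a dict in the Apify input-schema layout, with `properties` carrying optional `default` values and a `required` list. It does not parse the `.md` page text. `build_run_input` and the sample schema in the test are hypothetical.

```python
from typing import Any


def build_run_input(schema: dict[str, Any], user_values: dict[str, Any]) -> dict[str, Any]:
    """Merge schema defaults with values derived from the user's request."""
    properties = schema.get("properties", {})
    # Start from every declared default.
    run_input = {name: spec["default"] for name, spec in properties.items() if "default" in spec}
    # Reject fields the Actor does not declare (often a misspelled field name).
    unknown = [k for k in user_values if k not in properties]
    if unknown:
        raise ValueError(f"fields not in the Actor's input schema: {unknown}")
    run_input.update(user_values)  # user values override defaults
    missing = [f for f in schema.get("required", []) if f not in run_input]
    if missing:
        raise ValueError(f"run_input is missing required fields: {missing}")
    return run_input
```

Rejecting unknown fields, rather than silently dropping them, surfaces mistakes before a run is started.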
Step 5: Probe Test (Top 1 → Top 2 → Top 3 fallback)
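Test with minimal input before committing to a full run:
```bash
python3 scripts/apifyrunner.py {actorid} \
  --input '{...}' \
  --token {token} \
  --probe-only \
  --list-key {key}
```
The probe automatically uses only the first 2 items of the list field (`--list-key`).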
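Trimming the list field for the probe can be sketched as follows. `make_probe_input` is a hypothetical helper that mirrors the described behavior, using the first 2 items of the list field. It is not taken from `apifyrunner.py`.

```python
import copy
from typing import Any


def make_probe_input(run_input: dict[str, Any], list_key: str, n: int = 2) -> dict[str, Any]:
    """Return a copy of run_input whose list field keeps only its first n items."""
    probe = copy.deepcopy(run_input)  # never mutate the input used for the full run
    items = probe.get(list_key)
    if not isinstance(items, list) or not items:
        raise ValueError(f"{list_key!r} must be a non-empty list in run_input")
    probe[list_key] = items[:n]
    return probe
```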
Checks:
- Run starts successfully (no permission/billing errors)
- Run completes (no timeout/crash)
- Returns non-empty data
If the probe fails → try the next candidate Actor. If all 3 fail → report to the user with the Actor URLs so they can activate the Actors manually.
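The Top 1 → Top 2 → Top 3 fallback is an ordered loop. In this sketch, `probe` is a hypothetical callable that runs the probe for one Actor ID and returns the dataset items. It raises on start-up, permission, billing, or timeout errors. `first_working_actor` and `failure_report` are illustrative names. The report uses the same `https://apify.com/{actor_id}` URL pattern as the `.md` link in Step 4.

```python
from typing import Any, Callable, Optional


def first_working_actor(
    candidates: list[str],
    probe: Callable[[str], list[Any]],
) -> tuple[Optional[str], dict[str, str]]:
    """Probe up to three candidates in rank order; return the first that yields data."""
    failures: dict[str, str] = {}
    for actor_id in candidates[:3]:
        try:
            items = probe(actor_id)
        except Exception as exc:  # run did not start, or crashed/timed out
            failures[actor_id] = str(exc)
            continue
        if items:  # non-empty data: this candidate passes
            return actor_id, failures
        failures[actor_id] = "probe returned no data"
    return None, failures


def failure_report(failures: dict[str, str]) -> str:
    """List failed candidates with their Apify URLs for manual activation."""
    return "\n".join(f"https://apify.com/{actor_id}: {reason}" for actor_id, reason in failures.items())
```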
Step 6: Full Execution
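```bash
python3 scripts/apifyrunner.py {actorid} \
  --input '{...}' \
  --token {token} \
  --output /path/to/results.json \
  --list-key {key} \
  --batch-size 50 \
  --probe
```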
Key flags:
| Flag | Purpose | Default |
|---|---|---|
| `--list-key` | Field in `run_input` containing the list to batch | None (no batching) |
| `--batch-size` | Items per batch | 50 |
| `--timeout` | Per-batch timeout (seconds) | 600 |
| `--probe` | Run a probe before full execution | Off |
| `--output` | Save results to a JSON file | Stdout |
| `--config` | Path to `config.json` for token lookup | None |
| `--token-name` | Which token to use from the config | `default` |
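To make the defaults concrete, here is a hypothetical `argparse` parser that mirrors the table. It is a reading aid, not the source of `apifyrunner.py`. It also models `--token`, `--input`, and `--probe-only`. Treating `--input` as an inline JSON string is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    p = argparse.ArgumentParser(prog="apifyrunner.py")
    p.add_argument("actor_id")
    p.add_argument("--input", required=True, help="run_input as JSON (assumed inline string)")
    p.add_argument("--token", default=None)
    p.add_argument("--list-key", default=None, help="list field to batch; no batching if omitted")
    p.add_argument("--batch-size", type=int, default=50, help="items per batch")
    p.add_argument("--timeout", type=int, default=600, help="per-batch timeout in seconds")
    p.add_argument("--probe", action="store_true", help="probe before the full run")
    p.add_argument("--probe-only", action="store_true", help="run only the probe")
    p.add_argument("--output", default=None, help="JSON output file; stdout if omitted")
    p.add_argument("--config", default=None, help="path to config.json")
    p.add_argument("--token-name", default="default", help="token key in config.json")
    return p
```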
Batching rules:
- Up to `--batch-size` items → single run
- More than `--batch-size` items → auto-split into batches, with a 3 s pause between batches
- Each batch has its own timeout (`--timeout`, default 600 s = 10 min)
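The batching rules can be sketched as a split followed by a loop. `split_batches` and `run_in_batches` are hypothetical names. `run_batch` stands in for whatever actually starts an Actor run and waits up to `timeout` seconds. The `pause` parameter exists only so the 3 s gap can be shortened in tests.

```python
import time
from typing import Any, Callable


def split_batches(items: list[Any], batch_size: int = 50) -> list[list[Any]]:
    """Up to batch_size items -> one batch; otherwise consecutive chunks of batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def run_in_batches(
    run_input: dict[str, Any],
    list_key: str,
    run_batch: Callable[[dict[str, Any], int], list[Any]],
    batch_size: int = 50,
    timeout: int = 600,
    pause: float = 3.0,
) -> list[list[Any]]:
    """Run one Actor call per batch of run_input[list_key], pausing between batches."""
    results = []
    for i, batch in enumerate(split_batches(run_input[list_key], batch_size)):
        if i > 0:
            time.sleep(pause)  # pause (default 3 s) between consecutive batches
        batch_input = {**run_input, list_key: batch}  # same input, smaller list
        results.append(run_batch(batch_input, timeout))  # each batch gets its own timeout
    return results
```

For example, 120 items with the default batch size of 50 become three runs of 50, 50, and 20 items.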
Step 7: Return Results
- Report total items collected
- Save raw JSON to the specified output path
- Summarize key stats (items count, batches, any failures)
- Let the caller handle filtering/reporting/delivery
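Here is a minimal sketch of this step. It assumes each batch's items arrive as a list and that a failed batch is recorded as `None`, which is an assumed convention. `collect_results` is hypothetical. It only flattens, saves, and counts, leaving filtering to the caller as described above.

```python
import json
from typing import Any, Optional


def collect_results(
    batch_results: list[Optional[list[Any]]],
    output_path: Optional[str] = None,
) -> dict[str, int]:
    """Flatten per-batch items, optionally save raw JSON, and return summary stats."""
    items = [item for batch in batch_results if batch is not None for item in batch]
    if output_path:
        with open(output_path, "w", encoding="utf-8") as f:
            json.dump(items, f, ensure_ascii=False, indent=2)  # raw items, unfiltered
    return {
        "items": len(items),
        "batches": len(batch_results),
        "failed_batches": sum(1 for b in batch_results if b is None),
    }
```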
Common Actor Patterns
| Platform | Typical Actor | list_key | Example input |
|---|---|---|---|
| Instagram | `apify/instagram-scraper` | `directUrls` | `{"directUrls": ["https://instagram.com/user/"], "resultsType": "posts", "resultsLimit": 3}` |
| TikTok | `clockworks/tiktok-scraper` | `hashtags` | `{"hashtags": ["cooking"], "resultsPerPage": 50}` |
| Reddit | `trudax/reddit-scraper-lite` | `startUrls` | `{"startUrls": [{"url": "https://reddit.com/r/cooking/top/?t=month"}], "maxItems": 30}` |
| Twitter | `apidojo/tweet-scraper` | n/a | Check the Actor's `.md` page for the current schema |
These are starting points. Always verify the current schema on the Actor's `.md` page.
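If the starting points are kept in code, a lookup like the one below keeps each `list_key` next to its example input. `COMMON_PATTERNS` and `starting_input` are hypothetical. The values are copied from the table. Twitter is omitted because the table gives no `list_key` or example. Every entry still needs checking against the Actor's `.md` page.

```python
import copy
from typing import Any

# Starting points only; Actor input schemas change, so verify before use.
# Twitter (apidojo/tweet-scraper) is omitted: check its .md page for the schema.
COMMON_PATTERNS: dict[str, dict[str, Any]] = {
    "instagram": {
        "actor": "apify/instagram-scraper",
        "list_key": "directUrls",
        "input": {"directUrls": ["https://instagram.com/user/"], "resultsType": "posts", "resultsLimit": 3},
    },
    "tiktok": {
        "actor": "clockworks/tiktok-scraper",
        "list_key": "hashtags",
        "input": {"hashtags": ["cooking"], "resultsPerPage": 50},
    },
    "reddit": {
        "actor": "trudax/reddit-scraper-lite",
        "list_key": "startUrls",
        "input": {"startUrls": [{"url": "https://reddit.com/r/cooking/top/?t=month"}], "maxItems": 30},
    },
}


def starting_input(platform: str) -> tuple[str, str, dict[str, Any]]:
    """Return (actor_id, list_key, deep copy of the example input) for a platform."""
    p = COMMON_PATTERNS[platform]
    return p["actor"], p["list_key"], copy.deepcopy(p["input"])  # copy so callers can edit freely
```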