Substack Scraper
Scrape Substack newsletters using an Apify Actor via the REST API.
Actor ID
BULaGFURBV7WG3K81
Prerequisites
- The APIFY_TOKEN environment variable must be set
- curl and jq must be available
Workflow
Step 1: Confirm parameters with user
Ask what they want to scrape. Supported input fields:
- urls (array of strings) - Substack publication URLs to scrape
- maxArticles (integer) - maximum number of articles per publication
- includeContent (boolean) - whether to include the full article text
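The input fields above are sent as one JSON object in the request body. As a minimal sketch, the object can be built with jq rather than by hand, so that URLs are quoted correctly and the integer and boolean keep their types. The helper name build_input and its argument order are hypothetical, and it assumes jq 1.6 or later for $ARGS.

```bash
# Hypothetical helper: build the Actor input JSON from shell arguments.
# Usage: build_input MAX_ARTICLES INCLUDE_CONTENT URL [URL...]
build_input() {
  local max_articles="$1" include_content="$2"
  shift 2
  # --argjson keeps the integer and boolean typed;
  # $ARGS.positional collects the remaining arguments as an array of strings.
  jq -cn \
    --argjson maxArticles "$max_articles" \
    --argjson includeContent "$include_content" \
    '{urls: $ARGS.positional, maxArticles: $maxArticles, includeContent: $includeContent}' \
    --args "$@"
}
```

For example, `build_input 20 true https://example.substack.com` prints a compact JSON object that can be passed to curl with `-d`.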
Step 2: Run the Actor
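```bash
# Run the Actor synchronously and return the dataset items in one call.
RESULT=$(curl -s -X POST "https://api.apify.com/v2/acts/BULaGFURBV7WG3K81/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.substack.com"], "maxArticles": 20}')
echo "$RESULT" | jq .
```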
Step 3: Poll and fetch (if async)
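```bash
# Start the run asynchronously and capture its ID.
RUN_ID=$(curl -s -X POST "https://api.apify.com/v2/acts/BULaGFURBV7WG3K81/runs?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.substack.com"], "maxArticles": 100}' | jq -r '.data.id')
# Check the run status; repeat until the run has finished.
curl -s "https://api.apify.com/v2/actor-runs/$RUN_ID?token=$APIFY_TOKEN" | jq -r '.data.status'
# Fetch the scraped items from the run's default dataset.
curl -s "https://api.apify.com/v2/actor-runs/$RUN_ID/dataset/items?token=$APIFY_TOKEN" | jq .
```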
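An asynchronous run usually needs more than one status check. The sketch below polls until the run reaches a finished state. The helper names, the 5-second interval, and the 60-attempt cap are assumptions chosen for illustration; the status strings are the ones Apify reports for finished runs.

```bash
# Return success (0) when the given status means the run has finished.
is_terminal_status() {
  case "$1" in
    SUCCEEDED|FAILED|ABORTED|TIMED-OUT) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: poll a run until it finishes, then print its final status.
# Usage: wait_for_run RUN_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
wait_for_run() {
  local run_id="$1" interval="${2:-5}" max_attempts="${3:-60}"
  local attempt=0 status
  while [ "$attempt" -lt "$max_attempts" ]; do
    status=$(curl -s "https://api.apify.com/v2/actor-runs/$run_id?token=$APIFY_TOKEN" | jq -r '.data.status')
    if is_terminal_status "$status"; then
      echo "$status"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  # Gave up before the run reached a finished state.
  echo "Run $run_id did not finish after $max_attempts checks" >&2
  return 1
}
```

A caller would capture the result with `STATUS=$(wait_for_run "$RUN_ID")` and fetch the dataset items only when the status is SUCCEEDED.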
Step 4: Present results
Summarize articles: titles, authors, dates, engagement. Offer JSON/CSV export.
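For the CSV export, jq's @csv filter can flatten the dataset items. This is only a sketch: the field names title, author, date, and likes are assumed for illustration and may not match the Actor's actual output schema, so adjust them after inspecting a real item.

```bash
# Hypothetical helper: convert a JSON array of articles on stdin to CSV on stdout.
# The field names below are assumptions, not taken from the Actor's schema.
articles_to_csv() {
  jq -r '
    ["title", "author", "date", "likes"],
    (.[] | [.title, .author, .date, .likes])
    | @csv
  '
}
```

With the synchronous call from Step 2, `echo "$RESULT" | articles_to_csv > articles.csv` would write a header row followed by one row per article. Missing fields come out as empty cells.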
Error Handling
- If APIFY_TOKEN is not set: export APIFY_TOKEN=your_token
- If the run fails: check the log endpoint
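Both checks can be wrapped in small helpers. This is a sketch under assumptions: the function names are hypothetical, and the log URL uses Apify's generic /v2/logs/{buildOrRunId} endpoint, which is one way to reach a run's log.

```bash
# Fail early with a hint when the token is missing.
require_token() {
  if [ -z "${APIFY_TOKEN:-}" ]; then
    echo "APIFY_TOKEN is not set. Run: export APIFY_TOKEN=your_token" >&2
    return 1
  fi
}

# Hypothetical helper: print the last lines of a run's log for troubleshooting.
# Usage: show_run_log RUN_ID [LINES]
show_run_log() {
  require_token || return 1
  curl -s "https://api.apify.com/v2/logs/$1?token=$APIFY_TOKEN" | tail -n "${2:-50}"
}
```

Calling `require_token || exit 1` at the top of a script stops it before any request is sent without credentials.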
Substack Scraper
Scrape Substack newsletters using an Apify Actor via the REST API.
Actor ID
BULaGFURBV7WG3K81
Prerequisites
- The APIFY_TOKEN environment variable must be set
- curl and jq must be installed
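Workflow
Step 1: Confirm parameters with user
Ask the user what they want to scrape. Supported input fields:
- urls (array of strings) - Substack publication URLs to scrape
- maxArticles (integer) - maximum number of articles per publication
- includeContent (boolean) - whether to include the full article text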
Step 2: Run the Actor
```bash
# Run the Actor synchronously and return the dataset items in one call.
RESULT=$(curl -s -X POST "https://api.apify.com/v2/acts/BULaGFURBV7WG3K81/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.substack.com"], "maxArticles": 20}')
echo "$RESULT" | jq .
```
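Step 3: Poll and fetch (async mode)
```bash
# Start the run asynchronously and capture its ID.
RUN_ID=$(curl -s -X POST "https://api.apify.com/v2/acts/BULaGFURBV7WG3K81/runs?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.substack.com"], "maxArticles": 100}' | jq -r '.data.id')
# Check the run status; repeat until the run has finished.
curl -s "https://api.apify.com/v2/actor-runs/$RUN_ID?token=$APIFY_TOKEN" | jq -r '.data.status'
# Fetch the scraped items from the run's default dataset.
curl -s "https://api.apify.com/v2/actor-runs/$RUN_ID/dataset/items?token=$APIFY_TOKEN" | jq .
```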
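Step 4: Present results
Summarize articles: titles, authors, dates, engagement. Offer JSON/CSV export.
Error Handling
- If APIFY_TOKEN is not set: export APIFY_TOKEN=your_token
- If the run fails: check the log endpoint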