Multi-Source Feed
AI-curated daily tech brief aggregated from customizable sources (X, HN, GitHub Trending, RSS blogs, Reddit, Product Hunt, Tavily, and more). Deduplicates, filters by your interests, and delivers a structured memo.
Setup
When the user asks to set up Multi-Source Feed, follow these steps in order. Execute each step automatically. If any step fails, print the manual command and continue.
Step 1: Clone & Install
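
    cd ~ && git clone https://github.com/zidooong/multi-source-feed.git
    cd ~/multi-source-feed
    python3 -m venv .venv && source .venv/bin/activate
    pip install -r requirements.txt
    playwright install chromium

If the clone already exists, skip to the pip install step.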
Step 2: API Keys
Some sources require API keys that the user must register themselves. Ask the user for:
1. Tavily (free): powers web search to catch trending topics not covered by RSS feeds. Sign up at https://tavily.com
2. Product Hunt (free): required for the Product Hunt GraphQL API. Get a token at https://api.producthunt.com/v2/docs
Once the user provides both keys, write them to ~/multi-source-feed/.env:
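
    TAVILY_API_KEY=<user's key>
    PRODUCTHUNT_API_TOKEN=<user's token>

If the repository's .env.example uses different variable names, follow that file.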
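Writing the keys can also be scripted. The following is a minimal sketch, not the project's own tooling: `write_env_file` is a hypothetical helper, and the variable names `TAVILY_API_KEY` and `PRODUCTHUNT_API_TOKEN` are assumptions that should be checked against the repository's .env.example.

```python
import os
from pathlib import Path


def write_env_file(env_path, values):
    """Write KEY=value lines to env_path and restrict it to the owner."""
    path = Path(env_path).expanduser()
    lines = [f"{key}={value}" for key, value in values.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    os.chmod(path, 0o600)  # API keys should not be readable by other users
    return path


# Example call (variable names are assumptions, see the note above):
# write_env_file("~/multi-source-feed/.env",
#                {"TAVILY_API_KEY": "...", "PRODUCTHUNT_API_TOKEN": "..."})
```

Restricting the file mode is a design choice of this sketch; the original setup only requires that the two keys end up in ~/multi-source-feed/.env.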
Step 3: X/Twitter Login
Tell the user:
"To save your X/Twitter session, you need to:
1. Open Chrome with remote debugging enabled by running:
   open -a 'Google Chrome' --args --remote-debugging-port=9222
2. Log in to X/Twitter in that Chrome window.
Once you are logged in, I'll run a script that connects to that browser and saves your session cookies."
After the user confirms they are logged in to X in Chrome, run:

    cd ~/multi-source-feed && source .venv/bin/activate && python login_save_session.py

This script connects to the already-open Chrome instance via CDP (Chrome DevTools Protocol) on port 9222, extracts the session/cookies, and saves them to x_session.json in the project root. It does not open a new browser window — it requires Chrome to already be running with --remote-debugging-port=9222.
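For orientation, attaching to a running Chrome over CDP can look roughly like the sketch below. It is not the project's login script: `is_x_domain` and `save_session` are hypothetical names, and the `{"cookies": [...]}` output layout is an assumption about what x_session.json contains. The Playwright calls used (`connect_over_cdp`, `browser.contexts`, `context.cookies()`) are part of its Python sync API.

```python
import json
from pathlib import Path


def is_x_domain(domain):
    """True for x.com, twitter.com, and their subdomains."""
    host = domain.lstrip(".")
    return any(host == d or host.endswith("." + d) for d in ("x.com", "twitter.com"))


def save_session(cdp_url="http://localhost:9222", out_path="x_session.json"):
    """Attach to the running Chrome and save its X/Twitter cookies."""
    # Imported lazily so is_x_domain can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_url)  # attaches, opens no new window
        context = browser.contexts[0]  # default context of the running Chrome
        cookies = [c for c in context.cookies() if is_x_domain(c.get("domain", ""))]
    # Assumed output layout; the real x_session.json format may differ.
    Path(out_path).write_text(json.dumps({"cookies": cookies}, indent=2), encoding="utf-8")
    return len(cookies)
```

Calling `save_session()` only works while Chrome is running with --remote-debugging-port=9222 and the user is logged in to X.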
Step 4: Customize
This step directly affects the quality of the daily brief. Strongly encourage the user to customize before proceeding.
Ask the user:
"The default profile is a generic template. I strongly recommend customizing these files to match your interests — this directly determines the quality of your daily brief. What topics do you care about? What should be filtered out?"
Based on their response:
- Edit config/user_profile.md: set their interests, non-interests, and Key Players to track
- Adjust config/sources.yaml if needed (enable/disable sources, add their own RSS feeds)
- Adjust config/preferences.md if they want different memo sections or format
If they insist on skipping, move on — but remind them they can customize later.
Step 5: Test Run

    cd ~/multi-source-feed && source .venv/bin/activate && python -m src.pipeline

Show the user the output summary (number of sources, items fetched, any errors). If successful, show 5-10 sample titles from feed_slim.json.
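Pulling the sample titles out of the scrape output can be sketched as follows. The schema of feed_slim.json is not documented here, so this assumes either a top-level list of items or an object with an `items` list, each item carrying a `title` field; `sample_titles` is a hypothetical helper.

```python
import json
from pathlib import Path


def sample_titles(feed_path, limit=10):
    """Return up to `limit` non-empty titles from the scrape output."""
    data = json.loads(Path(feed_path).read_text(encoding="utf-8"))
    # Assumption: a top-level list of items, or {"items": [...]}.
    items = data.get("items", []) if isinstance(data, dict) else data
    titles = [
        item["title"]
        for item in items
        if isinstance(item, dict) and item.get("title")
    ]
    return titles[:limit]
```

If the real file uses different field names, only the two lines that pick out `items` and `title` need to change.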
Step 6: Schedule
The system runs in two phases. Phase 1 (scraping) must complete before Phase 2 (memo generation) starts.
Phase 1: Scrape (crontab) — Pure Python job that fetches all sources, deduplicates, and writes feed_slim.json. Set up a daily cron job:
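
    (crontab -l 2>/dev/null; echo "0 9 * * * cd ~/multi-source-feed && .venv/bin/python3 -m src.pipeline >> /tmp/msf-scrape.log 2>&1") | crontab -
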
Phase 2: Memo (OpenClaw cron) — LLM-powered job that generates the daily brief and sends it to the user. Must run ~20 min after Phase 1 to ensure scraping is complete.
Create an OpenClaw cron job that:
1. Checks that feed_slim.json exists and is from today
2. Reads config/user_profile.md and config/preferences.md
3. Reads feed_slim.json (the scrape output)
4. Generates the daily brief following the format in config/preferences.md
5. Sends the brief to the user via their configured channel
6. Saves the brief to memo/YYYY-MM-DD.md (used for cross-day dedup)
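The first and last steps of that job can be sketched in a few lines. This is an illustration under assumptions: "from today" is interpreted as the file's modification date in local time, and `feed_is_fresh` and `memo_path` are hypothetical helper names.

```python
from datetime import date, datetime
from pathlib import Path


def feed_is_fresh(feed_path, today=None):
    """True if feed_path exists and was last modified on `today` (local time)."""
    path = Path(feed_path)
    if not path.is_file():
        return False
    today = today or date.today()
    modified = datetime.fromtimestamp(path.stat().st_mtime).date()
    return modified == today


def memo_path(memo_dir, day=None):
    """Build the memo/YYYY-MM-DD.md path for the given day."""
    day = day or date.today()
    return Path(memo_dir) / f"{day.isoformat()}.md"
```

A job built this way would stop early when `feed_is_fresh` returns False, which covers the case where the Phase 1 scrape failed or has not finished.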
Tell the user:
"Setup complete! Your daily brief will be generated every morning. You'll receive it through your configured messaging channel."
Manual Setup Fallback
If automated setup fails, provide the user with these commands to run manually:
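
    # 1. Clone
    git clone https://github.com/zidooong/multi-source-feed.git && cd multi-source-feed
    # 2. Install
    python3 -m venv .venv && source .venv/bin/activate
    pip install -r requirements.txt && playwright install chromium
    # 3. Configure
    cp .env.example .env
    # Edit .env and fill in your API keys
    # 4. X login
    python login_save_session.py
    # 5. Test
    python -m src.pipeline
    # 6. Schedule the scrape
    crontab -e
    # Add: 0 9 * * * cd ~/multi-source-feed && .venv/bin/python3 -m src.pipeline >> /tmp/msf-scrape.log 2>&1
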
Customization
All user-customizable files are in config/:
| File | Purpose |
|---|---|
| config/user_profile.md | Your interests, Key Players to track |
| config/sources.yaml | Enable/disable sources, add RSS feeds |
| config/preferences.md | Memo format, sections, filtering rules |
Adding a new RSS source
Add 4 lines to config/sources.yaml:

    type: rss
    enabled: true
    url: https://example.com/feed.xml
    tags: [blog]

X-Push (Optional)
If the user wants real-time X/Twitter highlights every 2 hours (in addition to the daily brief):
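1. Ask the user if they want to customize push/preferences.md (filtering rules and output format)
2. Create an OpenClaw cron job (every 2 hours) with this prompt:

    Execute the following steps in order:
    1. Run: bash ~/multi-source-feed/push/run.sh
       (wait for it to finish; this usually takes 2-3 minutes)
    2. If the exit code is non-zero, notify the user that the X push scrape failed, then stop.
    3. Read push/new_posts.json. If the posts array is empty, stop silently.
    4. Read push/preferences.md for the filtering rules and output format.
    5. Read config/user_profile.md to understand what the reader cares about.
    6. Filter and send the noteworthy posts, following the format in push/preferences.md.
    7. Done.
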
The push module shares x_session.json and .venv with the main pipeline — no extra setup needed.
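The control flow of the push job (run the script, stop on failure, stop silently when there is nothing new) can be sketched as below. `run_push` and `load_new_posts` are hypothetical helpers; the only details taken from the setup are the push/run.sh script, the push/new_posts.json file, and its `posts` array. The `runner` parameter exists only so the flow can be exercised without running the real script.

```python
import json
import subprocess
from pathlib import Path


def load_new_posts(posts_path):
    """Return the `posts` array from new_posts.json, or [] if the key is absent."""
    data = json.loads(Path(posts_path).read_text(encoding="utf-8"))
    return data.get("posts", [])


def run_push(project_dir, runner=subprocess.run):
    """Run push/run.sh, then decide whether anything should be sent."""
    project = Path(project_dir).expanduser()
    result = runner(["bash", str(project / "push" / "run.sh")])
    if result.returncode != 0:
        return "failed", []  # notify the user that the scrape failed, then stop
    posts = load_new_posts(project / "push" / "new_posts.json")
    if not posts:
        return "empty", []  # nothing new: stop silently
    return "ready", posts  # continue with filtering and sending
```

Filtering and formatting are left out on purpose, since those steps are driven by push/preferences.md and config/user_profile.md rather than by code.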