# Emerging Topic Scout
A real-time monitoring system that identifies "incubation period" research hotspots in the biological and medical sciences before mainstream journals define them.
## Overview
This skill continuously monitors:
- bioRxiv: Biology preprints via RSS/API ⚠️ Currently blocked by Cloudflare
- medRxiv: Medicine preprints via RSS/API ⚠️ Currently blocked by Cloudflare
- arXiv: Quantitative Biology preprints via RSS ✅ Recommended alternative
- Academic discussions: Social media and forum mentions
It uses trend analysis algorithms to detect sudden spikes in topic frequency, cross-platform mentions, and emerging keyword clusters.
## ⚠️ Network Access Notice
bioRxiv and medRxiv currently sit behind a Cloudflare JavaScript challenge, which prevents programmatic RSS access. As a workaround, this skill supports arXiv q-bio (Quantitative Biology) as an alternative data source.
Recommended usage:
CODEBLOCK0
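To illustrate what reading an RSS source involves, here is a minimal sketch that extracts titles and links from feed text. It assumes plain RSS 2.0 `<item>` elements without XML namespaces; the sample feed, its placeholder link, and the helper name `parse_items` are made up for illustration and are not taken from the skill's scripts.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed; a real feed would be downloaded from the rss_url.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>q-bio updates (sample)</title>
    <item>
      <title>Example paper on single cell analysis</title>
      <link>https://arxiv.org/abs/0000.00000</link>
      <description>Placeholder abstract.</description>
    </item>
  </channel>
</rss>
"""


def parse_items(feed_text):
    """Return one dict per <item> with its title, link, and summary text."""
    root = ET.fromstring(feed_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "summary": (item.findtext("description") or "").strip(),
        })
    return items


if __name__ == "__main__":
    for entry in parse_items(SAMPLE_FEED):
        print(entry["title"], "->", entry["link"])
```

Feeds that use namespaced elements (for example RDF-style RSS 1.0) would need namespace-qualified tag names instead of the bare `item` lookup.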
## Installation
CODEBLOCK1
## Usage
### Basic Scan (Recommended: Use arXiv)
CODEBLOCK2
### Legacy bioRxiv/medRxiv (May not work due to Cloudflare)
CODEBLOCK3
### Advanced Configuration (arXiv Recommended)
CODEBLOCK4
### Legacy Configuration (bioRxiv/medRxiv - May not work)
CODEBLOCK5
## Output Format
### JSON Output
```json
{
  "scan_date": "2026-02-06T05:57:00Z",
  "sources": ["biorxiv", "medrxiv"],
  "hot_topics": [
    {
      "topic": "gene editing therapy",
      "keywords": ["CRISPR", "base editing", "prime editing"],
      "trending_score": 0.89,
      "velocity": "rapid",
      "preprint_count": 34,
      "cross_platform_mentions": 127,
      "related_papers": [
        {
          "title": "New CRISPR variant shows promise",
          "authors": ["Smith J.", "Lee K."],
          "doi": "10.1101/2026.01.15.xxxxx",
          "source": "biorxiv",
          "published": "2026-01-15",
          "abstract_summary": "..."
        }
      ],
      "emerging_since": "2026-01-20"
    }
  ],
  "summary": {
    "total_papers_analyzed": 1247,
    "new_topics_detected": 8,
    "high_priority_alerts": 2
  }
}
```
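A downstream consumer might filter this report for alert-worthy topics. The sketch below assumes the report is valid JSON with the `hot_topics`, `topic`, and `trending_score` fields shown above. The helper name `high_priority_topics`, the 0.8 cutoff (mirroring the notification threshold in the configuration file), and the second sample topic with its score are illustrative assumptions, not part of the skill.

```python
import json

# Trimmed-down report; the second topic and its score are made up.
SAMPLE_REPORT = """
{
  "hot_topics": [
    {"topic": "gene editing therapy", "trending_score": 0.89},
    {"topic": "spatial transcriptomics", "trending_score": 0.64}
  ]
}
"""


def high_priority_topics(report_text, threshold=0.8):
    """Return (topic, score) pairs at or above threshold, highest first."""
    report = json.loads(report_text)
    hits = [
        (entry["topic"], entry["trending_score"])
        for entry in report.get("hot_topics", [])
        if entry.get("trending_score", 0.0) >= threshold
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for topic, score in high_priority_topics(SAMPLE_REPORT):
        print(f"{topic}: {score:.2f}")
```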
### Markdown Output
```markdown
# Emerging Topics Report - 2026-02-06
## 🔥 High Priority Topics
### 1. Gene Editing Therapy (Score: 0.89)
- Keywords: CRISPR, base editing, prime editing
- Growth Rate: Rapid (+145% vs last week)
- Preprints: 34 papers
- Cross-platform mentions: 127
#### Key Papers
1. "New CRISPR variant shows promise" - Smith J. et al.
   - DOI: 10.1101/2026.01.15.xxxxx
   - Source: bioRxiv
```
## Configuration File
Create `config.yaml` for persistent settings:
```yaml
sources:
  arxiv:
    enabled: true
    rss_url: "https://export.arxiv.org/rss/q-bio"
    description: "arXiv Quantitative Biology - Recommended (no Cloudflare)"
  biorxiv:
    enabled: false  # Disabled due to Cloudflare protection
    rss_url: "https://www.biorxiv.org/rss/recent.rss"
    api_endpoint: "https://api.biorxiv.org/details/"
    note: "Currently blocked by Cloudflare JavaScript Challenge"
  medrxiv:
    enabled: false  # Disabled due to Cloudflare protection
    rss_url: "https://www.medrxiv.org/rss/recent.rss"
    api_endpoint: "https://api.medrxiv.org/details/"
    note: "Currently blocked by Cloudflare JavaScript Challenge"
trending:
  min_papers_threshold: 5
  velocity_window_days: 3
  novelty_weight: 0.4
  momentum_weight: 0.6
keywords:
  auto_detect: true
  custom_trackers:
    - "artificial intelligence"
    - "machine learning"
    - "single cell"
    - "spatial transcriptomics"
output:
  default_format: markdown
  save_history: true
  history_path: "./data/history.json"
notifications:
  enabled: false
  high_score_threshold: 0.8
```
## Trending Score Algorithm
The trending score (0-1) is calculated using:
```
Score = (Novelty × 0.4) + (Momentum × 0.4) + (CrossRef × 0.2)
```
Where:
- Novelty: Inverse frequency of the topic in historical data
- Momentum: Rate of increase in mentions over the velocity window
- CrossRef: Mentions across multiple platforms
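The weighted sum above can be sketched in a few lines of Python. The document does not say how each component is normalized to the 0-1 range, so the `novelty` and `momentum` helpers below are hypothetical normalizations chosen only for illustration. The weights follow the formula (0.4, 0.4, 0.2); note that the sample configuration file lists a different `momentum_weight` (0.6), so the weights are left as a parameter.

```python
def clamp01(value):
    """Keep a value inside the closed interval [0, 1]."""
    return max(0.0, min(1.0, value))


def novelty(historical_count):
    """Hypothetical inverse-frequency score: 1.0 for a never-seen topic."""
    return 1.0 / (1.0 + max(0, historical_count))


def momentum(mentions_now, mentions_before):
    """Hypothetical growth score: relative growth squashed into [0, 1)."""
    growth = (mentions_now - mentions_before) / max(mentions_before, 1)
    if growth <= 0:
        return 0.0
    return growth / (1.0 + growth)


def trending_score(novelty_value, momentum_value, cross_ref,
                   weights=(0.4, 0.4, 0.2)):
    """Weighted sum from the formula; inputs are clamped to 0-1 first."""
    w_n, w_m, w_c = weights
    total = (w_n * clamp01(novelty_value)
             + w_m * clamp01(momentum_value)
             + w_c * clamp01(cross_ref))
    return clamp01(total)


if __name__ == "__main__":
    score = trending_score(novelty(2), momentum(20, 10), cross_ref=0.5)
    print(f"score = {score:.3f}")
```

Clamping each input keeps a single runaway component from pushing the score outside the documented 0-1 range.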
## API Endpoints
### bioRxiv API
- Base: `https://api.biorxiv.org/`
- Details: `/details/[server]/[DOI]/[format]`
- Publication: `/pub/[DOI]/[format]`
### medRxiv API
- Same structure as bioRxiv
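As a small illustration of the path patterns listed above, the sketch below assembles request URLs without performing any network call. The host names come from the `api_endpoint` values in the configuration file; the helper names `details_url` and `pub_url` and the default `json` format value are assumptions, and the patterns should be checked against the official API documentation before use.

```python
from urllib.parse import quote

SERVERS = ("biorxiv", "medrxiv")


def _base(server):
    """Return the API base URL for a supported server."""
    if server not in SERVERS:
        raise ValueError(f"unknown server: {server!r}")
    return f"https://api.{server}.org/"


def details_url(server, doi, fmt="json"):
    """Build a URL following the /details/[server]/[DOI]/[format] pattern."""
    return f"{_base(server)}details/{server}/{quote(doi, safe='/')}/{fmt}"


def pub_url(server, doi, fmt="json"):
    """Build a URL following the /pub/[DOI]/[format] pattern."""
    return f"{_base(server)}pub/{quote(doi, safe='/')}/{fmt}"


if __name__ == "__main__":
    print(details_url("biorxiv", "10.1101/2026.01.15.xxxxx"))
```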
## Data Storage
Historical data is stored in `data/history.json` for:
- Trend comparison
- Velocity calculation
- Duplicate detection
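The duplicate-detection use of the history file could look roughly like the sketch below. The actual layout of `data/history.json` is not documented here, so the `seen_dois` key and the function names are hypothetical; trend comparison and velocity calculation are not covered.

```python
import json
from pathlib import Path


def load_history(path="data/history.json"):
    """Load the history file, or start empty if it does not exist yet."""
    file = Path(path)
    if not file.exists():
        return {"seen_dois": []}  # hypothetical schema
    return json.loads(file.read_text(encoding="utf-8"))


def drop_duplicates(papers, history):
    """Return papers whose DOI is new, and record those DOIs in history."""
    seen = set(history.get("seen_dois", []))
    fresh = []
    for paper in papers:
        doi = paper["doi"]
        if doi in seen:
            continue
        seen.add(doi)
        fresh.append(paper)
    history["seen_dois"] = sorted(seen)
    return fresh


def save_history(history, path="data/history.json"):
    """Write the history back to disk, creating the folder if needed."""
    file = Path(path)
    file.parent.mkdir(parents=True, exist_ok=True)
    file.write_text(json.dumps(history, indent=2), encoding="utf-8")
```

A scan would call `load_history`, pass freshly fetched papers through `drop_duplicates`, and finish with `save_history` so the next run sees the updated set.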
## Examples
### Example 1: Quick Daily Scan (arXiv - Recommended)
```bash
python scripts/main.py --sources arxiv --days 1 --output markdown
```
### Example 2: Daily Scan with bioRxiv (May not work)
```bash
python scripts/main.py --sources biorxiv --days 1 --output markdown
```
Note: May return 0 results due to Cloudflare protection.
### Example 3: Weekly Deep Analysis
CODEBLOCK11
### Example 4: Track Specific Research Area
CODEBLOCK12
## Known Issues
### bioRxiv/medRxiv Cloudflare Protection
Status: ❌ Blocked
Issue: The bioRxiv and medRxiv RSS feeds are protected by a Cloudflare JavaScript challenge, which prevents programmatic access. Instead of the feed, the site returns an HTML page that requires JavaScript execution and cookie validation.
Attempted Solutions:
1. ✅ Added browser User-Agent headers → Failed (Cloudflare still detects the bot)
2. ✅ Added complete browser headers (Accept, Accept-Language, etc.) → Failed
3. ❌ Browser automation (Selenium/Playwright) → Not implemented (complex, heavy dependency)
Workaround: ✅ Use arXiv instead
- arXiv q-bio (Quantitative Biology) RSS is accessible without protection
- Contains computational biology, bioinformatics, and quantitative biology papers
- Successfully tested: 35+ papers fetched in 30-day window
Usage:
CODEBLOCK13
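Because a blocked request returns an HTML page rather than a feed, a scanner can check the response body before trying to parse it. The heuristic below is a sketch under that assumption; the function name `looks_like_feed` and the sample strings are illustrative, and the check does not attempt to bypass the protection.

```python
def looks_like_feed(body):
    """Heuristic: True if the text starts like an RSS, Atom, or RDF feed."""
    head = body.lstrip()[:500].lower()
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return False  # an HTML page, e.g. a challenge page, not a feed
    return any(tag in head for tag in ("<rss", "<feed", "<rdf:rdf"))


if __name__ == "__main__":
    feed = '<?xml version="1.0"?><rss version="2.0"><channel/></rss>'
    challenge = "<!DOCTYPE html><html><head><title>Just a moment...</title>"
    print(looks_like_feed(feed), looks_like_feed(challenge))
```

When the check fails for bioRxiv or medRxiv, the caller can report zero results for that source and continue with arXiv.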
## Troubleshooting
### Rate Limiting
If you encounter rate limits, increase the `--delay` parameter (default: 1s between requests).
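For readers wondering what a per-request delay amounts to, here is a minimal throttle sketch. It is not the skill's implementation; the `Throttle` class and its injectable `clock` and `sleep` arguments are assumptions made so the behavior can be tested without real waiting.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, delay_seconds=1.0, clock=time.monotonic,
                 sleep=time.sleep):
        self.delay = delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to respect the configured delay."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = Throttle(delay_seconds=1.0)
    for url in ("https://export.arxiv.org/rss/q-bio",):
        throttle.wait()  # call before each request
        print("would fetch", url)
```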
### Missing Papers (0 results from bioRxiv/medRxiv)
This is expected due to Cloudflare protection. Use `--sources arxiv` instead.
### RSS Feed Access Denied
Some institutional firewalls may block preprint servers. Ensure you can access:
- ✅ https://export.arxiv.org/rss/q-bio (should work)
- ❌ https://www.biorxiv.org/rss/recent.rss (Cloudflare blocked)
### Low Trending Scores
For niche topics, lower the `--min-score` threshold or increase `--days` for more data.
## References
See `references/README.md` for:
- API documentation links
- Research papers on trend detection
- Related tools and resources
## License
MIT License - Part of OpenClaw Skills Collection
## Risk Assessment
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python scripts with tools | High |
| Network Access | External API calls | High |
| File System Access | Read/write data | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Data handled securely | Medium |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] API requests use HTTPS only
- [ ] Input validated against allowed patterns
- [ ] API timeout and retry mechanisms implemented
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no internal paths exposed)
- [ ] Dependencies audited
- [ ] No exposure of internal service architecture
## Prerequisites
CODEBLOCK14
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. Basic Functionality: Standard input → Expected output
2. Edge Case: Invalid input → Graceful error handling
3. Performance: Large dataset → Acceptable processing time
## Lifecycle Status
- Current Stage: Draft
- Next Review Date: 2026-03-06
- Known Issues:
  - ⚠️ bioRxiv/medRxiv blocked by Cloudflare (use arXiv as workaround)
  - Network access limitations for some RSS feeds
- Investigate bioRxiv/medRxiv API alternatives
- Consider browser automation for Cloudflare bypass
- Add more arXiv categories (q-bio subcategories)
- Performance optimization
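# Emerging Topic Scout
A real-time monitoring system that identifies "incubation period" research hotspots in the biological and medical sciences before mainstream journals define them.
## Overview
This skill continuously monitors:
- bioRxiv: Biology preprints via RSS/API ⚠️ Currently blocked by Cloudflare
- medRxiv: Medicine preprints via RSS/API ⚠️ Currently blocked by Cloudflare
- arXiv: Quantitative Biology preprints via RSS ✅ Recommended alternative
- Academic discussions: Social media and forum mentions
It uses trend analysis algorithms to detect sudden spikes in topic frequency, cross-platform mentions, and emerging keyword clusters.
## ⚠️ Network Access Notice
bioRxiv and medRxiv currently sit behind a Cloudflare JavaScript challenge, which prevents programmatic RSS access. As a workaround, this skill supports arXiv q-bio (Quantitative Biology) as an alternative data source.
Recommended usage:
```bash
# Use arXiv for reliable data
python scripts/main.py --sources arxiv --days 30
# bioRxiv/medRxiv may return 0 results due to Cloudflare protection
python scripts/main.py --sources biorxiv medrxiv --days 30  # may not work
```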
## Installation
```bash
cd /Users/z04030865/.openclaw/workspace/skills/emerging-topic-scout
pip install -r scripts/requirements.txt
```
## Usage
### Basic Scan (Recommended: Use arXiv)
```bash
python scripts/main.py --sources arxiv --days 7 --output json
```
### Legacy bioRxiv/medRxiv (May not work due to Cloudflare)
```bash
python scripts/main.py --sources biorxiv medrxiv --days 7 --output json
```
### Advanced Configuration (arXiv Recommended)
```bash
python scripts/main.py \
  --sources arxiv \
  --keywords "CRISPR,gene editing,machine learning" \
  --days 14 \
  --min-score 0.7 \
  --output markdown \
  --notify
```
### Legacy Configuration (bioRxiv/medRxiv - May not work)
```bash
python scripts/main.py \
  --sources biorxiv medrxiv \
  --keywords "CRISPR,gene editing,long COVID" \
  --days 14 \
  --min-score 0.7 \
  --output markdown \
  --notify
```
Note: bioRxiv/medRxiv may return 0 results due to Cloudflare protection.
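## Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `--sources` | list | arxiv | Data sources to monitor (arxiv is recommended because of the Cloudflare issues with biorxiv/medrxiv) |
| `--keywords` | string | (auto-detect) | Comma-separated keywords to track |
| `--days` | integer | 7 | Lookback period in days |
| `--min-score` | float | 0.6 | Minimum trending score (0-1) |
| `--max-topics` | integer | 20 | Maximum number of topics to return |
| `--output` | string | markdown | Output format: json, markdown, csv |
| `--notify` | flag | false | Send notifications for high-priority topics |
| `--config` | path | config.yaml | Path to the configuration file |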
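The parameter table maps naturally onto an `argparse` parser. The sketch below is one plausible way to declare these flags with the listed defaults; it is not the contents of `scripts/main.py`, and the `build_parser` name and help strings are assumptions.

```python
import argparse


def build_parser():
    """Declare the command-line flags with the documented defaults."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Scan preprint sources for emerging topics.",
    )
    parser.add_argument("--sources", nargs="+", default=["arxiv"],
                        choices=["arxiv", "biorxiv", "medrxiv"],
                        help="data sources to monitor")
    parser.add_argument("--keywords", default=None,
                        help="comma-separated keywords (auto-detect if unset)")
    parser.add_argument("--days", type=int, default=7,
                        help="lookback period in days")
    parser.add_argument("--min-score", type=float, default=0.6,
                        help="minimum trending score (0-1)")
    parser.add_argument("--max-topics", type=int, default=20,
                        help="maximum number of topics to return")
    parser.add_argument("--output", default="markdown",
                        choices=["json", "markdown", "csv"],
                        help="output format")
    parser.add_argument("--notify", action="store_true",
                        help="send notifications for high-priority topics")
    parser.add_argument("--config", default="config.yaml",
                        help="path to the configuration file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    keywords = args.keywords.split(",") if args.keywords else []
    print(args.sources, args.days, args.min_score, keywords)
```

Using `nargs="+"` lets `--sources biorxiv medrxiv` be written with space-separated values, matching the usage examples.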
## Output Format
### JSON Output
```json
{
  "scan_date": "2026-02-06T05:57:00Z",
  "sources": ["biorxiv", "medrxiv"],
  "hot_topics": [
    {
      "topic": "gene editing therapy",
      "keywords": ["CRISPR", "base editing", "prime editing"],
      "trending_score": 0.89,
      "velocity": "rapid",
      "preprint_count": 34,
      "cross_platform_mentions": 127,
      "related_papers": [
        {
          "title": "New CRISPR variant shows promise",
          "authors": ["Smith J.", "Lee K."],
          "doi": "10.1101/2026.01.15.xxxxx",
          "source": "biorxiv",
          "published": "2026-01-15",
          "abstract_summary": "..."
        }
      ],
      "emerging_since": "2026-01-20"
    }
  ],
  "summary": {
    "total_papers_analyzed": 1247,
    "new_topics_detected": 8,
    "high_priority_alerts": 2
  }
}
```
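### Markdown Output
```markdown
# Emerging Topics Report - 2026-02-06
## 🔥 High Priority Topics
### 1. Gene Editing Therapy (Score: 0.89)
- Keywords: CRISPR, base editing, prime editing
- Growth Rate: Rapid (+145% vs last week)
- Preprints: 34 papers
- Cross-platform mentions: 127
#### Key Papers
1. "New CRISPR variant shows promise" - Smith J. et al.
   - DOI: 10.1101/2026.01.15.xxxxx
   - Source: bioRxiv
```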
## Configuration File
Create `config.yaml` for persistent settings:
```yaml
sources:
  arxiv:
    enabled: true
    rss_url: "https://export.arxiv.org/rss/q-bio"
    description: "arXiv Quantitative Biology - Recommended (no Cloudflare)"
  biorxiv:
    enabled: false  # Disabled due to Cloudflare protection
    rss_url: "https://www.biorxiv.org/rss/recent.rss"
    api_endpoint: "https://api.biorxiv.org/details/"
    note: "Currently blocked by Cloudflare JavaScript Challenge"
  medrxiv:
    enabled: false  # Disabled due to Cloudflare protection
    rss_url: "https://www.medrxiv.org/rss/recent.rss"
    api_endpoint: "https://api.medrxiv.org/details/"
    note: "Currently blocked by Cloudflare JavaScript Challenge"
trending:
  min_papers_threshold: 5
  velocity_window_days: 3
  novelty_weight: 0.4
  momentum_weight: 0.6
keywords:
  auto_detect: true
  custom_trackers:
    - "artificial intelligence"
    - "machine learning"
    - "single cell"
    - "spatial transcriptomics"
output:
  default_format: markdown
  save_history: true
  history_path: "./data/history.json"
notifications:
  enabled: false
  high_score_threshold: 0.8
```
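## Trending Score Algorithm
The trending score (0-1) is calculated using:
```
Score = (Novelty × 0.4) + (Momentum × 0.4) + (CrossRef × 0.2)
```
Where:
- Novelty: Inverse frequency of the topic in historical data
- Momentum: Rate of increase in mentions over the velocity window
- CrossRef: Mentions across multiple platforms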
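## API Endpoints
### bioRxiv API
- Base: `https://api.biorxiv.org/`
- Details: `/details/[server]/[DOI]/[format]`
- Publication: `/pub/[DOI]/[format]`
### medRxiv API
- Same structure as bioRxiv
## Data Storage
Historical data is stored in `data/history.json` for:
- Trend comparison
- Velocity calculation
- Duplicate detection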
## Examples
### Example 1: Quick Daily Scan (arXiv - Recommended)
```bash
python scripts/main.py --sources arxiv --days 1 --output markdown
```
### Example 2: Daily Scan with bioRxiv (May not work)
```bash
python scripts/main.py --sources biorxiv --days 1 --output markdown
```
Note: May return 0 results due to Cloudflare protection.
### Example 3: Weekly Deep Analysis
```bash
python scripts/main.py \
  --days 7 \
  --min-score 0.7 \
  --max-topics 50 \
  --output json \
  > weekly_report.json
```
### Example 4: Track Specific Research Area
```bash
python scripts/main.py \
  --keywords "Alzheimer's disease,neurodegeneration,amyloid" \
  --days 30 \
  --min-score 0.5
```
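## Known Issues
### bioRxiv/medRxiv Cloudflare Protection
Status: ❌ Blocked
Issue: The bioRxiv and medRxiv RSS feeds are protected by a Cloudflare JavaScript challenge, which prevents programmatic access. Instead of the feed, the site returns an HTML page that requires JavaScript execution and cookie validation.
Attempted Solutions: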
1.