A Share Site Crawl
Use this skill to collect public A-share information from the five target sites and to convert raw site access into repeatable summary-ready records.
Read Order
Always read these first:
- references/sites.md
- references/workflow.md
Read these in addition when the task involves formal collection, normalization, or recurring jobs:
- references/entrypoints.md
- references/fields.md
- references/risks.md
Use references/entrypoints.md for fixed site entry pages, verification status, cron priorities, and default crawl mode.
Use references/fields.md for the normalized schema, source tiering, credibility, opinion-risk handling, content typing, cron retention, time normalization, ticker normalization, and dedup rules.
Use references/risks.md for P0/P1/P2 risks, recognition signals, and downgrade or mitigation decisions.
Core Rule
Prefer browser for page truth and web_fetch for cheap probing.
- Use web_fetch first when the site is known to have stable public text pages
- Use browser first when the site is dynamic, disclosure-driven, or clearly stronger in rendered form
- If both fail, report the site as restricted or missing instead of pretending it was covered
- Do not treat anti-bot code, disclaimers, shells, or login walls as usable content
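The tool-selection rule above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `tool_order`, `collect`, the two boolean site traits, and the `fetchers` mapping are all hypothetical names introduced here, and the real per-site defaults would come from references/entrypoints.md.

```python
from typing import Callable, Dict, List, Optional


def tool_order(stable_text_pages: bool, dynamic_or_disclosure: bool) -> List[str]:
    """Return the tools to try, in order, for one site (hypothetical traits)."""
    if dynamic_or_disclosure or not stable_text_pages:
        return ["browser", "web_fetch"]
    return ["web_fetch", "browser"]


def collect(
    site: str,
    order: List[str],
    fetchers: Dict[str, Callable[[str], Optional[str]]],
) -> dict:
    """Try each tool in turn; report the site as restricted when all of them fail."""
    for tool in order:
        text = fetchers[tool](site)
        if text:  # None or an empty string counts as a failed probe
            return {"site": site, "status": "covered", "tool": tool, "text": text}
    # Never pretend coverage: surface the failure to the caller.
    return {"site": site, "status": "restricted", "tool": None, "text": None}
```

In practice the `text` check would be replaced by a proper probe classification, since a non-empty payload can still be an anti-bot page or a login wall.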
Working Workflow
1. Start from the correct page type
- Prefer fixed entrypoints, list pages, search pages, disclosure pages, telegraph streams, and stock-detail pages
- Do not judge 巨潮资讯 from homepage-only text
- Do not rely on noisy portal homepages when a better inner page exists
2. Probe and classify access
Judge each probe into one of these buckets:
- usable: readable and materially sufficient
- INLINECODE13 (partially usable): some content is real, but clearly incomplete
- INLINECODE14 (shell only): mainly navigation, scripts, disclaimers, or boilerplate
- INLINECODE15 (blocked): anti-bot, login wall, or meaningless payload
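A rough heuristic for sorting a probe into these four buckets might look like the sketch below. Everything in it is an assumption made for illustration: the bucket names other than usable (`partial`, `shell`, `blocked`), the marker strings, and the length thresholds are placeholders, and the real recognition signals belong in references/risks.md.

```python
import re

# Illustrative markers only (assumptions, not taken from the skill's references).
BLOCK_MARKERS = ("captcha", "请登录", "访问过于频繁", "verify you are human")
BOILERPLATE_MARKERS = ("免责声明", "copyright", "<script", "导航")


def classify_probe(text: str, min_chars: int = 200, full_chars: int = 800) -> str:
    """Map raw probe text to one of: usable, partial, shell, blocked."""
    body = re.sub(r"\s+", " ", text or "").strip()
    lowered = body.lower()
    # Empty payloads and anti-bot or login pages are never usable content.
    if not body or any(marker in lowered for marker in BLOCK_MARKERS):
        return "blocked"
    boilerplate_hits = sum(lowered.count(marker) for marker in BOILERPLATE_MARKERS)
    # Very short or boilerplate-dominated pages are treated as shells.
    if len(body) < min_chars or boilerplate_hits >= 3:
        return "shell"
    return "usable" if len(body) >= full_chars else "partial"
```

Length thresholds are a crude proxy for "materially sufficient"; a stricter version would check for the expected page elements, such as a list of dated items on a telegraph stream.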
3. Choose extraction mode
Use one of these verdicts per site or page:
- INLINECODE16
- INLINECODE17
- INLINECODE18
- INLINECODE19
4. Keep site roles distinct
- 巨潮资讯: official confirmation and disclosure verification
- 东方财富: public aggregation, data-center navigation, and quasi-structured market pages
- 财联社: fast market events and telegraph flow
- 韭研公社: topic logic, timeline, and community clue discovery
- 雪球: sentiment, heat, stock-detail snapshots, and community discussion
5. Normalize before summarizing
When the task is more than a one-off crawl check, convert findings into normalized records using references/fields.md.
Minimum normalization discipline:
- assign source_tier, credibility, content_type, and INLINECODE24 (the opinion-risk field)
- normalize time to Asia/Shanghai when possible
- normalize A-share tickers conservatively
- deduplicate repeated event coverage
- separate confirmed facts from market claims and sentiment
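The ticker, time, and dedup steps above could be sketched as follows. This is a hedged illustration rather than the rules in references/fields.md: the function names, the `CODE.EXCHANGE` output form, the prefix tables, the treatment of naive timestamps as Shanghai local time, and the `(ticker, title)` dedup key are all assumptions. A fixed UTC+8 offset stands in for Asia/Shanghai, which has observed no daylight saving time since 1991.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import List, Optional

SHANGHAI = timezone(timedelta(hours=8), "Asia/Shanghai")
SH_PREFIXES = ("600", "601", "603", "605", "688")
SZ_PREFIXES = ("000", "001", "002", "003", "300", "301")


def normalize_ticker(raw: str) -> Optional[str]:
    """Return e.g. '600519.SH', or None when the code cannot be placed safely."""
    m = re.fullmatch(r"(SH|SZ)?(\d{6})(?:\.(SH|SZ))?", raw.strip().upper())
    if not m:
        return None
    code, hinted = m.group(2), m.group(1) or m.group(3)
    if code.startswith(SH_PREFIXES):
        exchange = "SH"
    elif code.startswith(SZ_PREFIXES):
        exchange = "SZ"
    else:
        return None  # conservative: unknown prefix, do not guess
    if hinted and hinted != exchange:
        return None  # e.g. an index code such as 000001.SH is not a stock
    return f"{code}.{exchange}"


def to_shanghai(ts: str) -> Optional[str]:
    """Parse an ISO-like timestamp and express it in Asia/Shanghai time."""
    try:
        dt = datetime.fromisoformat(ts)
    except ValueError:
        return None
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=SHANGHAI)  # assumption: naive site times are local
    return dt.astimezone(SHANGHAI).isoformat()


def dedupe(records: List[dict]) -> List[dict]:
    """Keep the first record per (ticker, title) pair, preserving order."""
    seen, kept = set(), []
    for record in records:
        key = (record.get("ticker"), record.get("title", "").strip())
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept
```

Returning `None` instead of a best guess keeps the normalization conservative: an unplaceable ticker or unparseable time stays visibly unresolved rather than silently wrong.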
6. Apply downgrade rules early
Use references/risks.md when deciding whether to downgrade, defer, or replace a source.
Default downgrade behavior:
- login-gated or anti-bot content -> INLINECODE26 (restricted)
- shell-only or disclaimer-heavy result -> switch entrypoint or switch tool
- 财联社 telegraph: keep the list body text by default; only hit detail when the list is truncated, a canonical URL is needed, or an original-source jump matters
- 巨潮资讯 announcements: keep the list metadata by default; only chase the PDF when the title is high-value enough to justify body extraction, otherwise keep a title-derived summary and mark that the PDF body was not extracted
- community-only claim without confirmation -> keep as clue, not fact
- unavailable priority site -> disclose it and use approved fallback public sources
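The default downgrade behavior above can be written as a small decision function. This is a sketch under stated assumptions: the signal keys (`login_wall`, `shell_only`, and so on), the action labels, and the `high_value` flag are hypothetical names chosen for illustration, and deciding what counts as a high-value title is left to the caller.

```python
def downgrade_action(signal: dict) -> str:
    """Map observed probe signals to a default action (labels are illustrative)."""
    if signal.get("login_wall") or signal.get("anti_bot"):
        return "restricted"
    if signal.get("shell_only") or signal.get("disclaimer_heavy"):
        return "switch_entrypoint_or_tool"
    if signal.get("community_only") and not signal.get("confirmed"):
        return "keep_as_clue"
    if signal.get("site_unavailable"):
        return "disclose_and_use_fallback"
    return "keep"


def needs_detail_page(item: dict) -> bool:
    """财联社 telegraph: stay on the list body unless one of three triggers fires."""
    return bool(
        item.get("truncated")
        or item.get("need_canonical_url")
        or item.get("need_original_source")
    )


def announcement_record(title: str, high_value: bool) -> dict:
    """巨潮资讯 announcement: chase the PDF body only for high-value titles."""
    if high_value:
        return {"title": title, "fetch_pdf": True}
    # Otherwise keep a title-derived summary and mark the body as not extracted.
    return {"title": title, "fetch_pdf": False, "summary": title, "pdf_body_extracted": False}
```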
Default Site Priority
Use this order for stable public collection when the task does not specify a scenario:
1. 东方财富
2. 财联社
3. 巨潮资讯
4. 韭研公社
5. 雪球
This order reflects public accessibility and extraction stability, not market importance.
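As a small illustration of how the default order might be applied, the sketch below walks the priority list and separates reachable sites from missing ones, so the missing ones can be disclosed later. `plan_round` and the `reachable` set are hypothetical names; scenario-specific orders would come from references/entrypoints.md.

```python
from typing import List, Sequence, Set, Tuple

DEFAULT_PRIORITY = ["东方财富", "财联社", "巨潮资讯", "韭研公社", "雪球"]


def plan_round(
    reachable: Set[str], priority: Sequence[str] = DEFAULT_PRIORITY
) -> Tuple[List[str], List[str]]:
    """Return (sites to crawl in priority order, priority sites that are missing)."""
    order = [site for site in priority if site in reachable]
    missing = [site for site in priority if site not in reachable]
    return order, missing
```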
When to Ask for Stronger Access
Ask for stronger access only when the user explicitly wants better extraction from a restricted site, especially 雪球.
Examples:
- attached Chrome relay tab
- logged-in browser profile
- cookies or authenticated environment
- a dedicated crawler or site-specific script
Scenario Call Contract
When a cron or caller specifies one of these scenario ids, treat it as a compact instruction bundle and do not ask for a longer prompt:
- pre-open: read references/entrypoints.md, references/fields.md, and references/risks.md; use the pre-open priority order; focus on overnight macro or overseas linkage, policy or industry catalysts, key announcements, expected hot sectors, and today's watchlist
- INLINECODE32 (midday round, 午间): read references/entrypoints.md, references/fields.md, and references/risks.md; use the intraday priority order; focus on morning index and turnover snapshot, leading or lagging themes, style or sentiment shifts, active stocks with catalysts, and deviation from the pre-open setup
- INLINECODE36 (late-session round, 尾盘): read references/entrypoints.md, references/fields.md, and references/risks.md; use the intraday priority order; focus on whether the afternoon main line strengthens or rotates, late-session anomalies, money-flow return direction, hot-stock persistence, and signals that may affect post-close review or next-day expectations
- INLINECODE40 (post-close round, 收盘后): read references/entrypoints.md, references/fields.md, and references/risks.md; use the post-close priority order; focus on index and turnover recap, main-line review, key stocks and drivers, important announcements plus exchange or regulator dynamics, and next-day clues with risks
For every scenario:
- keep the output in Chinese and lead with conclusions before detail
- keep 已确认事实, 市场观点与情绪, and 待核实线索 clearly separated
- keep 本轮缺失站点 and 来源层级说明 in the final output
- bind every round to the entrypoint, field-normalization, and risk-downgrade rules instead of freehand summarizing
- do not output buy or sell recommendations
Standard Output
When producing a formal round output, always structure it with at least these sections:
- 已确认事实 (confirmed facts)
- 市场观点与情绪 (market views and sentiment)
- 待核实线索 (clues pending verification)
- 本轮缺失站点 (sites missing this round)
- 来源层级说明 (source-tier notes)
Use the sections as follows:
- 已确认事实: only T1 or well-supported T2 items, or items clearly marked as partially confirmed
- 市场观点与情绪: T3 discussion, heat, consensus drift, and sentiment signals
- 待核实线索: rumors, single-source community claims, partial clues, or conflicting statements
- 本轮缺失站点: blocked, unstable, login-gated, or otherwise uncovered priority sites and what fallback was used
- 来源层级说明: explain T1/T2/T3 usage and remind the reader that community sources are not equal to formal disclosure
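The section rules above can be sketched as a routing function plus a renderer that always emits all five sections. This is an illustration under assumptions: the record keys (`text`, `rumor`, `conflicting`, `well_supported`), the plain-text layout, and the caller-supplied `tier_note` are hypothetical, and handling of partially confirmed items is omitted for brevity. Only the section names and the T1/T2/T3 labels come from the text.

```python
from typing import Dict, List

SECTIONS = ["已确认事实", "市场观点与情绪", "待核实线索", "本轮缺失站点", "来源层级说明"]


def section_for(record: dict) -> str:
    """Route one normalized record to its output section."""
    if record.get("rumor") or record.get("conflicting"):
        return "待核实线索"
    tier = record.get("source_tier")
    if tier == "T1" or (tier == "T2" and record.get("well_supported")):
        return "已确认事实"
    if tier == "T3":
        return "市场观点与情绪"
    return "待核实线索"  # anything unclassified stays a clue, never a fact


def render_round(records: List[dict], missing_sites: Dict[str, str], tier_note: str) -> str:
    """Render every section in fixed order, even when a section is empty."""
    buckets: Dict[str, List[str]] = {name: [] for name in SECTIONS}
    for record in records:
        buckets[section_for(record)].append(record["text"])
    buckets["本轮缺失站点"] = [f"{site}: {fallback}" for site, fallback in missing_sites.items()]
    buckets["来源层级说明"] = [tier_note]
    lines: List[str] = []
    for name in SECTIONS:
        lines.append(name)
        lines.extend(f"- {item}" for item in (buckets[name] or ["无"]))
    return "\n".join(lines)
```

Emitting empty sections with a 无 marker keeps a failed or thin round from looking like full coverage.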
Per-Site Quick Output for Crawlability Tasks
When the task is specifically about site feasibility rather than a market summary, return:
- Site
- Status
- Recommended mode
- Best entry page
- What works
- Main limitation
- Next step
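One way to keep the quick output uniform across sites is a fixed record type, sketched below. The class name `SiteReport`, its attribute names, and the `Label: value` line format are assumptions for illustration; only the seven labels come from the list above.

```python
from dataclasses import dataclass, fields
from typing import List

LABELS = [
    "Site", "Status", "Recommended mode", "Best entry page",
    "What works", "Main limitation", "Next step",
]


@dataclass
class SiteReport:
    site: str
    status: str
    recommended_mode: str
    best_entry_page: str
    what_works: str
    main_limitation: str
    next_step: str

    def to_lines(self) -> List[str]:
        """Render the seven fields in the fixed order, one per line."""
        return [
            f"{label}: {getattr(self, field.name)}"
            for label, field in zip(LABELS, fields(self))
        ]
```

Because every field is required, a report cannot be constructed with the limitation or next step silently left out.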
Non-Negotiables
- Distinguish confirmed facts from community opinion
- Prefer official disclosure and high-confidence public reporting over discussion boards
- Do not output buy/sell recommendations
- Do not imply full coverage when a priority site failed or was inaccessible