Authoring agent-data-cli source
Overview
Use this skill to design and implement an agent-data-cli source with stable behavior and clear project fit.
This skill is deliberately stricter than normal feature work because a weak source design causes protocol drift, command confusion, and unreliable sync behavior.
It is the source-authoring path for RSS feeds, HTTP APIs, HTML scraping, browser-driven sites, finance data, news content, and other remote content systems that must fit the source/channel/content model.
Current core contract to keep in mind:
- source/channel is still the only core resource model
- content update sources return ContentSyncBatch, not flat per-row persistence instructions
- shared persistence is now content_nodes, content_channel_links, and content_relations
- structural relations in core use abstract parent; source-specific meaning belongs in relation_semantic
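To make the contract above concrete, here is a minimal sketch of what a batch-shaped return value might look like. The class and field names (SketchNode, SketchChannelLink, SketchRelation, SketchBatch) are hypothetical stand-ins invented for illustration, not the real ContentSyncBatch definition; check the core protocol for the actual fields.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SketchNode:
    # Hypothetical stand-in for one row destined for content_nodes.
    content_key: str
    title: str


@dataclass(frozen=True)
class SketchChannelLink:
    # Hypothetical stand-in for one row destined for content_channel_links.
    channel_key: str
    content_key: str


@dataclass(frozen=True)
class SketchRelation:
    # Core only knows the abstract parent edge; the source-specific
    # meaning (reply, contains, list_item, ...) rides in relation_semantic.
    parent_key: str
    child_key: str
    relation_semantic: str


@dataclass
class SketchBatch:
    # Hypothetical stand-in for ContentSyncBatch: one grouped result per
    # update, rather than flat per-row persistence instructions.
    nodes: list[SketchNode] = field(default_factory=list)
    channel_links: list[SketchChannelLink] = field(default_factory=list)
    relations: list[SketchRelation] = field(default_factory=list)


def build_thread_batch(channel_key: str) -> SketchBatch:
    # Example: a post that belongs to a channel, plus one reply under it.
    post = SketchNode("post-1", "Original post")
    reply = SketchNode("reply-1", "First reply")
    return SketchBatch(
        nodes=[post, reply],
        channel_links=[SketchChannelLink(channel_key, post.content_key)],
        relations=[SketchRelation(post.content_key, reply.content_key, "reply")],
    )
```

The point of the shape is that the source describes nodes, links, and relations together, and the core decides how to persist them.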
Hard Gate
Do not start implementation immediately.
The required sequence is:
1. research
2. spec
3. plan
4. approval
5. implement
6. verify
If the user explicitly wants to skip a stage, say what risk that creates before proceeding.
When to Use
Use this skill when the user wants to:
- add a new source
- redesign an existing source
- add source capabilities such as channel search, content search, content update, or content interact
- add support for RSS, APIs, scraping, browser automation, authentication, cookies, or remote side effects
Do not use this skill for:
- ordinary content operations against an existing source
- unrelated CLI or store changes with no source work
Install From skills.sh
Install this skill directly from skills.sh:
    npx skills add https://github.com/severinzhong/agent-data-cli --skill authoring-data-cli-source
Install
If agent-data-cli is not present locally, install it first:
    git clone https://github.com/severinzhong/agent-data-cli
    cd agent-data-cli
    uv sync
Then load the bundled skills from this repository's skills/ directory and work from the repo root.
Important boundary:
- source code belongs in the source workspace repo, typically agent-data-hub
- keep agent-data-cli focused on core/cli/store/protocol work
- do not install source runtime dependencies into the core project with uv add
- use uv pip install or init.sh inside the source workspace instead
Workflow
1. Research
Identify the source type before making architecture decisions.
Classify it as one or more of:
- RSS
- API
- HTML scraping
- browser-driven
- auth or session driven
- interact capable
Research must confirm:
- whether the source has a real channel concept
- whether remote discovery and remote sync are distinct
- how to identify unique content
- whether the source has hierarchical or container-like content that should become content_relations
- what time field is available
- how pagination or incremental fetch works
- what config is required
- whether interact is actually possible
Use available web research, local fetch tools, and the repo's fetchers/ where appropriate.
2. Spec
Write a source-specific spec before implementation.
It must define:
- source to resource mapping
- supported capabilities
- config fields and mode if needed
- content normalization and dedup strategy
- content_key strategy
- whether update returns only direct content, or also context nodes and content_relations
- whether the source needs relation_semantic values such as reply, contains, or list_item
- storage requirements
- error boundaries
- CLI-visible semantics
- testing scope
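As an illustration of what a content_key strategy in the spec might pin down, here is a minimal sketch. It assumes the source exposes either a stable remote id or a URL whose query string and fragment do not identify the content; make_content_key, canonical_url, and the key format are hypothetical, not a format required by agent-data-cli.

```python
import hashlib
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Lowercase scheme and host, trim the trailing slash, drop query and fragment.

    Assumption: the query string carries only tracking noise. Sources that
    identify content through query parameters must keep those parameters.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def make_content_key(source: str, remote_id: Optional[str], url: str) -> str:
    """Prefer a stable remote id; fall back to a hash of the canonical URL."""
    if remote_id:
        return f"{source}:{remote_id}"
    digest = hashlib.sha256(canonical_url(url).encode("utf-8")).hexdigest()[:16]
    return f"{source}:url:{digest}"
```

Whatever rule the spec chooses, the property to test is that the same remote item always maps to the same key across syncs, so dedup does not depend on fetch order or URL decoration.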
For native search/query views:
- treat column names as a soft compatibility surface because multi-source and multi-channel aggregation merges by column header
- prefer explicit names such as published_at, publisher, author, price, volume
- avoid vague names such as time, source, value unless that meaning is genuinely exact
- column order is mainly for readability; header naming is what determines merge behavior
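The merge-by-header behavior described above can be sketched as follows. This is a hypothetical model of header-keyed aggregation, not the actual agent-data-cli implementation; merge_views and the sample rows are invented for illustration.

```python
def merge_views(*views: list[dict[str, object]]) -> tuple[list[str], list[list[object]]]:
    """Merge row dicts from several sources into one table keyed by header name."""
    headers: list[str] = []
    for view in views:
        for row in view:
            for name in row:
                if name not in headers:
                    headers.append(name)  # first-seen order, for readability only
    # Columns a source does not provide are filled with None.
    rows = [[row.get(name) for name in headers] for view in views for row in view]
    return headers, rows


# Two hypothetical sources that agree on an explicit header name.
news_rows = [{"published_at": "2024-01-02", "publisher": "Example Wire"}]
quote_rows = [{"published_at": "2024-01-02", "price": 10.5}]
merged_headers, merged_rows = merge_views(news_rows, quote_rows)
```

Because the merge key is the header string, two sources that both emit a column named time land in one column even if one means publication time and the other means trade time. Explicit names keep that collision from happening silently.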
3. Plan
Turn the approved spec into an implementation plan.
The plan must break work into:
- failing tests to add first
- source code units to implement
- ContentSyncBatch construction path
- CLI verification steps
- persistence and audit verification
4. Approval
Wait for user approval after the spec and plan.
Do not jump from research straight to code.
5. Implement
Implement with TDD.
- write failing tests first
- verify the failure is correct
- write minimal code
- rerun focused tests
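The loop above can be sketched with the standard unittest module. The function under test, normalize_title, is a hypothetical normalization helper chosen only to keep the example small; the repository's own test layout and runner may differ.

```python
import unittest


def normalize_title(raw: str) -> str:
    # Minimal code written after the tests below were seen to fail.
    # In the first iteration this body would be absent or raise
    # NotImplementedError, so the failure comes from missing behavior
    # rather than from a typo or import error in the test.
    return " ".join(raw.split())


class NormalizeTitleTest(unittest.TestCase):
    # These tests are written first, before normalize_title has a body.
    def test_collapses_internal_whitespace(self) -> None:
        self.assertEqual(normalize_title("Hello \n  world"), "Hello world")

    def test_strips_edges(self) -> None:
        self.assertEqual(normalize_title("  title  "), "title")


def run_focused() -> bool:
    # Rerun only this test case rather than the whole suite.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTitleTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Checking that the failure is correct matters as much as the pass: a test that fails because of a wrong import proves nothing about the behavior it was meant to pin down.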
6. Verify
Before claiming completion, verify:
- unit tests
- CLI simulation tests
- help output
- capability and config behavior
- persistence side effects
- content_nodes / content_channel_links / content_relations side effects when update is involved
- interact audit behavior when applicable
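For the update-related side effects in the list above, one cheap check is that nothing a batch produces points at a node that was never written. The sketch below assumes simplified tuple shapes for links and relations; find_batch_problems and those shapes are hypothetical and do not mirror the real store schema.

```python
def find_batch_problems(
    node_keys: set[str],
    channel_links: list[tuple[str, str]],   # (channel_key, content_key)
    relations: list[tuple[str, str, str]],  # (parent_key, child_key, relation_semantic)
) -> list[str]:
    """Return human-readable problems; an empty list means the batch is consistent."""
    problems: list[str] = []
    for channel_key, content_key in channel_links:
        if content_key not in node_keys:
            problems.append(f"channel link {channel_key} -> {content_key}: unknown node")
    for parent_key, child_key, semantic in relations:
        for key in (parent_key, child_key):
            if key not in node_keys:
                problems.append(
                    f"relation {semantic} {parent_key} -> {child_key}: unknown node {key}"
                )
        if parent_key == child_key:
            problems.append(f"relation {semantic}: node {parent_key} is its own parent")
    return problems
```

A check like this fits in a unit test that runs after an update against a temporary store, using keys read back from the persisted rows rather than from the batch that was sent.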
Read Next
- references/source-contract.md for repository rules
- references/source-type-rss.md for feed-like sources
- references/source-type-api.md for JSON or HTTP API sources
- references/source-type-browser.md for browser-driven sources
- references/source-type-interact.md for remote side effects
- references/source-testing.md for test matrix
- references/source-review-checklist.md before final verification