Authoring agent-data-cli source

Overview

Use this skill to design and implement an agent-data-cli source with stable behavior and clear project fit.

This skill is deliberately stricter than normal feature work because a weak source design causes protocol drift, command confusion, and unreliable sync behavior.

It is the source-authoring path for RSS feeds, HTTP APIs, HTML scraping, browser-driven sites, finance data, news content, and other remote content systems that must fit the source/channel/content model.

Current core contract to keep in mind:

- source/channel is still the only core resource model
- content update sources return ContentSyncBatch, not flat per-row persistence instructions
- shared persistence is now content_nodes, content_channel_links, and content_relations
- structural relations in core use abstract parent; source-specific meaning belongs in relation_semantic
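
The contract above can be pictured as a small data shape. This is only a sketch: the class name ContentSyncBatchSketch and every field name (nodes, channel_links, relations, content_key, channel_key, parent_key, child_key) are assumptions made for illustration and are not taken from the agent-data-cli codebase. Check the repository's protocol definitions for the real types.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContentNode:
    content_key: str        # hypothetical stable identifier for one piece of content
    title: str


@dataclass(frozen=True)
class ContentChannelLink:
    content_key: str
    channel_key: str        # hypothetical channel identifier


@dataclass(frozen=True)
class ContentRelation:
    parent_key: str         # core only sees an abstract parent
    child_key: str
    relation_semantic: str  # source-specific meaning, e.g. "reply"


@dataclass
class ContentSyncBatchSketch:
    """Illustrative stand-in for a batch; not the real ContentSyncBatch."""
    nodes: list[ContentNode] = field(default_factory=list)
    channel_links: list[ContentChannelLink] = field(default_factory=list)
    relations: list[ContentRelation] = field(default_factory=list)


def build_thread_batch(channel_key: str) -> ContentSyncBatchSketch:
    """Return one batch for a post and its reply instead of per-row writes."""
    post = ContentNode("post-1", "Original post")
    reply = ContentNode("post-1/reply-1", "First reply")
    return ContentSyncBatchSketch(
        nodes=[post, reply],
        channel_links=[ContentChannelLink(post.content_key, channel_key)],
        relations=[ContentRelation(post.content_key, reply.content_key, "reply")],
    )
```

The point of the shape is that the source describes nodes, links, and relations together, and the shared store decides how to persist them.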

Hard Gate

Do not start implementation immediately.

The required sequence is:

1. research
2. spec
3. plan
4. approval
5. implement
6. verify

If the user explicitly wants to skip a stage, say what risk that creates before proceeding.

When to Use

Use this skill when the user wants to:

- add a new source
- redesign an existing source
- add source capabilities such as channel search, content search, content update, or content interact
- add support for RSS, APIs, scraping, browser automation, authentication, cookies, or remote side effects

Do not use this skill for:

- ordinary content operations against an existing source
- unrelated CLI or store changes with no source work

Install From skills.sh

Install this skill directly from skills.sh:

```bash
npx skills add https://github.com/severinzhong/agent-data-cli --skill authoring-data-cli-source
```

Install

If agent-data-cli is not present locally, install it first:

```bash
git clone https://github.com/severinzhong/agent-data-cli
cd agent-data-cli
uv sync
```

Then load the bundled skills from this repository's skills/ directory and work from the repo root.

Important boundary:

- source code belongs in the source workspace repo, typically agent-data-hub
- keep agent-data-cli focused on core/cli/store/protocol work
- do not install source runtime dependencies into the core project with uv add
- use uv pip install or init.sh inside the source workspace instead

Workflow

1. Research

Identify the source type before making architecture decisions.

Classify it as one or more of:

- RSS
- API
- HTML scraping
- browser-driven
- auth or session driven
- interact capable

Research must confirm:

- whether the source has a real channel concept
- whether remote discovery and remote sync are distinct
- how to identify unique content
- whether the source has hierarchical or container-like content that should become content_relations
- what time field is available
- how pagination or incremental fetch works
- what config is required
- whether interact is actually possible

Use available web research, local fetch tools, and the repo's fetchers/ where appropriate.

2. Spec

Write a source-specific spec before implementation.

It must define:

- source to resource mapping
- supported capabilities
- config fields and mode if needed
- content normalization and dedup strategy
- content_key strategy
- whether update returns only direct content, or also context nodes and content_relations
- whether the source needs relation_semantic values such as reply, contains, or list_item
- storage requirements
- error boundaries
- CLI-visible semantics
- testing scope
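
One way to pin down the content_key and dedup items in a spec is to write the rule as a small function. The sketch below is an assumption, not the project's actual strategy: the helper names (canonical_url, make_content_key, dedup), the key format, and the choice to drop query strings are all hypothetical, and dropping the query string is wrong for sites where it identifies the content.

```python
from __future__ import annotations

import hashlib
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Lowercase scheme and host, drop query and fragment, trim trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def make_content_key(source: str, native_id: str | None, url: str) -> str:
    """Prefer the remote system's own id; fall back to a hash of the canonical URL."""
    if native_id:
        return f"{source}:{native_id}"
    digest = hashlib.sha256(canonical_url(url).encode("utf-8")).hexdigest()[:16]
    return f"{source}:url:{digest}"


def dedup(items: list[dict], source: str) -> dict[str, dict]:
    """Keep the first item seen for each derived content key."""
    unique: dict[str, dict] = {}
    for item in items:
        key = make_content_key(source, item.get("id"), item["url"])
        unique.setdefault(key, item)
    return unique
```

Whatever rule the spec settles on, the same remote item should produce the same key on every sync, otherwise repeated updates create duplicates.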

For native search/query views:

- treat column names as a soft compatibility surface because multi-source and multi-channel aggregation merges by column header
- prefer explicit names such as published_at, publisher, author, price, volume
- avoid vague names such as time, source, value unless that meaning is genuinely exact
- column order is mainly for readability; header naming is what determines merge behavior
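
To see why header naming matters, consider a toy merge that lines rows up purely by column header. The function merge_views and the sample rows are hypothetical and are not the CLI's aggregation code; they only illustrate the merge-by-header behavior described above.

```python
from __future__ import annotations


def merge_views(*views: list[dict]) -> tuple[list[str], list[list]]:
    """Merge row dicts from several sources into one table keyed by header."""
    headers: list[str] = []
    for view in views:
        for row in view:
            for name in row:
                if name not in headers:
                    headers.append(name)  # first-seen order, readability only
    rows = [[row.get(name) for name in headers] for view in views for row in view]
    return headers, rows


# Explicit names: price and volume stay in separate columns.
prices = [{"published_at": "2024-01-02", "price": 101.5}]
volumes = [{"published_at": "2024-01-02", "volume": 1200}]
explicit_headers, explicit_rows = merge_views(prices, volumes)

# Vague names: two different meanings collapse into one "value" column.
vague_prices = [{"time": "2024-01-02", "value": 101.5}]
vague_volumes = [{"time": "2024-01-02", "value": 1200}]
vague_headers, vague_rows = merge_views(vague_prices, vague_volumes)
```

In the vague case the merged table has a single value column that mixes a price and a volume, which is the kind of silent collision explicit headers avoid.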

3. Plan

Turn the approved spec into an implementation plan.

The plan must break work into:

- failing tests to add first
- source code units to implement
- ContentSyncBatch construction path
- CLI verification steps
- persistence and audit verification

4. Approval

Wait for user approval after the spec and plan.

Do not jump from research straight to code.

5. Implement

Implement with TDD.

- write failing tests first
- verify the failure is correct
- write minimal code
- rerun focused tests
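
The loop above might look like this for a single parsing unit. The function parse_item, its input fields (guid, title), and the demo: key prefix are hypothetical. In practice the test class is written first and seen to fail, and only then is the minimal function added.

```python
import unittest


def parse_item(raw: dict) -> dict:
    """Minimal code, written only after the tests below were seen to fail."""
    return {"content_key": f"demo:{raw['guid']}", "title": raw["title"].strip()}


class ParseItemTest(unittest.TestCase):
    def test_builds_key_and_trims_title(self):
        item = parse_item({"guid": "abc", "title": "  Hello  "})
        self.assertEqual(item["content_key"], "demo:abc")
        self.assertEqual(item["title"], "Hello")

    def test_missing_guid_is_an_error(self):
        # The error boundary is part of the spec, so it gets its own test.
        with self.assertRaises(KeyError):
            parse_item({"title": "no guid"})
```

Checking that the first failure is the expected one (for example a missing function, not a typo in the test) is what makes the later pass meaningful.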

6. Verify

Before claiming completion, verify:

- unit tests
- CLI simulation tests
- help output
- capability and config behavior
- persistence side effects
- content_nodes / content_channel_links / content_relations side effects when update is involved
- interact audit behavior when applicable
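
A persistence check can be as simple as comparing row counts before and after an update. The sketch below uses an in-memory SQLite database with a made-up schema: only the three table names come from this document, while the columns, the helper names, and the manual INSERT statements standing in for a real update are assumptions.

```python
from __future__ import annotations

import sqlite3

TABLES = ("content_nodes", "content_channel_links", "content_relations")


def row_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Count rows in each shared persistence table."""
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in TABLES}


def make_demo_store() -> sqlite3.Connection:
    """Create a throwaway store with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE content_nodes (content_key TEXT PRIMARY KEY);
        CREATE TABLE content_channel_links (content_key TEXT, channel_key TEXT);
        CREATE TABLE content_relations (
            parent_key TEXT, child_key TEXT, relation_semantic TEXT
        );
        """
    )
    return conn


conn = make_demo_store()
before = row_counts(conn)

# Stand-in for running an update that syncs one post and one reply.
conn.execute("INSERT INTO content_nodes VALUES ('demo:1')")
conn.execute("INSERT INTO content_nodes VALUES ('demo:1/r1')")
conn.execute("INSERT INTO content_channel_links VALUES ('demo:1', 'news')")
conn.execute("INSERT INTO content_relations VALUES ('demo:1', 'demo:1/r1', 'reply')")

after = row_counts(conn)
delta = {t: after[t] - before[t] for t in TABLES}
```

Asserting on the delta per table catches both missing writes and unexpected extra rows.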


Read Next

- references/source-contract.md for repository rules
- references/source-type-rss.md for feed-like sources
- references/source-type-api.md for JSON or HTTP API sources
- references/source-type-browser.md for browser-driven sources
- references/source-type-interact.md for remote side effects
- references/source-testing.md for the test matrix
- references/source-review-checklist.md before final verification