CLI Design Framework

Overview

Design and review CLIs with a classification-first framework.

Treat this as a decision system, not a generic style guide. Do not assume every CLI should become agent-first, machine-protocol-first, or raw-payload-first.

When to Use

Use this skill when:

- designing a new CLI and the right command shape is not obvious
- reviewing an existing CLI whose help, output, or command tree feels mismatched
- deciding whether a CLI is primarily Capability, Runtime, Environment / Workspace, Workflow, Package / Build, or Meta
- deciding whether human-readable and machine-readable surfaces are primary or secondary
- deciding whether session semantics are justified or over-engineered

Do not use this skill when:

- the CLI classification is already settled and you only need implementation mechanics
- the question is only about parser libraries, repository layout, or exact flag spelling
- the task is purely cosmetic copy editing with no design consequence

Quick Path

- For quick asks, produce a compressed pass: purpose, classification, short reasoning, top design consequences, and only the unresolved questions that could change the answer.
- Use the full blueprint or full review template only when the user asks for it explicitly, or when ambiguity or risk justifies the longer form.

Core rule

Classify first. Design second. Review third.

Always work in this order:

1. State the CLI purpose in one sentence.
2. Classify the primary role/control surface.
3. Classify the primary user type.
4. Classify the primary interaction form.
5. Classify statefulness.
6. Classify risk profile.
7. Identify secondary surfaces explicitly.
8. Derive design consequences.
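
As a minimal sketch, the eight steps above can be captured as a single record so that no step is skipped before design starts. The role names come from the "When to Use" list; the `Classification` type, its field names, and the example values are hypothetical illustrations, not part of references/taxonomy.md.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    # Role names as listed in this document.
    CAPABILITY = "Capability"
    RUNTIME = "Runtime"
    ENVIRONMENT_WORKSPACE = "Environment / Workspace"
    WORKFLOW = "Workflow"
    PACKAGE_BUILD = "Package / Build"
    META = "Meta"


@dataclass
class Classification:
    """Hypothetical record with one field per step of the core rule."""
    purpose: str            # step 1: one-sentence purpose
    primary_role: Role      # step 2: primary role/control surface
    primary_user: str       # step 3: e.g. "human-primary", "balanced", "agent-primary"
    interaction_form: str   # step 4
    statefulness: str       # step 5
    risk_profile: str       # step 6
    secondary_surfaces: list[str] = field(default_factory=list)   # step 7
    design_consequences: list[str] = field(default_factory=list)  # step 8

    def missing_steps(self) -> list[str]:
        """Return the names of required text fields that were left blank."""
        required = ["purpose", "primary_user", "interaction_form",
                    "statefulness", "risk_profile"]
        return [name for name in required if not getattr(self, name).strip()]


# Illustrative values only; statefulness is left blank on purpose.
example = Classification(
    purpose="Inspect and restart services on a remote host.",
    primary_role=Role.RUNTIME,
    primary_user="human-primary",
    interaction_form="one-shot commands",
    statefulness="",
    risk_profile="mutating",
)
print(example.missing_steps())
```

A blank field here is a prompt for one of the high-leverage questions, not a reason to guess.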

Start with these files when using the framework:

- references/taxonomy.md for the taxonomy.
- references/output-templates.md for the required output shape.

Pull these only when needed:

- references/classification-examples.md for classification anchors when the category is ambiguous.
- examples/design-blueprint-example.md and examples/review-example.md when you need a concrete example of final form.
- examples/anti-patterns.md when the design smells wrong but the category mistake is not yet crisp.

Operating modes

Operate in one of two modes:

1. Design mode: create or refine a CLI design direction.
2. Review mode: evaluate an existing CLI against the framework.

Design mode

Goal

Clarify the CLI's design target, then produce a blueprint that constrains implementation.

Workflow

1. Infer what is already known.

   - Extract every strong signal from the user's request.
   - Infer likely role, user type, interaction form, statefulness, risk profile, and secondary surfaces whenever possible.
   - Do not ask for facts that are already strongly implied.

2. Ask only the highest-leverage unresolved questions.

   - Ask the smallest set of questions that could materially change the classification or the blueprint.
   - Prefer classification questions over implementation trivia.
   - Prioritize: purpose → control surface → primary user → interaction form → statefulness → side effects → secondary surfaces.
   - If the current information is already sufficient, do not ask questions. Produce the blueprint directly.

3. Classify the CLI explicitly.

   - State the inferred or confirmed:
     - primary role/control-surface type
     - primary user type
     - primary interaction form
     - statefulness
     - risk profile
     - secondary surfaces
   - State confidence when inference is uncertain.
   - Use explicit primary-vs-secondary wording. Do not blur them together.

4. State the design stance before proposing commands.

   - Write one short paragraph that says what the CLI is optimizing for.
   - State what the CLI is not trying to be.
   - Do not jump straight from classification to command trees.

5. Produce a design blueprint.

   - Use the structure in references/output-templates.md.
   - Use the full template when the user wants a blueprint or when the ambiguity/risk warrants it.
   - For quick requests, compress to: purpose, classification, classification reasoning, design stance, top design consequences, and only the unresolved questions that matter.
   - Connect classification directly to design consequences.
   - Keep the blueprint concrete, not generic.

6. Constrain downstream implementation.

   - End with a short direction section that states:
     - what to optimize for
     - what not to optimize for
     - acceptable patterns
     - category mistakes
     - v1 boundaries and non-goals

Required design discipline

For every blueprint, enforce these rules:

- Primary vs secondary surfaces

  - Name the primary surface explicitly.
  - Name secondary surfaces explicitly.
  - State what each surface is for.
  - Do not describe JSON, event streams, templates, raw payloads, or TUI as “important” without saying whether they are primary or secondary.

- Human-primary / balanced discoverability

  - If the CLI is human-primary or balanced, explicitly cover:
    - help quality
    - examples
    - discoverability
    - explain/describe surfaces when appropriate
  - Do not discuss only command structure and ignore learnability.

- Structured machine contract

  - If the CLI has a machine-readable surface, explicitly state:
    - which commands expose it
    - output format (--json, --jsonl, etc.)
    - whether field names are stable
    - whether exit codes matter
    - whether schema / fields / describe support is needed
  - Do not call a surface “script-friendly” unless the contract is described.

- Risk ladder

  - If the CLI mutates state, define at least:
    - low-risk operations
    - medium-risk operations
    - high-risk operations
  - State the expected guardrails for each level.
  - Do not stop at “be careful” or “add confirmations.”

- State model

  - If the CLI is sessionful, long-running, or attach/detach capable, describe session identity and lifecycle explicitly.
  - If it is mostly stateless, say so explicitly and avoid inventing session semantics.

- v1 boundaries

  - State what v1 should include.
  - State what v1 should defer.
  - State what would be premature abstraction.
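
To make the "Structured machine contract" rule concrete, here is a minimal sketch of a secondary `--json` surface with a described contract: a fixed field set, rejection of unknown fields, and exit codes that scripts can rely on. The program name `svc`, the `--fields` flag, the field names, and the exit-code values are all hypothetical and chosen only for illustration.

```python
import argparse
import json
import sys

# Hypothetical contract: these field names are treated as stable across releases.
STABLE_FIELDS = ("name", "status", "updated_at")

# Hypothetical exit codes that callers may branch on.
EXIT_OK = 0
EXIT_USAGE = 2


def select_fields(record: dict, requested: list[str]) -> dict:
    """Return only the requested fields; reject names outside the contract."""
    unknown = [f for f in requested if f not in STABLE_FIELDS]
    if unknown:
        raise ValueError(f"unknown field(s): {', '.join(unknown)}")
    return {f: record[f] for f in requested}


def main(argv: list[str]) -> int:
    parser = argparse.ArgumentParser(prog="svc")
    parser.add_argument("--json", action="store_true",
                        help="emit one JSON object on stdout")
    parser.add_argument("--fields", default=",".join(STABLE_FIELDS),
                        help="comma-separated subset of: " + ", ".join(STABLE_FIELDS))
    args = parser.parse_args(argv)

    # Stand-in data; a real command would fetch this from the system it controls.
    record = {"name": "api", "status": "running",
              "updated_at": "2024-01-01T00:00:00Z"}
    try:
        selected = select_fields(record, args.fields.split(","))
    except ValueError as err:
        # Errors go to stderr so stdout stays parseable.
        print(f"error: {err}", file=sys.stderr)
        return EXIT_USAGE

    if args.json:
        print(json.dumps(selected, sort_keys=True))
    else:
        for key, value in selected.items():
            print(f"{key}: {value}")
    return EXIT_OK
```

Calling `main(["--json", "--fields", "name,status"])` prints a single JSON object and returns 0, while an unknown field name returns 2 and writes nothing to stdout. Whether unknown fields should be rejected or tolerated is itself a contract decision; the sketch picks rejection only to show one explicit choice.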

Question policy

Ask only questions that affect classification or the blueprint.

Do not begin with implementation-detail questions such as:

- language choice
- parsing library
- repository layout
- naming bikesheds
- exact flag spelling

Ask those only if they materially affect the CLI's classification or design consequences.

Review mode

Goal

Inspect the CLI and its source, reverse-infer its design intent, then review it in two layers:

1. Classification fit: Is it designed like the right kind of CLI?
2. Execution quality: Given that type, how well is it executed?

Workflow

1. Inspect before asking.

   - Inspect help output, subcommand help, docs, examples, parser code, output code, error handling, state/session code, config surfaces, and tests.
   - Prefer direct evidence over speculation.

2. Reverse-infer the design intent.

   - Infer:
     - apparent purpose
     - likely primary role/control-surface type
     - likely primary user type
     - likely interaction form
     - likely statefulness
     - likely risk profile
     - existing secondary surfaces

3. Confirm only what cannot be inferred reliably.

   - Ask focused confirmation questions only when the answer could materially change the classification or review.
   - Do not ask the user to restate facts already evident from the CLI or code.

4. Review in two layers.

   - Keep classification fit and execution quality separate.
   - Do not criticize a human-primary CLI for not being agent-primary unless the user explicitly wants that shift.

5. Produce a structured review.

   - Use the review structure in references/output-templates.md.
   - Use the full template when the user wants a formal review or when the category tension is material.
   - For quick requests, compress to: inferred intent, classification, evidence-backed category mistakes, in-category weaknesses, and highest-priority improvements.
   - Separate category mistakes from in-category execution weaknesses.

Required review checks

When reviewing, explicitly check these areas when relevant:

- Primary vs secondary surface clarity

  - Is the CLI clear about what the main surface is?
  - Are secondary surfaces real contracts or just informal add-ons?

- Discoverability

  - Does help output support the claimed user type?
  - Are examples, option descriptions, and command structure aligned with the CLI's center of gravity?

- Structured output contract

  - Are JSON / JSONL / field-selection / exit-code surfaces explicit and stable?
  - Are unknown fields rejected or silently tolerated?
  - Is the machine surface strong enough for the claims made in docs?

- Risk model

  - Are low-, medium-, and high-risk actions meaningfully separated?
  - Are confirm / dry-run / preview / audit guardrails aligned with the risk profile?

- State model

  - Is statefulness handled correctly?
  - Are attach/detach/resume/session/history concepts used only when justified?

- v1 discipline

  - Does the CLI keep a coherent v1 boundary?
  - Does it introduce premature abstraction or missing contracts?
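
The "Risk model" check above can be sketched as a small table lookup: each risk level names the guardrails it expects, and a review flags commands that fall short. The guardrail names, the ladder itself, and the three commands are hypothetical; a real review would derive them from the CLI's code and docs.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical ladder: guardrails each level requires before a command runs.
GUARDRAILS = {
    Risk.LOW: set(),
    Risk.MEDIUM: {"dry_run_available"},
    Risk.HIGH: {"dry_run_available", "explicit_confirmation", "audit_log"},
}

# Hypothetical commands: (risk level, guardrails the command actually has).
COMMANDS = {
    "list": (Risk.LOW, set()),
    "restart": (Risk.MEDIUM, {"dry_run_available"}),
    "delete": (Risk.HIGH, {"explicit_confirmation"}),
}


def missing_guardrails(commands: dict) -> dict:
    """Map each command to the guardrails its risk level requires but it lacks."""
    gaps = {}
    for name, (risk, present) in commands.items():
        missing = GUARDRAILS[risk] - present
        if missing:
            gaps[name] = sorted(missing)
    return gaps


print(missing_guardrails(COMMANDS))
```

In this made-up data only `delete` is flagged, because it has a confirmation prompt but no dry run or audit trail. The finding is specific to a level and a guardrail, which is the opposite of "add confirmations."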

Review rules

- Do not grade every CLI on an agent-first curve.
- Do not require raw payloads, full schema introspection, or machine-first output unless the classification justifies them.
- Treat modern CLIs as multi-surface systems: one primary role, one primary interaction form, optional secondary surfaces.
- Prefer strong inference, then targeted confirmation.
- When criticizing machine support, specify whether the problem is:
  - missing primary/secondary surface clarity,
  - a weak machine contract,
  - or a true category mismatch.

Handling hybrid CLIs

Some CLIs genuinely straddle multiple roles at the subcommand level.

Rules for hybrid CLIs:

1. Classify at the product level first: what is the CLI's center of gravity?
2. If subcommands clearly split into different roles, note the split explicitly.
3. Name the primary role (the one that defines the CLI's identity and design constraints).
4. Name secondary roles as secondary surfaces with their own local constraints.
5. Do not force a single role on a CLI whose subcommands genuinely serve different roles.

Example: Docker

- Product-level primary role: Runtime (its center of gravity is container execution).
- INLINECODE10, docker exec, docker attach → Runtime interaction.
- INLINECODE13, docker volume inspect → Capability-like resource surfaces (secondary).
- INLINECODE15 → Workflow/Orchestration (secondary).
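
One way to record such a split, as a rough sketch: state the product-level primary role first, then group the subcommands whose role differs from it as secondary surfaces. The function name and the data layout are hypothetical; the subcommands and their roles are the ones named in the Docker example above.

```python
# Product-level classification is stated first, not derived from the subcommands.
PRIMARY_ROLE = "Runtime"

# Per-subcommand roles, taken from the Docker example in this document.
SUBCOMMAND_ROLES = {
    "docker exec": "Runtime",
    "docker attach": "Runtime",
    "docker volume inspect": "Capability",
}


def secondary_surfaces(primary: str, subcommand_roles: dict) -> dict:
    """Group subcommands whose role differs from the product-level primary role."""
    split = {}
    for command, role in subcommand_roles.items():
        if role != primary:
            split.setdefault(role, []).append(command)
    return split


print(secondary_surfaces(PRIMARY_ROLE, SUBCOMMAND_ROLES))
```

The result lists `docker volume inspect` under Capability, which is the explicit note of the split that the rules above ask for. The primary role stays a deliberate judgment about center of gravity; the sketch does not infer it by counting subcommands.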

Guidance for evolving CLIs:

- If a CLI is migrating from one type to another, state the current center of gravity and the intended direction.
- Do not classify based on the future target alone; classify based on current evidence and note the trajectory.

Common failure modes

Watch for these mistakes:

- Treating every CLI as a capability CLI.
- Treating every CLI as a runtime CLI.
- Treating TUI or REPL as a role instead of an interaction form.
- Ignoring statefulness.
- Ignoring risk profile.
- Ignoring help/discoverability for human-primary CLIs.
- Treating automation fitness as a top-level identity instead of a design consequence.
- Forcing human-primary tools into agent-only patterns.
- Calling a JSON surface “strong” without defining the contract.
- Ignoring secondary surfaces in mixed-mode CLIs.
- Jumping from classification directly to command trees without stating design stance.
- Failing to mark v1 boundaries and non-goals.
- Forcing a single role on a hybrid CLI whose subcommands genuinely serve different control surfaces.
- Over-engineering statefulness for a CLI that only has durable config/lockfile side effects but no true sessions.
- Classifying a CLI by its future aspirations instead of its current evidence.

Output bar

Keep final outputs:

- explicit about classification
- explicit about classification reasoning when there is tension or ambiguity
- explicit about evidence, confidence, and assumptions
- explicit about design consequences
- explicit about primary vs secondary surfaces
- explicit about discoverability and machine contracts when relevant
- explicit about risk ladders when mutations exist
- scaled to the user's requested depth
- concise but dense
- diagnostic rather than generic

Avoid vague advice such as "improve UX" or "make it more agent-friendly" unless tied to a specific classification and a concrete design consequence.
