Skill Reviewer

Audit agent skills (SKILL.md files) for quality, correctness, and completeness. Provides a structured review framework with scoring rubric, defect checklists, and improvement recommendations.

When to Use

- Reviewing a skill before publishing to the registry
- Evaluating a skill you downloaded from the registry
- Auditing your own skills for quality improvements
- Comparing skills in the same category
- Deciding whether a skill is worth installing

Review Process

Step 1: Structural Check

Verify the skill has the required structure. Read the file and check each item:

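```
STRUCTURE CHECKLIST:
[ ] Valid YAML frontmatter (opens and closes with ---)
[ ] name field present and a valid slug (lowercase, hyphenated)
[ ] description field present and non-empty
[ ] metadata field present with valid JSON
[ ] metadata.clawdbot.emoji is a single emoji
[ ] metadata.clawdbot.requires.anyBins lists real CLI tools
[ ] Title heading (# Title) immediately after frontmatter
[ ] Summary paragraph after title
[ ] "When to Use" section present
[ ] At least 3 main content sections
[ ] "Tips" section present at the end
```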
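Some of the structural items can be checked mechanically. The following is a minimal sketch, not part of the skill itself: `check_structure` is a hypothetical helper, and it assumes the frontmatter is delimited by `---` lines, that `name:` and `description:` each sit on their own line, and that headings use markdown `#` syntax.

```bash
# Sketch: check a few structural items of a SKILL.md file.
# check_structure is a hypothetical helper; the frontmatter layout it
# expects (one "key: value" per line) is an assumption.
check_structure() {
  local file="$1" fails=0 section

  # Frontmatter must open on line 1 with '---' and close with a later '---'.
  if [ "$(head -n 1 "$file")" != "---" ] ||
     [ "$(grep -c '^---$' "$file")" -lt 2 ]; then
    echo "FAIL: frontmatter delimiters"; fails=$((fails + 1))
  fi

  # name must be a lowercase, hyphenated slug.
  if ! grep -Eq '^name: [a-z0-9]+(-[a-z0-9]+)*$' "$file"; then
    echo "FAIL: name slug"; fails=$((fails + 1))
  fi

  # description must be present and non-empty.
  if ! grep -Eq '^description: .+' "$file"; then
    echo "FAIL: description"; fails=$((fails + 1))
  fi

  # "When to Use" and "Tips" sections must exist as headings.
  for section in "When to Use" "Tips"; do
    if ! grep -Eq "^#+ ${section}\$" "$file"; then
      echo "FAIL: missing section: ${section}"; fails=$((fails + 1))
    fi
  done

  echo "structural failures: ${fails}"
  [ "$fails" -eq 0 ]
}
```

Run it as `check_structure skills/my-skill/SKILL.md`. The remaining items (emoji, real CLI tools, summary paragraph) still need a human or agent to read the file.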

Step 2: Frontmatter Quality

Description field audit

The description is the most impactful field. Evaluate it against these criteria:

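```
DESCRIPTION SCORING:

[2] Starts with what the skill does (active verb)
    GOOD: "Write Makefiles for any project type."
    BAD:  "This skill covers Makefiles."
    BAD:  "A comprehensive guide to Make."

[2] Includes trigger phrases ("Use when...")
    GOOD: "Use when setting up build automation, defining multi-target builds"
    BAD:  No trigger phrases at all

[2] Specific scope (mentions concrete tools, languages, or operations)
    GOOD: "SQLite/PostgreSQL/MySQL: schema design, queries, CTEs, window functions"
    BAD:  "Database stuff"

[1] Reasonable length (50-200 characters)
    TOO SHORT: "Make stuff" (no search surface)
    TOO LONG:  300+ characters (gets truncated)

[1] Contains searchable keywords naturally
    GOOD: "cron jobs, systemd timers, scheduling"
    BAD:  Keywords stuffed in unnaturally

Score: __/8
```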
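Two of the description criteria, length and the presence of a trigger phrase, lend themselves to a quick mechanical check. The sketch below uses a hypothetical `check_description` helper and assumes the trigger phrase is worded "Use when"; the other criteria (active verb, scope, natural keywords) need judgment.

```bash
# Sketch: mechanical checks on a description string.
# check_description is a hypothetical helper; it covers only the
# length (50-200 characters) and trigger-phrase criteria.
check_description() {
  local desc="$1" len=${#1}

  if [ "$len" -lt 50 ]; then
    echo "length ${len}: too short"
  elif [ "$len" -gt 200 ]; then
    echo "length ${len}: above the 50-200 target"
  else
    echo "length ${len}: ok"
  fi

  # Case-insensitive search for a "Use when" trigger phrase.
  if printf '%s' "$desc" | grep -qi 'use when'; then
    echo "trigger phrase: found"
  else
    echo "trigger phrase: missing"
  fi
}
```

For example, `check_description "Write Makefiles for any project type. Use when setting up build automation."` reports an acceptable length and a found trigger phrase.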

Metadata audit

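```
METADATA SCORING:

[1] emoji is relevant to the skill topic
[1] requires.anyBins lists tools the skill actually uses (not generic tools like bash)
[1] os array is accurate (don't claim win32 if commands are Linux-only)
[1] JSON is valid (test with a JSON parser)

Score: __/4
```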
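The `requires.anyBins` criterion can be spot-checked by searching the skill body for each declared tool. This is a rough sketch under stated assumptions: `check_bins_used` is a hypothetical helper, the tool names are passed by hand rather than parsed from the metadata JSON, and the body is taken to be everything after the second `---` line.

```bash
# Sketch: report which declared tools actually appear in the skill body.
# check_bins_used is a hypothetical helper. Usage:
#   check_bins_used skills/my-skill/SKILL.md docker jq
check_bins_used() {
  local file="$1" tool
  shift
  for tool in "$@"; do
    # Print only the body (lines after the second '---'), then look for
    # the tool name as a whole word, treated as a fixed string.
    if awk 'n >= 2 { print } /^---$/ { n++ }' "$file" | grep -q -w -F -- "$tool"; then
      echo "used: $tool"
    else
      echo "unused: $tool"
    fi
  done
}
```

Skipping the frontmatter matters here: the tool names always appear in the metadata itself, so searching the whole file would report every tool as used.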

Step 3: Content Quality

Example density

Count code blocks and total lines:

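```
EXAMPLE DENSITY:

Lines: ___
Code blocks: ___
Ratio: 1 code block per ___ lines

TARGET: 1 code block per 8-15 lines
  < 8 lines per block:  may be too fragmented
  > 20 lines per block: needs more examples
```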
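The density ratio can be computed with a short awk script. This sketch assumes every code block opens and closes with a line that starts with three backticks, so the number of blocks is half the number of fence lines; `example_density` is a hypothetical helper name.

```bash
# Sketch: compute lines per code block for a markdown file.
# example_density is a hypothetical helper. The fence string is built
# with sprintf so the script contains no literal backtick runs.
example_density() {
  awk '
    BEGIN { fence = sprintf("%c%c%c", 96, 96, 96) }   # three backticks
    index($0, fence) == 1 { fences++ }                # opening or closing fence
    END {
      blocks = int(fences / 2)                        # two fence lines per block
      if (blocks == 0) { print "no code blocks"; exit }
      printf "%d lines, %d blocks, 1 block per %.1f lines\n", NR, blocks, NR / blocks
    }
  ' "$1"
}
```

Compare the reported ratio against the 8-15 lines-per-block target.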

Example quality

For each code block, check:

```
EXAMPLE QUALITY CHECKLIST:

[ ] Language tag specified (```bash, ```python, etc.)
[ ] Command is syntactically correct
[ ] Output shown in comments where helpful
[ ] Uses realistic values (not foo/bar/baz)
[ ] No placeholder values left (TODO, FIXME, xxx)
[ ] Self-contained (doesn't depend on undefined variables)
    OR setup is shown/referenced
[ ] Covers the common case (not just edge cases)
```

Score each example 0-3:

- 0: Wrong or misleading
- 1: Works but too minimal (no output, no context)
- 2: Good (correct, with output or explanation)
- 3: Excellent (copy-pasteable, realistic, covers edge cases)

Section organization

```
ORGANIZATION SCORING:

[2] Organized by task/scenario (not by abstract concept)
    GOOD: "## Encode and Decode" → "## Inspect Characters" → "## Convert Formats"
    BAD:  "## Theory" → "## Types" → "## Advanced"

[2] Most common operations come first
    GOOD: Basic usage → Variations → Advanced → Edge cases
    BAD:  Configuration → Theory → Finally the basic usage

[1] Sections are self-contained (can be used independently)

[1] Consistent depth (not mixing h2 with h4 randomly)

Score: __/6
```

Cross-platform accuracy

```
PLATFORM CHECKLIST:

[ ] macOS differences noted where relevant
    (sed -i '' vs sed -i, brew vs apt, BSD vs GNU flags)
[ ] Linux distro variations noted (apt vs yum vs pacman)
[ ] Windows compatibility addressed if os includes "win32"
[ ] Tool version assumptions stated (Docker v2 syntax, Python 3.x)
```

Step 4: Actionability Assessment

The core question: can an agent follow these instructions to produce correct results?

```
ACTIONABILITY SCORING:

[3] Instructions are imperative ("Run X", "Create Y")
    NOT: "You might consider..." or "It's recommended to..."

[3] Steps are ordered logically (prerequisites before actions)

[2] Error cases addressed (what to do when something fails)

[2] Output/result described (how to verify it worked)

Score: __/10
```

Step 5: Tips Section Quality

```
TIPS SCORING:

[2] 5-10 tips present

[2] Tips are non-obvious (not "read the documentation")
    GOOD: "The number one Makefile bug: spaces instead of tabs"
    BAD:  "Make sure to test your code"

[2] Tips are specific and actionable
    GOOD: "Use flock to prevent overlapping cron runs"
    BAD:  "Be careful with concurrent execution"

[1] No tips contradict the main content

[1] Tips cover gotchas/footguns specific to this topic

Score: __/8
```

Scoring Summary

```
SKILL REVIEW SCORECARD
═══════════════════════════════════════
Skill: [name]
Reviewer: [agent/human]
Date: [date]

Category             Score    Max
─────────────────────────────────────
Structure            __       11
Description          __       8
Metadata             __       4
Example density      __       3*
Example quality      __       3*
Organization         __       6
Actionability        __       10
Tips                 __       8
─────────────────────────────────────
TOTAL                __       53+

* Example density and quality are per-sample, not summed.
  Use the average across all examples.

RATING:
  45+    Excellent: publish-ready
  35-44  Good: minor improvements needed
  25-34  Fair: significant gaps to address
  < 25   Poor: needs major rework

VERDICT: [PUBLISH / REVISE / REWORK]
```

Common Defects

Critical (block publication)

```
DEFECT: Invalid frontmatter
DETECT: YAML parse error, missing required fields
FIX:    Validate YAML, ensure name/description/metadata all present

DEFECT: Broken code examples
DETECT: Syntax errors, undefined variables, wrong flags
FIX:    Test every command in a clean environment

DEFECT: Wrong tool requirements
DETECT: metadata.requires lists tools not used in content, or omits tools that are used
FIX:    Grep content for command names, update requires to match

DEFECT: Misleading description
DETECT: Description promises coverage the content doesn't deliver
FIX:    Align description with actual content, or add missing content
```

Major (should fix before publishing)

```
DEFECT: No "When to Use" section
IMPACT: Agent doesn't know when to activate the skill
FIX:    Add 4-8 bullet points describing trigger scenarios

DEFECT: Text walls without examples
DETECT: Any section > 10 lines with no code block
FIX:    Add concrete examples for every concept described

DEFECT: Examples missing language tags
DETECT: ``` without language identifier
FIX:    Add bash, python, javascript, yaml, etc. to every code fence

DEFECT: No Tips section
IMPACT: Missing the distilled expertise that makes a skill valuable
FIX:    Add 5-10 non-obvious, actionable tips

DEFECT: Abstract organization
DETECT: Sections named "Theory", "Overview", "Background", "Introduction"
FIX:    Reorganize by task/operation: what the user is trying to DO
```

Minor (nice to fix)

```
DEFECT: Placeholder values
DETECT: foo, bar, baz, example.com, 1.2.3.4, TODO, FIXME
FIX:    Replace with realistic values (myapp, api.example.com, 192.168.1.100)

DEFECT: Inconsistent formatting
DETECT: Mixed heading levels, inconsistent code block style
FIX:    Standardize heading hierarchy and formatting

DEFECT: Missing cross-references
DETECT: Mentions tools/concepts covered by other skills without referencing them
FIX:    Add "See the X skill for more on Y" notes

DEFECT: Outdated commands
DETECT: docker-compose (v1), python (not python3), npm -g without npx alternative
FIX:    Update to current tool versions and syntax
```

Comparative Review

When comparing skills in the same category:

```
COMPARATIVE CRITERIA:

1. Coverage breadth
   Which skill covers more use cases?

2. Example quality
   Which has more runnable, realistic examples?

3. Depth on common operations
   Which handles the 80% case better?

4. Edge case coverage
   Which addresses more gotchas and failure modes?

5. Cross-platform support
   Which works across more environments?

6. Freshness
   Which uses current tool versions and syntax?

WINNER: [skill A / skill B / tie]
REASON: [1-2 sentence justification]
```

```markdown
## Quick Review: [skill-name]

**Structure**: [OK / Issues: ...]
**Description**: [Strong / Weak: reason]
**Examples**: [X code blocks across Y lines, density OK/low/high]
**Actionability**: [Agent can/cannot follow these instructions because...]
**Top defect**: [The single most impactful thing to fix]
**Verdict**: [PUBLISH / REVISE / REWORK]
```

```bash
# 1. Validate frontmatter
head -20 skills/my-skill/SKILL.md
# Visually confirm YAML is valid

# 2. Count code blocks
grep -c '```' skills/my-skill/SKILL.md
# This counts fence lines, two per block: halve it to get the block count,
# then divide total lines by that number for density

# 3. Check for placeholders
grep -n -i 'todo\|fixme\|xxx\|foo\|bar\|baz' skills/my-skill/SKILL.md

# 4. Check for missing language tags
grep -n '^```$' skills/my-skill/SKILL.md
# Closing fences are always bare and will match too;
# only a bare opening fence is a defect

# 5. Verify tool requirements match content
# Extract requires from frontmatter, then grep for each tool in content

# 6. Test commands (sample 3-5 from the skill)
# Run them in a clean shell to verify they work

# 7. Run the scorecard mentally or in a file
# Target: 35+ for good, 45+ for excellent
```

```bash
# Install the skill
npx molthub@latest install skill-name

# Read it
cat skills/skill-name/SKILL.md

# Run the quick review template
# If score < 25, consider uninstalling and finding an alternative
```

Tips

- The description field accounts for more real-world impact than all other fields combined. A perfect skill with a bad description will never be found via search.
- Count code blocks as your first quality signal. Skills with fewer than 8 code blocks are almost always too abstract to be useful.
- Test 3-5 commands from the skill in a clean environment. If more than one fails, the skill wasn't tested before publishing.
- "Organized by task" vs. "organized by concept" is the single biggest structural quality differentiator. Good skills answer "how do I do X?"; bad skills explain "what is X?"
- A skill with great tips but weak examples is better than one with thorough examples but no tips. Tips encode expertise that examples alone don't convey.
- Check the `requires.anyBins` against what the skill actually uses. A common defect is listing `bash` (which everything has) instead of the actual tools like `docker`, `curl`, or `jq`.
- Short skills (< 150 lines) usually aren't worth publishing: they don't provide enough value over a quick web search. If your skill is short, it might be better as a section in a larger skill.
- The best skills are ones you'd bookmark yourself. If you wouldn't use it, don't publish it.

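Skill Reviewer

Audit agent skills (SKILL.md files) for quality, correctness, and completeness. Provides a structured review framework with a scoring rubric, defect checklists, and improvement recommendations.

When to Use

- Reviewing a skill before publishing to the registry
- Evaluating a skill you downloaded from the registry
- Auditing your own skills for quality improvements
- Comparing skills in the same category
- Deciding whether a skill is worth installing

Review Process

Step 1: Structural Check

Verify the skill has the required structure. Read the file and check each item:

STRUCTURE CHECKLIST:
[ ] Valid YAML frontmatter (opens and closes with ---)
[ ] name field present and a valid slug (lowercase, hyphenated)
[ ] description field present and non-empty
[ ] metadata field present with valid JSON
[ ] metadata.clawdbot.emoji is a single emoji
[ ] metadata.clawdbot.requires.anyBins lists real CLI tools
[ ] Title heading (# Title) immediately after frontmatter
[ ] Summary paragraph after title
[ ] "When to Use" section present
[ ] At least 3 main content sections
[ ] "Tips" section present at the end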

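Step 2: Frontmatter Quality

Description field audit

The description is the most impactful field. Evaluate it against these criteria:

DESCRIPTION SCORING:

[2] Starts with what the skill does (active verb)
GOOD: "Write Makefiles for any project type."
BAD: "This skill covers Makefiles."
BAD: "A comprehensive guide to Make."

[2] Includes trigger phrases ("Use when...")
GOOD: "Use when setting up build automation, defining multi-target builds"
BAD: No trigger phrases at all

[2] Specific scope (mentions concrete tools, languages, or operations)
GOOD: "SQLite/PostgreSQL/MySQL: schema design, queries, CTEs, window functions"
BAD: "Database stuff"

[1] Reasonable length (50-200 characters)
TOO SHORT: "Make stuff" (no search surface)
TOO LONG: 300+ characters (gets truncated)

[1] Contains searchable keywords naturally
GOOD: "cron jobs, systemd timers, scheduling"
BAD: Keywords stuffed in unnaturally

Score: __/8

Metadata audit

METADATA SCORING:

[1] emoji is relevant to the skill topic
[1] requires.anyBins lists tools the skill actually uses (not generic tools like bash)
[1] os array is accurate (don't claim win32 if commands are Linux-only)
[1] JSON is valid (test with a JSON parser)

Score: __/4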

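Step 3: Content Quality

Example density

Count code blocks and total lines:

EXAMPLE DENSITY:

Lines: ___
Code blocks: ___
Ratio: 1 code block per ___ lines

TARGET: 1 code block per 8-15 lines
< 8 lines per block: may be too fragmented
> 20 lines per block: needs more examples

Example quality

For each code block, check:

EXAMPLE QUALITY CHECKLIST:

[ ] Language tag specified (bash, python, etc.)
[ ] Command is syntactically correct
[ ] Output shown in comments where helpful
[ ] Uses realistic values (not foo/bar/baz)
[ ] No placeholder values left (TODO, FIXME, xxx)
[ ] Self-contained (doesn't depend on undefined variables)
OR setup is shown/referenced
[ ] Covers the common case (not just edge cases)

Score each example 0-3:

- 0: Wrong or misleading
- 1: Works but too minimal (no output, no context)
- 2: Good (correct, with output or explanation)
- 3: Excellent (copy-pasteable, realistic, covers edge cases)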
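The language-tag item can be checked without flagging closing fences, which are always bare. The sketch below tracks whether each fence line opens or closes a block and reports only bare opening fences. `untagged_fences` is a hypothetical helper, and it assumes fences are not nested.

```bash
# Sketch: report opening code fences that have no language tag.
# untagged_fences is a hypothetical helper. The fence string is built
# with sprintf so the script contains no literal backtick runs.
untagged_fences() {
  awk '
    BEGIN { fence = sprintf("%c%c%c", 96, 96, 96) }   # three backticks
    index($0, fence) == 1 {
      if (!open) {
        # Opening fence: a bare one has nothing after the backticks.
        if ($0 == fence) print NR ": opening fence has no language tag"
        open = 1
      } else {
        # Closing fence: always bare, never a defect.
        open = 0
      }
    }
  ' "$1"
}
```

An empty result means every opening fence carries a tag.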

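Section organization

ORGANIZATION SCORING:

[2] Organized by task/scenario (not by abstract concept)
GOOD: "## Encode and Decode" → "## Inspect Characters" → "## Convert Formats"
BAD: "## Theory" → "## Types" → "## Advanced"

[2] Most common operations come first
GOOD: Basic usage → Variations → Advanced → Edge cases
BAD: Configuration → Theory → Finally the basic usage

[1] Sections are self-contained (can be used independently)

[1] Consistent depth (not mixing h2 with h4 randomly)

Score: __/6

Cross-platform accuracy

PLATFORM CHECKLIST:

[ ] macOS differences noted where relevant
(sed -i '' vs sed -i, brew vs apt, BSD vs GNU flags)
[ ] Linux distro variations noted (apt vs yum vs pacman)
[ ] Windows compatibility addressed if os includes "win32"
[ ] Tool version assumptions stated (Docker v2 syntax, Python 3.x)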

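Step 4: Actionability Assessment

The core question: can an agent follow these instructions to produce correct results?

ACTIONABILITY SCORING:

[3] Instructions are imperative ("Run X", "Create Y")
NOT: "You might consider..." or "It's recommended to..."

[3] Steps are ordered logically (prerequisites before actions)

[2] Error cases addressed (what to do when something fails)

[2] Output/result described (how to verify it worked)

Score: __/10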
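The two hedged phrasings named in the rubric can be located with a case-insensitive search. This sketch uses a hypothetical `find_hedged_instructions` helper and looks only for those two phrases; other soft wording has to be caught by reading.

```bash
# Sketch: list lines that use the hedged phrasings named in the rubric.
# find_hedged_instructions is a hypothetical helper; it prints matching
# lines with their line numbers and prints nothing if none are found.
find_hedged_instructions() {
  grep -n -i -E "you might consider|it's recommended to" "$1"
}
```

Each reported line is a candidate for rewriting as a direct instruction such as "Run X" or "Create Y".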

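Step 5: Tips Section Quality

TIPS SCORING:

[2] 5-10 tips present

[2] Tips are non-obvious (not "read the documentation")
GOOD: "The number one Makefile bug: spaces instead of tabs"
BAD: "Make sure to test your code"

[2] Tips are specific and actionable
GOOD: "Use flock to prevent overlapping cron runs"
BAD: "Be careful with concurrent execution"

[1] No tips contradict the main content

[1] Tips cover gotchas/footguns specific to this topic

Score: __/8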
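The tip count can be checked by counting bullets under the Tips heading. The sketch below assumes the section heading is written as a markdown heading named exactly "Tips", that each tip is a top-level `- ` bullet, and that the section contains no code comments starting with `# ` (which would be mistaken for a heading). `count_tips` is a hypothetical helper.

```bash
# Sketch: count top-level bullets in the Tips section of a markdown file.
# count_tips is a hypothetical helper.
count_tips() {
  awk '
    # Any heading either enters the Tips section or leaves it.
    /^#+ / { in_tips = ($0 ~ /^#+ Tips$/) }
    in_tips && /^- / { n++ }
    END { print n + 0 }
  ' "$1"
}
```

A result between 5 and 10 satisfies the first tips criterion.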

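Scoring Summary

SKILL REVIEW SCORECARD
═══════════════════════════════════════
Skill: [name]
Reviewer: [agent/human]
Date: [date]

Category             Score    Max
─────────────────────────────────────
Structure            __       11
Description          __       8
Metadata             __       4
Example density      __       3*
Example quality      __       3*
Organization         __       6
Actionability        __       10
Tips                 __       8
─────────────────────────────────────
TOTAL                __       53+

* Example density and quality are per-sample, not summed. Use the average across all examples.

RATING:
45+ Excellent: publish-ready
35-44 Good: minor improvements needed
25-34 Fair: significant gaps to address
< 25 Poor: needs major rework

VERDICT: [PUBLISH / REVISE / REWORK]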
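Totalling the scorecard and mapping it to a rating band is simple arithmetic. The sketch below uses a hypothetical `rate_skill` helper that takes the eight category scores in scorecard order. It assumes whole-number scores, so the example density and quality averages must be rounded before they are passed in.

```bash
# Sketch: sum the eight category scores and map the total to a rating band.
# rate_skill is a hypothetical helper; scores are assumed to be integers.
rate_skill() {
  local total=0 score rating
  for score in "$@"; do
    total=$((total + score))
  done

  if   [ "$total" -ge 45 ]; then rating="Excellent"
  elif [ "$total" -ge 35 ]; then rating="Good"
  elif [ "$total" -ge 25 ]; then rating="Fair"
  else                           rating="Poor"
  fi

  echo "${total}/53 ${rating}"
}
```

For example, `rate_skill 10 7 4 2 2 5 8 6` totals 44 and lands in the "Good" band, one point short of "Excellent".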

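Common Defects

Critical (block publication)

DEFECT: Invalid frontmatter
DETECT: YAML parse error, missing required fields
FIX: Validate YAML, ensure name/description/metadata all present

DEFECT: Broken code examples
DETECT: Syntax errors, undefined variables, wrong flags
FIX: Test every command in a clean environment

DEFECT: Wrong tool requirements
DETECT: metadata.requires lists tools not used in content, or omits tools that are used
FIX: Grep content for command names, update requires to match

DEFECT: Misleading description
DETECT: Description promises coverage the content doesn't deliver
FIX: Align description with actual content, or add missing content

Major (should fix before publishing)

DEFECT: No "When to Use" section
IMPACT: Agent doesn't know when to activate the skill
FIX: Add 4-8 bullet points describing trigger scenarios

DEFECT: Text walls without examples
DETECT: Any section > 10 lines with no code block
FIX: Add concrete examples for every concept described

DEFECT: Examples missing language tags
DETECT: Opening code fence without a language identifier
FIX: Add bash, python, javascript, yaml, etc. to every code fence

DEFECT: No Tips section
IMPACT: Missing the distilled expertise that makes a skill valuable
FIX: Add 5-10 non-obvious, actionable tips

DEFECT: Abstract organization
DETECT: Sections named "Theory", "Overview", "Background", "Introduction"
FIX: Reorganize by task/operation: what the user is trying to DO

Minor (nice to fix)

DEFECT: Placeholder values
DETECT: foo, bar, baz, example.com, 1.2.3.4, TODO, FIXME
FIX: Replace with realistic values (myapp, api.example.com, 192.168.1.100)

DEFECT: Inconsistent formatting
DETECT: Mixed heading levels, inconsistent code block style
FIX: Standardize heading hierarchy and formatting

DEFECT: Missing cross-references
DETECT: Mentions tools/concepts covered by other skills without referencing them
FIX: Add "See the X skill for more on Y" notes

DEFECT: Outdated commands
DETECT: docker-compose (v1), python (not python3), npm -g without npx alternative
FIX: Update to current tool versions and syntax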
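Two of the minor defects have simple text signatures. The sketch below is illustrative only and both function names are hypothetical. `find_placeholders` matches whole words, so "bar" inside "toolbar" is not reported. `find_compose_v1` matches `docker-compose` only when it is used as a command followed by whitespace, so references to a `docker-compose.yml` file are left alone.

```bash
# Sketch: detect placeholder values and docker-compose v1 invocations.
# Both helpers are hypothetical; each prints matching lines with numbers.

# Whole-word, case-insensitive match on common placeholder tokens.
find_placeholders() {
  grep -n -i -w -E 'todo|fixme|xxx|foo|bar|baz' "$1"
}

# docker-compose used as a command (followed by whitespace), not as part
# of a filename such as docker-compose.yml.
find_compose_v1() {
  grep -n -E '(^|[[:space:]])docker-compose[[:space:]]' "$1"
}
```

Both searches are heuristics: a legitimate "progress bar" would still be flagged, so review each hit before changing it.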

Comparative Review

When comparing skills in the same category:
