Skill Evaluator

Evaluate skills across 25 criteria using a hybrid automated + manual approach.

Quick Start

1. Run automated checks

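```bash
python3 scripts/eval-skill.py /path/to/skill
python3 scripts/eval-skill.py /path/to/skill --json     # machine-readable output
python3 scripts/eval-skill.py /path/to/skill --verbose  # show all details
```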

Checks: file structure, frontmatter, description quality, script syntax, dependency audit, credential scan, env var documentation.
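Here is a minimal sketch of a regex-based line scanner that shows what a credential scan involves. It is not taken from eval-skill.py. The `scan_for_credentials` function and its three patterns are hypothetical, and a real scanner would need a much wider pattern set to avoid missing secrets.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for illustration; eval-skill.py may check different ones.
CREDENTIAL_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    # Assignments like API_KEY = "..." with a quoted value of 8+ characters.
    "hardcoded_secret": re.compile(
        r"""(?i)\b(?:api[_-]?key|secret|token|password)\s*[:=]\s*['"][^'"\s]{8,}['"]"""
    ),
}


def scan_for_credentials(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, pattern_name) pairs for lines that look like secrets."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in CREDENTIAL_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

For example, `scan_for_credentials('API_KEY = "sk-live-1234567890abcdef"')` returns `[(1, 'hardcoded_secret')]`.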

2. Manual assessment

Use the rubric at references/rubric.md to score 25 criteria across 8 categories (0–4 each, 100 total). Each criterion has concrete descriptions per score level.
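The 100-point total follows from the rubric's arithmetic: 25 criteria scored 0–4 give a maximum of 25 × 4 = 100. The following minimal sketch validates and sums a set of manual scores. It assumes the scores are kept in a dict keyed by criterion name, and the `total_score` helper is hypothetical.

```python
from typing import Dict

NUM_CRITERIA = 25
MAX_PER_CRITERION = 4  # each rubric criterion is scored 0-4


def total_score(scores: Dict[str, int]) -> int:
    """Validate per-criterion scores and return the total out of 100."""
    if len(scores) != NUM_CRITERIA:
        raise ValueError(f"expected {NUM_CRITERIA} criteria, got {len(scores)}")
    for criterion, value in scores.items():
        if not isinstance(value, int) or not 0 <= value <= MAX_PER_CRITERION:
            raise ValueError(f"{criterion}: score {value!r} is outside 0-{MAX_PER_CRITERION}")
    return sum(scores.values())
```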

3. Write the evaluation

Copy assets/EVAL-TEMPLATE.md into the skill directory as EVAL.md, then fill in the automated results and the manual scores.
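Step 3 is easy to script if you evaluate many skills. The sketch below assumes the evaluator's own directory contains assets/EVAL-TEMPLATE.md. The `create_eval_file` helper and its refusal to overwrite an existing EVAL.md are illustrative choices, not documented behavior.

```python
import shutil
from pathlib import Path


def create_eval_file(evaluator_dir: str, skill_dir: str) -> Path:
    """Copy assets/EVAL-TEMPLATE.md into skill_dir as EVAL.md without clobbering."""
    template = Path(evaluator_dir) / "assets" / "EVAL-TEMPLATE.md"
    target = Path(skill_dir) / "EVAL.md"
    if not template.is_file():
        raise FileNotFoundError(f"template not found: {template}")
    if target.exists():
        # Protect an evaluation that may already contain manual scores.
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    shutil.copyfile(template, target)
    return target
```

With hypothetical paths, `create_eval_file("skills/skill-evaluator", "skills/my-skill")` would create skills/my-skill/EVAL.md.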

Evaluation Process

1. Run eval-skill.py to get the automated structural score.
2. Read the skill's SKILL.md to understand what it does.
3. Read or skim the scripts to assess code quality, error handling, and testability.
4. Score each manual criterion using references/rubric.md, which gives concrete criteria per level.
5. Prioritize findings as P0 (blocks publishing), P1 (should fix), or P2 (nice to have).
6. Write EVAL.md in the skill directory with the scores and findings.
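The prioritization and write-up steps amount to ordering findings by priority before writing them up. In this minimal sketch, the `Finding` record, `render_findings`, and the bullet format are all hypothetical and need not match the layout of EVAL-TEMPLATE.md.

```python
from typing import List, NamedTuple

# Priority labels from the evaluation process; a lower rank is more urgent.
PRIORITY_RANK = {"P0": 0, "P1": 1, "P2": 2}
PRIORITY_LABEL = {"P0": "blocks publishing", "P1": "should fix", "P2": "nice to have"}


class Finding(NamedTuple):
    priority: str  # "P0", "P1" or "P2"
    summary: str


def render_findings(findings: List[Finding]) -> str:
    """Return findings as bullet lines, most urgent first."""
    for item in findings:
        if item.priority not in PRIORITY_RANK:
            raise ValueError(f"unknown priority: {item.priority}")
    ordered = sorted(findings, key=lambda item: PRIORITY_RANK[item.priority])
    return "\n".join(
        f"- [{item.priority}, {PRIORITY_LABEL[item.priority]}] {item.summary}"
        for item in ordered
    )
```

Because `sorted` is stable, findings with the same priority keep the order in which they were recorded.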

Categories (8 categories, 25 criteria)

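#	Category	Source Framework	Criteria
1	Functional Suitability	ISO 25010	Completeness, Correctness, Appropriateness
2	Reliability	ISO 25010	Fault Tolerance, Error Reporting, Recoverability
3	Performance / Context	ISO 25010 + Agent	Token Cost, Execution Efficiency
4	Usability (AI Agent)	Shneiderman, Gerhardt-Powals	Learnability, Consistency, Feedback, Error Prevention
5	Usability (Human)	Tognazzini, Norman	Discoverability, Error Tolerance
6	Security	ISO 25010 + OpenSSF	Credentials, Input Validation, Data Safety
7	Maintainability	ISO 25010	Modularity, Modifiability, Testability
8	Agent-Specific	Novel	Trigger Precision, Progressive Disclosure, Composability, Idempotency, Escape Hatches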
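The category list can be encoded as data so that per-category subtotals are computed mechanically. This sketch assumes manual scores are keyed by criterion name. The English criterion names, `CATEGORIES`, and `category_subtotals` are illustrative.

```python
from typing import Dict, List

# Category -> criteria (each criterion scored 0-4).
CATEGORIES: Dict[str, List[str]] = {
    "Functional Suitability": ["Completeness", "Correctness", "Appropriateness"],
    "Reliability": ["Fault Tolerance", "Error Reporting", "Recoverability"],
    "Performance / Context": ["Token Cost", "Execution Efficiency"],
    "Usability (AI Agent)": ["Learnability", "Consistency", "Feedback", "Error Prevention"],
    "Usability (Human)": ["Discoverability", "Error Tolerance"],
    "Security": ["Credentials", "Input Validation", "Data Safety"],
    "Maintainability": ["Modularity", "Modifiability", "Testability"],
    "Agent-Specific": [
        "Trigger Precision", "Progressive Disclosure", "Composability",
        "Idempotency", "Escape Hatches",
    ],
}


def category_subtotals(scores: Dict[str, int]) -> Dict[str, int]:
    """Sum per-criterion scores into per-category subtotals."""
    missing = [c for crits in CATEGORIES.values() for c in crits if c not in scores]
    if missing:
        raise KeyError(f"missing scores for: {', '.join(missing)}")
    return {cat: sum(scores[c] for c in crits) for cat, crits in CATEGORIES.items()}
```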

Interpreting Scores

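Range	Verdict	Action
90–100	Excellent	Publish confidently
80–89	Good	Publish, noting known issues
70–79	Acceptable	Fix P0 issues before publishing
60–69	Needs Improvement	Fix P0 and P1 issues before publishing
<60	Not Ready	Major rework needed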
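The score bands map directly onto a lookup. In this minimal sketch, the `interpret` function is hypothetical. It assumes five bands (90, 80, 70, and 60 as the lower bounds, with everything below 60 as Not Ready) and the verdict and action wording shown.

```python
from typing import Tuple

# (minimum score, verdict, action), highest band first.
BANDS = [
    (90, "Excellent", "Publish confidently"),
    (80, "Good", "Publish, noting known issues"),
    (70, "Acceptable", "Fix P0 issues before publishing"),
    (60, "Needs Improvement", "Fix P0 and P1 issues before publishing"),
    (0, "Not Ready", "Major rework needed"),
]


def interpret(score: int) -> Tuple[str, str]:
    """Map a 0-100 total to its (verdict, action) band."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    for minimum, verdict, action in BANDS:
        if score >= minimum:
            return verdict, action
    raise AssertionError("unreachable: the 0 band catches every valid score")
```

For example, `interpret(84)` returns `("Good", "Publish, noting known issues")`.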

Deeper Security Scanning

This evaluator covers security basics (credentials, input validation, data safety), but for thorough security audits of skills under development, consider SkillLens (npx skilllens scan <path>). SkillLens checks for exfiltration, code execution, persistence, privilege bypass, and prompt injection, which complements this evaluator's focus on quality.
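A thin wrapper around the `npx skilllens scan <path>` command lets you run SkillLens from the same tooling as the evaluator. This sketch makes no assumptions about SkillLens's output format or what its exit codes mean. It streams output to the terminal and returns the exit code. The helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def skilllens_command(skill_path: str) -> List[str]:
    """Build the `npx skilllens scan <path>` invocation."""
    return ["npx", "skilllens", "scan", skill_path]


def run_skilllens(skill_path: str) -> int:
    """Run SkillLens on a skill directory and return its exit code."""
    if shutil.which("npx") is None:
        raise RuntimeError("npx not found on PATH; install Node.js to use SkillLens")
    # Output goes straight to the terminal; it is not parsed here.
    completed = subprocess.run(skilllens_command(skill_path))
    return completed.returncode
```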

Dependencies

- Python 3.6+ (for eval-skill.py)
- PyYAML (pip install pyyaml), used for frontmatter parsing in the automated checks
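A preflight check for these two dependencies might look like the sketch below. The `check_dependencies` name is hypothetical. PyYAML is imported as `yaml`, so that is the module name the check looks for.

```python
import importlib.util
import sys
from typing import List


def check_dependencies() -> List[str]:
    """Return human-readable problems with the evaluator's stated dependencies."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6+ is required for eval-skill.py")
    # PyYAML installs the importable package `yaml`.
    if importlib.util.find_spec("yaml") is None:
        problems.append("PyYAML is missing; install it with: pip install pyyaml")
    return problems
```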
