Autoagent Skill

Optimize any agent guidance through automated testing and iterative improvement.

Quick Start

```
/autoagent
```

What It Does

1. Setup Phase - Asks where your guidance lives and what it should do
2. Creates Sandbox - Copies the guidance into a test folder along with test fixtures
3. Runs Optimization Loop - Every 5 minutes (by default) via cron:
   - Analyzes the current guidance
   - Proposes an improvement
   - Tests it with a subagent
   - Scores the result
   - Keeps or discards the change
4. Logs Everything - Check scores.md for the history

Setup Phase (Every Invocation Starts Fresh)

Every invocation of /autoagent starts fresh with interactive setup questions.

Step 1: Ask Sandbox Location

Ask the user:

Where should I create the sandbox folder? Default: ../../autoagent-sandbox/ (resolves to /clawd/autoagent-sandbox/)

You can respond with:

- Empty/default: Press Enter to use ../../autoagent-sandbox/
- Just a name: "news" creates ../../autoagent-news/ → /clawd/autoagent-news/
- Relative path: "agentDev/optimize" creates ../../agentDev/optimize/ → /clawd/agentDev/optimize/
- Absolute path: /some/other/path/optimize/ → used as the exact path

Wait for their response (an empty response selects the default).
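The path rules above can be sketched as a small resolver. This is a minimal sketch, assuming that `../../` resolves to `/clawd` (as in the examples) and that a reply containing an inner `/` is a relative path while a bare word is a name; the helper `resolve_sandbox` is hypothetical.

```python
import posixpath

# Assumption: "../../" from the skill directory resolves to /clawd, per the examples.
DEFAULT_BASE = "/clawd"


def resolve_sandbox(answer: str, base: str = DEFAULT_BASE) -> str:
    """Map the user's reply to an absolute sandbox path (hypothetical helper)."""
    answer = answer.strip()
    if not answer:
        # Empty reply: use the default sandbox folder.
        path = posixpath.join(base, "autoagent-sandbox")
    elif answer.startswith("/"):
        # Absolute path: use it as given (normalized).
        path = posixpath.normpath(answer)
    elif "/" in answer.strip("/"):
        # Relative path such as "agentDev/optimize": resolve against the base.
        path = posixpath.normpath(posixpath.join(base, answer))
    else:
        # Bare name such as "news": becomes autoagent-<name>.
        path = posixpath.join(base, "autoagent-" + answer.strip("/"))
    # Always return a directory path with a trailing slash.
    return path.rstrip("/") + "/"
```

With these rules, `resolve_sandbox("news")` yields `/clawd/autoagent-news/`, matching the second example.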

Step 2: Discuss Success Criteria

Ask the user:

Let's define how we'll measure success. What does a "good" result look like for this task?

Follow up one at a time based on their response:

- What specific outputs are expected?
- What format should they be in?
- What's the minimum viable quality?
- Any edge cases to consider?

Once you have enough information, propose a draft scoring.md:

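```markdown
## Proposed Scoring Criteria

Scoring components:

- [Component 1]: [X] points - [description]
- [Component 2]: [Y] points - [description]
...

**Total:** 100 points

[Any additional notes]
```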

Wait for user approval or modifications.
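Before asking for approval, it can help to check that the draft's components really add up to 100 points. This is a minimal sketch, assuming each component line follows the shape `- <name>: <N> points - <description>`; `parse_rubric` and `validate_rubric` are hypothetical helpers.

```python
import re

# Assumed component line shape: "- <name>: <N> points - <description>"
COMPONENT_RE = re.compile(r"^\s*-\s*(.+?):\s*(\d+)\s*points?\b", re.MULTILINE)


def parse_rubric(text: str) -> dict[str, int]:
    """Extract {component name: max points} from a draft scoring.md."""
    return {name.strip(): int(points) for name, points in COMPONENT_RE.findall(text)}


def validate_rubric(text: str) -> dict[str, int]:
    """Raise ValueError unless the components sum to the 100-point total."""
    rubric = parse_rubric(text)
    if not rubric:
        raise ValueError("no scoring components found")
    total = sum(rubric.values())
    if total != 100:
        raise ValueError(f"components sum to {total}, expected 100")
    return rubric
```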

Step 3: Ask About External Scripts/Tools

Ask the user:

Does the guidance rely on any scripts, tools, or external software?

- If yes: Note each script/tool path and the functionality it provides
- The autoagent should analyze these to recommend improvements

Step 4: Ask Cron Schedule

Ask the user:

Run optimization every 5 minutes (default), or at a different interval?

Step 5: Create Sandbox

After all questions answered, create the sandbox folder at the user-specified path:

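```
sandbox/
├── guidance-under-test.md   # Copy of the original guidance
├── current-guidance.md      # Initially identical to guidance-under-test.md
├── fixtures/
│   └── test-cases.json      # {cases: [{input: ..., expected: ...}]}
├── scoring.md               # Scoring criteria document (user-approved)
├── scores.md                # Score history table
└── scripts/                 # (Optional) copies of referenced scripts/tools
```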
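Creating the sandbox amounts to a few copies and writes. This is a minimal sketch of producing the files this guide names (guidance-under-test.md, current-guidance.md, fixtures/test-cases.json, scoring.md, scores.md, optional scripts/); the helper `create_sandbox` and its parameters are hypothetical, and scores.md is simply created empty here.

```python
import json
import shutil
from pathlib import Path


def create_sandbox(sandbox: Path, guidance_src: Path, scoring_md: str,
                   cases: list[dict], scripts: tuple = ()) -> None:
    """Create the sandbox files for one optimization run (hypothetical helper)."""
    sandbox.mkdir(parents=True, exist_ok=True)
    # Reference copy of the original guidance; treated as read-only by convention.
    shutil.copyfile(guidance_src, sandbox / "guidance-under-test.md")
    # Working copy that the iteration phase edits.
    shutil.copyfile(guidance_src, sandbox / "current-guidance.md")
    fixtures = sandbox / "fixtures"
    fixtures.mkdir(exist_ok=True)
    (fixtures / "test-cases.json").write_text(
        json.dumps({"cases": cases}, indent=2), encoding="utf-8")
    # User-approved scoring criteria from Step 2.
    (sandbox / "scoring.md").write_text(scoring_md, encoding="utf-8")
    # Score history starts empty; rows are appended by the iteration phase.
    (sandbox / "scores.md").touch()
    if scripts:
        scripts_dir = sandbox / "scripts"
        scripts_dir.mkdir(exist_ok=True)
        for script in scripts:
            shutil.copy2(script, scripts_dir / Path(script).name)
```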

Step 6: Set Up Cron

Use OpenClaw cron syntax to schedule the iteration agent:

- Default: every 5 minutes (*/5 * * * *)
- Command: invoke the iteration prompt with the sandbox path
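The interval chosen in Step 4 has to become a cron expression. This is a minimal sketch, assuming OpenClaw accepts standard five-field expressions like the `*/5 * * * *` default; `interval_to_cron` is a hypothetical helper and does not call any OpenClaw API.

```python
def interval_to_cron(minutes: int) -> str:
    """Turn an interval in minutes into a five-field cron expression (hypothetical helper)."""
    if minutes < 1:
        raise ValueError("interval must be at least 1 minute")
    if minutes < 60:
        # Step syntax in the minute field, e.g. */5 for the default.
        return f"*/{minutes} * * * *"
    if minutes % 60 == 0 and minutes // 60 < 24:
        # Whole hours: fire at minute 0 every N hours.
        return f"0 */{minutes // 60} * * *"
    if minutes == 24 * 60:
        return "0 0 * * *"  # once a day at midnight
    raise ValueError(f"cannot express {minutes} minutes as a simple cron step")
```

Note that a minute step restarts every hour, so an interval that does not divide 60 (for example 7) leaves a shorter gap at the top of the hour (:56 to :00).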

Step 7: Confirm Start

Return a confirmation message showing the resolved path, for example:

"Optimization started at /clawd/autoagent-news/. I'll check back every 5 minutes. Monitor progress in scores.md."

Iteration Phase (Runs Every Cron Interval)

Each time the cron triggers, do the following:

Step 1: Analyze Current State

Read from the sandbox:

- current-guidance.md - The guidance being optimized
- scores.md - History of scores and changes
- scoring.md - How to measure success
- fixtures/test-cases.json - Test inputs (MUST read this to understand what the guidance is being tested against)

Review the score history (the last 10 runs, or all runs if fewer than 10 exist), identify patterns, and note the current score. When fewer than 10 runs exist, use all available scores as the set for plateau detection.

Important: Load the test cases from fixtures/test-cases.json to understand which specific outputs/behaviors are expected. The edit should address gaps revealed by failing test cases or by missing criteria.
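Loading the fixtures with a basic shape check catches a malformed test file before any edit is proposed. This is a minimal sketch, assuming the `{"cases": [{"input": ..., "expected": ...}]}` layout; `load_test_cases` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_test_cases(sandbox: Path) -> list[dict]:
    """Read fixtures/test-cases.json and check every case has input and expected."""
    path = sandbox / "fixtures" / "test-cases.json"
    data = json.loads(path.read_text(encoding="utf-8"))
    cases = data.get("cases") if isinstance(data, dict) else None
    if not isinstance(cases, list) or not cases:
        raise ValueError("test-cases.json must contain a non-empty 'cases' list")
    for i, case in enumerate(cases):
        missing = {"input", "expected"} - set(case)
        if missing:
            raise ValueError(f"case {i} is missing {sorted(missing)}")
    return cases
```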

Step 1b: Analyze External Scripts/Tools (If Applicable)

If the guidance references any scripts, tools, or external software:

1. Locate each script/tool - Find the actual script files or binary locations
2. Analyze the functionality - Read the code or documentation to understand what it does
3. Identify improvement opportunities:
   - For open-source scripts: Can the script be modified to improve functionality?
   - For closed-source/compiled tools: Can wrapper behavior be improved? Can you recommend API/interface changes?
4. Note findings in the iteration - If script improvements could help test scores, document them
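Locating a referenced tool and telling readable scripts from compiled binaries can be sketched as follows. This is a minimal sketch: `locate_tool` is hypothetical, it falls back to `shutil.which` for bare command names, and the NUL-byte test for "binary" is a common heuristic, not a guarantee.

```python
import shutil
from pathlib import Path


def locate_tool(ref: str) -> tuple:
    """Return (path, kind) where kind is "script", "binary" or "missing"."""
    candidate = Path(ref).expanduser()
    if candidate.is_file():
        path = candidate
    else:
        found = shutil.which(ref)  # fall back to a command name on PATH
        if found is None:
            return None, "missing"
        path = Path(found)
    with open(path, "rb") as fh:
        head = fh.read(4096)
    # Heuristic: NUL bytes usually mean a compiled binary; otherwise readable source.
    kind = "binary" if b"\x00" in head else "script"
    return str(path), kind
```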

Example outputs:

- "Script X does Y but could do Z - recommend modification to add feature W"
- "Tool A is closed-source, recommend changing prompt to work around limitation B"
- "Script C has a bug in function D - fix would improve test outcomes"

Step 2: Propose Edit

Generate ONE specific edit to the guidance that might improve the score.

Analyze Score History First:

- Read scores.md to find the last 10 runs
- Identify patterns: Which scoring criteria are consistently low?
- Look for repeated failures - if the same criterion failed multiple times, that's your target
- Check which changes were tried before (avoid repeating failed approaches)

Edit Selection Strategy (Priority Order):

1. If scores exist: Target the lowest-scoring criteria from scoring.md
2. If all scores are high (90+): Add missing detail to any criteria marked as partial
3. If there are only 1-2 runs: Assume the baseline covered the basics and add missing methodology
4. Prioritize edits that affect multiple scoring criteria at once

The edit should:

- Be specific and actionable (not vague, like "improve clarity")
- Address a weakness identified in scoring (target the lowest-scoring criteria)
- Not be identical to recently tried changes (check scores.md for recent descriptions)
- Include the exact text to add/remove/replace
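Picking the target criterion and screening out repeats can be sketched as below. This is a minimal sketch, assuming each past run is available as a record with per-criterion `points` and a `change` description; that record shape and the helpers `select_target` and `is_repeat` are hypothetical.

```python
def select_target(history: list[dict], rubric: dict[str, int], window: int = 10) -> str:
    """Pick the criterion with the lowest average share of its max points over recent runs."""
    recent = history[-window:]
    if not recent:
        raise ValueError("no runs to analyze")

    def avg_share(name: str) -> float:
        # Fraction of the available points earned, averaged over the window.
        shares = [run["points"].get(name, 0) / rubric[name] for run in recent]
        return sum(shares) / len(shares)

    return min(rubric, key=avg_share)


def is_repeat(description: str, history: list[dict], window: int = 10) -> bool:
    """True if the same change (ignoring case and spacing) was tried in recent runs."""
    norm = " ".join(description.lower().split())
    return any(" ".join(run["change"].lower().split()) == norm
               for run in history[-window:])
```

Normalizing by each criterion's maximum keeps a 20-point criterion comparable with a 50-point one.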

Format:

```markdown
## Proposed Edit

**Rationale:** Why this change might help

**Change:**

[Show exact diff or new text]
```
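One way to produce the "exact diff" is Python's standard `difflib.unified_diff`. This is a minimal sketch; `guidance_diff` and the file labels are hypothetical choices.

```python
import difflib


def guidance_diff(old: str, new: str) -> str:
    """Unified diff between the guidance before and after the proposed edit."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="current-guidance.md (before)",
        tofile="current-guidance.md (after)",
        lineterm="",  # lines are already split, so don't append newlines
    )
    return "\n".join(lines)
```

An empty result means the proposal does not actually change anything.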

Step 3: Apply Edit

Write the edited guidance to current-guidance.md.
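Writing the file atomically avoids leaving a half-written guidance file if a run is interrupted, and returning the old text gives Step 7 something to revert to. This is a minimal sketch, assuming the caller keeps the returned previous text in memory; `apply_edit` is hypothetical, and `os.replace` is atomic only within one filesystem.

```python
import os
import tempfile
from pathlib import Path


def apply_edit(guidance_path: Path, new_text: str) -> str:
    """Atomically replace current-guidance.md and return its previous contents."""
    previous = guidance_path.read_text(encoding="utf-8")
    # Write to a temp file in the same directory, then swap it in with os.replace.
    fd, tmp_path = tempfile.mkstemp(dir=guidance_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(new_text)
        os.replace(tmp_path, guidance_path)
    except Exception:
        os.unlink(tmp_path)  # clean up the temp file on failure
        raise
    return previous
```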

Step 4: Run Test

Use a subagent to run the task with the new guidance:

- Give the subagent current-guidance.md
- Provide test inputs from fixtures/test-cases.json
- Capture the output
- Subagent invocation: Use sessions_spawn with a task containing the full contents of current-guidance.md and the test cases JSON inline in the task prompt, set timeoutSeconds to 120, and ask the subagent to return the raw output (not just pass/fail)
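The invocation above could be assembled as a request payload like this. Only `task` and `timeoutSeconds` come from the text; the helper `build_spawn_request` and the prompt wording are illustrative, and the sketch stops short of calling `sessions_spawn`, whose full interface isn't described here.

```python
import json


def build_spawn_request(guidance: str, cases: list[dict], timeout_seconds: int = 120) -> dict:
    """Assemble task text and timeout for a sessions_spawn call (wording is illustrative)."""
    task = (
        "Follow the guidance below and run it on every test case.\n"
        "Return the raw output for each case, not just pass/fail.\n\n"
        "=== GUIDANCE (current-guidance.md) ===\n"
        f"{guidance}\n\n"
        "=== TEST CASES (fixtures/test-cases.json) ===\n"
        f"{json.dumps({'cases': cases}, indent=2)}\n"
    )
    return {"task": task, "timeoutSeconds": timeout_seconds}
```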

Step 5: Score Result

Evaluate the output against the scoring.md criteria and generate a score from 0 to 100.
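If the evaluator awards points per criterion, the 0-100 score is their capped sum. This is a minimal sketch, assuming per-criterion point awards and a rubric whose maximums total 100; `total_score` is hypothetical.

```python
def total_score(awarded: dict[str, float], rubric: dict[str, int]) -> int:
    """Sum per-criterion points, each clamped to [0, max], into a 0-100 score."""
    if sum(rubric.values()) != 100:
        raise ValueError("rubric maximums must add up to 100")
    total = 0.0
    for name, max_points in rubric.items():
        points = awarded.get(name, 0)  # a criterion the evaluator skipped counts as 0
        total += max(0, min(points, max_points))
    return round(total)
```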

Step 6: Log Decision

Append to scores.md:

CODEBLOCK5

Where N is the run number (one higher than the last logged run).
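Incrementing the run number means reading the highest N already logged. This is a minimal sketch, assuming scores.md is a Markdown table whose data rows begin with the run number; the four columns (run, score, decision, change) and the helpers `next_run_number` and `append_run` are hypothetical and should be adapted to the entry format above.

```python
import re
from pathlib import Path

# Assumption: every logged run is a table row that begins with "| <N> |".
RUN_ROW_RE = re.compile(r"^\|\s*(\d+)\s*\|", re.MULTILINE)


def next_run_number(scores_md: str) -> int:
    """One more than the highest run number already logged (1 for an empty log)."""
    runs = [int(n) for n in RUN_ROW_RE.findall(scores_md)]
    return max(runs, default=0) + 1


def append_run(scores_path: Path, score: int, decision: str, change: str) -> int:
    """Append one hypothetical '| N | score | decision | change |' row and return N."""
    text = scores_path.read_text(encoding="utf-8") if scores_path.exists() else ""
    n = next_run_number(text)
    safe_change = change.replace("|", "\\|")  # escape pipes so the table stays intact
    with open(scores_path, "a", encoding="utf-8") as fh:
        if text and not text.endswith("\n"):
            fh.write("\n")
        fh.write(f"| {n} | {score} | {decision} | {safe_change} |\n")
    return n
```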

Step 7: Update Guidance

- If the score improved: Keep the edit (current-guidance.md is already updated)
- If the score declined: Revert current-guidance.md to the previous version
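The keep-or-revert decision can be sketched as a comparison against the best score so far. This is a minimal sketch, assuming the previous guidance text was kept from Step 3 and that `best_score` is the score of the last kept version; treating a tie as a decline is an assumption, since the text only covers improved and declined scores. `keep_or_revert` is hypothetical.

```python
from pathlib import Path


def keep_or_revert(guidance_path: Path, previous_text: str,
                   new_score: int, best_score: int) -> str:
    """Keep the edit if the score improved; otherwise restore the previous guidance."""
    if new_score > best_score:
        return "keep"  # current-guidance.md already holds the edited text
    # Assumption: a tie is handled like a decline.
    guidance_path.write_text(previous_text, encoding="utf-8")
    return "revert"
```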

Step 8: Check Plateau

If the last 10 scores are all within 5 points of each other:

- Log "Plateau detected - pausing"
- Notify the user
- Stop the cron (or pause and await a user override)
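"Within 5 points of each other" means the spread (max minus min) of the window is at most 5. This is a minimal sketch; `is_plateau` is hypothetical. By default it only reports a plateau once a full window of 10 scores exists, matching "Auto-stops if score plateaus for 10 runs"; lowering `min_runs` follows the "use all available scores" reading in Step 1.

```python
def is_plateau(scores: list[float], window: int = 10,
               tolerance: float = 5, min_runs: int = 10) -> bool:
    """True when the most recent scores all lie within `tolerance` points of each other."""
    recent = scores[-window:]
    if not recent or len(recent) < min_runs:
        return False  # not enough history to call a plateau
    return max(recent) - min(recent) <= tolerance
```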

Files Created in Sandbox

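File	Description
guidance-under-test.md	Original copy (read-only reference)
current-guidance.md	Working copy being optimized (initially identical to guidance-under-test.md)
fixtures/test-cases.json	Test inputs and expected outputs
scoring.md	Scoring criteria (user-approved)
scores.md	Score history table
scripts/	(Optional) Copies of referenced scripts/tools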

Usage

1. Invoke: /autoagent
2. Answer the setup questions
3. Monitor scores.md for progress
4. Copy the improvements from current-guidance.md to your original guidance when satisfied
5. Stop the cron when done

Stopping

- The user can stop the cron at any time
- Auto-stops if the score plateaus for 10 runs
- Check scores.md for progress

Key Principles

- Non-destructive: The original guidance stays in guidance-under-test.md
- Learn from history: Don't repeat failed approaches
- Be specific: Vague changes won't score well
- Human in the loop: The user defines the success criteria and can override plateau detection

