Autoagent Skill
Optimize any agent guidance through automated testing and iterative improvement.
Quick Start
CODEBLOCK0
What It Does
- 1. Setup Phase - Asks where your guidance lives and what it should do
- Creates Sandbox - Copies guidance to test folder with fixtures
- Runs Optimization Loop - Every 5 minutes via cron:
- Analyzes current guidance
- Proposes improvement
- Tests with subagent
- Scores result
- Keeps or discards change
- 4. Logs Everything - Check scores.md for history
Setup Phase (Every Invocation Starts Fresh)
Every invocation of /autoagent starts fresh with interactive setup questions.
Step 1: Ask Sandbox Location
Ask the user:
Where should I create the sandbox folder? Default: ../../autoagent-sandbox/ (resolves to /clawd/autoagent-sandbox/)
You can respond with:
- - Empty/default: Press enter to use INLINECODE3
- Just a name: "news" creates
../../autoagent-news/ → INLINECODE5 - Relative path: "agentDev/optimize" creates
../../agentDev/optimize/ → INLINECODE7 - Absolute path:
/some/other/path/optimize/ → exact path
Wait for their response (or empty for default).
Step 2: Discuss Success Criteria
Ask the user:
Let's define how we'll measure success. What does a "good" result look like for this task?
Follow up one at a time based on their response:
- - What specific outputs are expected?
- What format should they be in?
- What's the minimum viable quality?
- Any edge cases to consider?
Once you have enough information, propose a draft scoring.md:
CODEBLOCK1
Wait for user approval or modifications.
Step 3: Ask About External Scripts/Tools
Ask the user:
Does the guidance rely on any scripts, tools, or external software?
- - If yes: Note each script/tool path and what functionality it provides
- The autoagent should analyze these to recommend improvements
Step 4: Ask Cron Schedule
Ask the user:
Run optimization every 5 minutes (default), or different interval?
Step 5: Create Sandbox
After all questions answered, create the sandbox folder at the user-specified path:
CODEBLOCK2
Step 6: Set Up Cron
Use OpenClaw cron syntax to schedule the iteration agent:
- - Default: every 5 minutes (
*/5 * * * *) - Command: invoke the iteration prompt with the sandbox path
Step 7: Confirm Start
Return confirmation message showing the resolved path:
"Optimization started at /clawd/autoagent-news/. I'll check back every 5 minutes. Monitor progress in scores.md."
Iteration Phase (Runs Every Cron Interval)
Each time the cron triggers, do the following:
Step 1: Analyze Current State
Read from the sandbox:
- -
current-guidance.md - The guidance being optimized - INLINECODE13 - History of scores and changes
- INLINECODE14 - How to measure success
- INLINECODE15 - Test inputs (MUST read this to understand what the guidance is being tested against)
Review score history (last 10 runs or all available runs if fewer than 10 exist), identify patterns, note current score. When fewer than 10 runs exist, treat all available scores as the set for plateau detection.
Important: Load the test cases from fixtures/test-cases.json to understand what specific outputs/ behaviors are expected. The edit should address gaps revealed by test case failures or missing criteria.
Step 1b: Analyze External Scripts/Tools (If Applicable)
If the guidance references any scripts, tools, or external software:
- 1. Locate each script/tool - Find the actual script files or binary locations
- Analyze the functionality - Read the code or documentation to understand what it does
- Identify improvement opportunities:
-
For open-source scripts: Can the script be modified to improve functionality?
-
For closed-source/compiled tools: Can wrapper behavior be improved? Can you recommend API/interface changes?
- 4. Note findings in the iteration - If script improvements could help test scores, document them
Example outputs:
- - "Script X does Y but could do Z - recommend modification to add feature W"
- "Tool A is closed-source, recommend changing prompt to work around limitation B"
- "Script C has bug in function D - fix would improve test outcomes"
Step 2: Propose Edit
Generate ONE specific edit to the guidance that might improve the score.
Analyze Score History First:
- - Read scores.md to find the last 10 runs
- Identify patterns: Which scoring criteria are consistently low?
- Look for repeated failures - if the same criterion failed multiple times, that's your target
- Check what changes were tried before (avoid repeating failed approaches)
Edit Selection Strategy (Priority Order):
- 1. If scores exist: Target the lowest-scoring criteria from scoring.md
- If all scores high (90+): Add missing detail to any criteria marked as partial
- If only 1-2 runs: Assume baseline covered basics, add missing methodology
- Prioritize edits that affect multiple scoring criteria at once
The edit should:
- - Be specific and actionable (not vague like "improve clarity")
- Address a weakness identified in scoring (target the lowest-scoring criteria)
- Not be identical to recently tried changes (check scores.md for recent descriptions)
- Include the exact text to add/remove/replace
Format:
## Proposed Edit
**Rationale:** Why this change might help
**Change:**
[Show exact diff or new text]
CODEBLOCK4
Step 3: Apply Edit
Write the edited guidance to INLINECODE17
Step 4: Run Test
Use a subagent to run the task with the new guidance:
- - Give the subagent INLINECODE18
- Provide test inputs from INLINECODE19
- Capture the output
- Subagent invocation: Use
sessions_spawn with task containing the full contents of current-guidance.md, include the test cases JSON inline in the task prompt, set timeoutSeconds to 120, and request the subagent to return the raw output (not just pass/fail)
Step 5: Score Result
Evaluate the output against scoring.md criteria.
Generate a score 0-100.
Step 6: Log Decision
Append to scores.md:
CODEBLOCK5
Where N is the run number (increment from last).
Step 7: Update Guidance
- - If score improved: Keep the edit (
current-guidance.md is already updated) - If score declined: Revert
current-guidance.md to previous version
Step 8: Check Plateau
If last 10 scores are within 5 points of each other:
- - Log "Plateau detected - pausing"
- Notify user
- Stop the cron (or pause and await user override)
Files Created in Sandbox
| File | Description |
|---|
| INLINECODE27 | Original copy (read-only reference) |
| INLINECODE28 |
Working version (edited each iteration) |
|
fixtures/test-cases.json | Input → expected output pairs |
|
scoring.md | Scoring methodology |
|
scores.md | Score history log |
|
scripts/ | (optional) Copies of referenced scripts/tools for analysis |
Usage
- 1. Invoke: INLINECODE33
- Answer setup questions
- Monitor
scores.md for progress - Copy improvements to original when satisfied
- Stop cron when done
Stopping
- - User can stop cron anytime
- Auto-stops if score plateaus for 10 runs
- Check
scores.md for progress
Key Principles
- - Non-destructive: Original guidance stays in INLINECODE36
- Learn from history: Don't repeat failed approaches
- Be specific: Vague changes won't score well
- Human in the loop: User defines success criteria, can override plateau detection
Autoagent 技能
通过自动化测试和迭代改进来优化任何智能体指导。
快速开始
/autoagent
功能说明
- 1. 设置阶段 - 询问指导文件位置及其功能目标
- 创建沙盒 - 将指导文件复制到包含测试夹具的测试文件夹
- 运行优化循环 - 通过cron每5分钟执行一次:
- 分析当前指导
- 提出改进方案
- 使用子智能体进行测试
- 对结果进行评分
- 保留或丢弃更改
- 4. 记录所有内容 - 查看scores.md获取历史记录
设置阶段(每次调用重新开始)
每次调用/autoagent都会通过交互式设置问题重新开始。
步骤1:询问沙盒位置
询问用户:
我应该在哪里创建沙盒文件夹?默认:../../autoagent-sandbox/(解析为/clawd/autoagent-sandbox/)
您可以回复:
- - 空/默认:按回车使用../../autoagent-sandbox/
- 仅名称:news创建../../autoagent-news/ → /clawd/autoagent-news/
- 相对路径:agentDev/optimize创建../../agentDev/optimize/ → /clawd/agentDev/optimize/
- 绝对路径:/some/other/path/optimize/ → 精确路径
等待用户响应(或空值使用默认)。
步骤2:讨论成功标准
询问用户:
让我们定义如何衡量成功。对于这个任务,好的结果是什么样的?
根据用户的回复逐一跟进:
- - 期望的具体输出是什么?
- 应该采用什么格式?
- 最低可接受质量是什么?
- 需要考虑哪些边界情况?
一旦获得足够信息,提出评分草案scoring.md:
markdown
建议评分标准
评分组成部分:
- - [组成部分1]:[X]分 - [描述]
- [组成部分2]:[Y]分 - [描述]
- ...
总分: 100分
[任何附加说明]
等待用户批准或修改。
步骤3:询问外部脚本/工具
询问用户:
指导是否依赖于任何脚本、工具或外部软件?
- - 如果是:记录每个脚本/工具路径及其提供的功能
- autoagent应分析这些内容以推荐改进
步骤4:询问Cron计划
询问用户:
每5分钟运行一次优化(默认),还是不同的间隔?
步骤5:创建沙盒
所有问题回答完毕后,在用户指定的路径创建沙盒文件夹:
sandbox/
├── guidance-under-test.md # 原始指导副本
├── current-guidance.md # 初始与guidance-under-test相同
├── fixtures/
│ └── test-cases.json # {cases: [{input: ..., expected: ...}]}
├── scoring.md # 评分标准文档(用户批准)
├── scores.md # 评分历史表
└── scripts/ # (可选)引用的脚本/工具副本
步骤6:设置Cron
使用OpenClaw cron语法安排迭代智能体:
- - 默认:每5分钟(/5 *)
- 命令:使用沙盒路径调用迭代提示
步骤7:确认启动
返回确认消息,显示解析后的路径:
优化已在/clawd/autoagent-news/启动。我将每5分钟检查一次。在scores.md中监控进度。
迭代阶段(每个Cron间隔执行)
每次cron触发时,执行以下操作:
步骤1:分析当前状态
从沙盒中读取:
- - current-guidance.md - 正在优化的指导
- scores.md - 评分和更改历史
- scoring.md - 如何衡量成功
- fixtures/test-cases.json - 测试输入(必须读取以了解指导正在针对什么进行测试)
审查评分历史(最近10次运行,如果少于10次则全部可用),识别模式,记录当前评分。当运行次数少于10次时,将所有可用评分视为平台期检测的集合。
重要: 从fixtures/test-cases.json加载测试用例,以了解期望的具体输出/行为。编辑应解决测试用例失败或缺失标准所揭示的差距。
步骤1b:分析外部脚本/工具(如适用)
如果指导引用了任何脚本、工具或外部软件:
- 1. 定位每个脚本/工具 - 找到实际的脚本文件或二进制位置
- 分析功能 - 阅读代码或文档以了解其功能
- 识别改进机会:
-
对于开源脚本: 是否可以修改脚本以改进功能?
-
对于闭源/编译工具: 是否可以改进包装器行为?是否可以推荐API/接口更改?
- 4. 在迭代中记录发现 - 如果脚本改进有助于测试评分,请记录
示例输出:
- - 脚本X执行Y但可以执行Z - 建议修改以添加功能W
- 工具A是闭源的,建议更改提示以绕过限制B
- 脚本C在函数D中存在错误 - 修复将改善测试结果
步骤2:提出编辑
生成一个针对指导的具体编辑,可能提高评分。
首先分析评分历史:
- - 读取scores.md找到最近10次运行
- 识别模式:哪些评分标准持续偏低?
- 查找重复失败 - 如果同一标准多次失败,那就是您的目标
- 检查之前尝试过的更改(避免重复失败的方法)
编辑选择策略(优先级顺序):
- 1. 如果存在评分:针对scoring.md中评分最低的标准
- 如果所有评分都很高(90+):向任何标记为部分的标准添加缺失的细节
- 如果只有1-2次运行:假设基线已覆盖基础,添加缺失的方法论
- 优先考虑同时影响多个评分标准的编辑
编辑应:
- - 具体且可操作(不是模糊的提高清晰度)
- 针对评分中识别的弱点(针对评分最低的标准)
- 不与最近尝试的更改相同(检查scores.md中的最近描述)
- 包含要添加/删除/替换的确切文本
格式:
markdown
建议编辑
理由: 为什么此更改可能有帮助
更改:
[显示确切的差异或新文本]
步骤3:应用编辑
将编辑后的指导写入current-guidance.md
步骤4:运行测试
使用子智能体运行带有新指导的任务:
- - 给子智能体current-guidance.md
- 提供来自fixtures/test-cases.json的测试输入
- 捕获输出
- 子智能体调用: 使用sessions_spawn,task包含current-guidance.md的完整内容,在任务提示中内联包含测试用例JSON,设置timeoutSeconds为120,要求子智能体返回原始输出(不仅仅是通过/失败)
步骤5:评分结果
根据scoring.md标准评估输出。
生成0-100的评分。
步骤6:记录决策
追加到scores.md:
markdown
| N | 更改描述 | 评分 | 保留/丢弃 |
其中N是运行编号(从上一次递增)。
步骤7:更新指导
- - 如果评分提高:保留编辑(current-guidance.md已更新)
- 如果评分下降:将current-guidance.md恢复为上一版本
步骤8:检查平台期
如果最近10个评分彼此相差在5分以内:
- - 记录检测到平台期 - 暂停
- 通知用户
- 停止cron(或暂停并等待用户覆盖)
沙盒中创建的文件
| 文件 | 描述 |
|---|
| guidance-under-test.md | 原始副本(只读参考) |
| current-guidance.md |
工作版本(每次迭代编辑) |
| fixtures/test-cases.json | 输入→期望输出对 |
| scoring.md | 评分方法 |
| scores.md | 评分历史日志 |
| scripts/ | (可选)引用的脚本/工具副本用于分析 |
使用方法
- 1. 调用:/autoagent
- 回答设置问题
- 监控scores.md了解进度
- 满意后将改进复制到原始文件
- 完成后停止cron
停止
- - 用户可以随时停止cron
- 如果评分在10次运行中达到平台期则自动停止
- 查看scores.md了解进度
关键原则
- - 非破坏性:原始指导保留在guidance-under-test.md中
- 从历史中学习:不重复失败的方法
- 具体明确:模糊的更改不会获得高分
- 人在循环中: