Agent Debugger

Systematic debugging for AI agent issues. When your agent misbehaves, this skill helps identify and fix the problem.

Common Agent Problems

1. Infinite Loops

Symptoms:

- Agent repeats same action
Gets stuck in a pattern
Never completes task

Diagnosis:
CODEBLOCK0

Fixes:

Add iteration limit:
CODEBLOCK1

Add explicit stop condition:
CODEBLOCK2

2. Tool Failures

Symptoms:

- Tool returns error
Tool times out
Tool not found

Diagnosis:
CODEBLOCK3

Fixes:

Validate parameters first:
CODEBLOCK4

Add retry logic:
CODEBLOCK5

3. Context Overflow

Symptoms:

- "Context length exceeded" error
Agent forgets earlier conversation
Truncated outputs

Diagnosis:
CODEBLOCK6

Fixes:

Use memory efficiently:
CODEBLOCK7

Compress context:
CODEBLOCK8

4. Rate Limiting

Symptoms:

- "Rate limit exceeded" error
Requests blocked
429 status codes

Diagnosis:
CODEBLOCK9

Fixes:

Add backoff:
CODEBLOCK10

Queue requests:
CODEBLOCK11

5. Memory Issues

Symptoms:

- Agent doesn't remember previous context
MEMORY.md not loaded
Memory files not found

Diagnosis:
CODEBLOCK12

Fixes:

Verify memory setup:
CODEBLOCK13

Add memory to instructions:
CODEBLOCK14

6. Permission Errors

Symptoms:

- "Permission denied"
"Access denied"
Tools not working

Diagnosis:
CODEBLOCK15

Fixes:

Check file permissions:
CODEBLOCK16

Review tool policies:
CODEBLOCK17

7. Performance Issues

Symptoms:

- Slow responses
Timeouts
High resource usage

Diagnosis:
CODEBLOCK18

Fixes:

Optimize context:
CODEBLOCK19

Reduce tool calls:
CODEBLOCK20

Debugging Workflow

Step 1: Reproduce

CODEBLOCK21

Step 2: Isolate

CODEBLOCK22

Step 3: Diagnose

CODEBLOCK23

Step 4: Fix

CODEBLOCK24

Step 5: Prevent

CODEBLOCK25

Debugging Tools

Check Agent Status

CODEBLOCK26

Clear Context

CODEBLOCK27

Enable Verbose Mode

CODEBLOCK28

This shows internal reasoning, helping identify where logic fails.

Common Error Messages

Error	Cause	Fix
INLINECODE0	Too much context	Compress, summarize, limit
INLINECODE1

Best Practices

1. Defensive Coding

CODEBLOCK29

2. Progress Tracking

CODEBLOCK30

3. Checkpointing

CODEBLOCK31

4. Logging

CODEBLOCK32

When to Ask for Help

Ask the user when:

- Multiple fix attempts failed
Issue is intermittent
Would require destructive actions
Need information only user has
Configuration changes needed

Prevention Tips

1. Set limits early - max iterations, max tokens, max retries
Validate inputs - check parameters before calling tools
Handle errors gracefully - don't crash, report and adapt
Log important events - helps debugging later
Test edge cases - empty inputs, large files, special characters
Monitor resources - tokens, time, memory usage
Document quirks - save lessons in MEMORY.md

Agent 调试器

针对AI Agent问题的系统性调试。当你的Agent行为异常时，此技能可帮助识别并修复问题。

常见Agent问题

1. 无限循环

症状：

- Agent重复执行相同操作
陷入固定模式
无法完成任务

诊断：

Agent日志显示：

- 同一工具被调用10次以上
重复输出相同格式
迭代间无进展

修复方法：

添加迭代限制：
json
{
maxIterations: 5,
onLimit: ask_user
}

添加显式停止条件：

在指令中添加：
如果同一方法尝试3次仍未成功，请停止并向用户寻求指导。

2. 工具故障

症状：

- 工具返回错误
工具超时
工具未找到

诊断：

检查：

- 工具是否存在于available_tools中
参数是否匹配工具模式
工具是否具有所需权限
是否超过速率限制

修复方法：

先验证参数：
python

调用工具前

required_params = tool.get(required, [])
for param in required_params:
if param not in args:
raise ValueError(f缺少必需参数：{param})

添加重试逻辑：
json
{
retries: 3,
retryDelay: 1000,
retryOn: [rate_limit, timeout, 5xx]
}

3. 上下文溢出

症状：

- 出现上下文长度超出限制错误
Agent忘记早期对话内容
输出被截断

诊断：

检查上下文窗口：

- 当前令牌数 vs 最大令牌数
历史消息数量
已加载文件内容大小

修复方法：

高效使用内存：

- 仅加载相关文件
对大文件使用偏移量/限制
总结长对话
定期清理旧上下文

压缩上下文：
python

替代完整文件加载

content = read(file.txt, offset=1, limit=100)

使用memory_search获取特定信息

results = memory_search(重要决策)

4. 速率限制

症状：

- 出现超出速率限制错误
请求被阻止
返回429状态码

诊断：

检查：

- API速率限制（每分钟/小时请求数）
令牌限制（每分钟令牌数）
并发请求限制
重置时间

修复方法：

添加退避机制：
python
import time
import random

def callwithbackoff(func, max_retries=5):
for attempt in range(max_retries):
try:
return func()
except RateLimitError as e:
wait = (2 attempt) + random.random()
time.sleep(wait)
raise Exception(超出最大重试次数)

排队请求：
python
from queue import Queue
from threading import Thread

request_queue = Queue()

def process_queue():
while True:
task = request_queue.get()
result = execute(task)
requestqueue.taskdone()
time.sleep(0.1) # 速率限制：10请求/秒

5. 内存问题

症状：

- Agent不记得之前的上下文
MEMORY.md未加载
内存文件未找到

诊断：

检查：

- MEMORY.md是否存在
memory/目录是否存在
文件权限是否正确
启动时是否加载了内存

修复方法：

验证内存设置：
bash
ls -la ~/.openclaw/workspace/

应显示：

MEMORY.md

memory/

在指令中添加内存：

在回答任何关于先前工作、决策、日期、人员或待办事项的问题之前：
对MEMORY.md + memory/*.md执行memory_search

6. 权限错误

症状：

- 权限被拒绝
访问被拒绝
工具无法工作

诊断：

检查：

- 用户权限
文件权限
工具策略
沙箱限制

修复方法：

检查文件权限：
bash
ls -la /path/to/file
chmod 600 ~/.openclaw/workspace/sensitive.json

审查工具策略：
json
{
tools: {
exec: {
security: ask, // 或 allowlist 或 full
ask: on-miss // 或 always 或 off
}
}
}

7. 性能问题

症状：

- 响应缓慢
超时
资源使用率高

诊断：

分析Agent：

- 每次工具调用耗时
使用的令牌数
上下文增长情况
识别瓶颈

修复方法：

优化上下文：
python

替代加载整个文件

content = read(large_file.txt, limit=50)

使用定向搜索

results = memory_search(特定主题)

减少工具调用：

不佳：多次调用

file1 = read(file1.txt) file2 = read(file2.txt) file3 = read(file3.txt)

良好：并行或合并调用

files = read([file1.txt, file2.txt, file3.txt])

调试工作流程

第一步：复现

1. 记录触发问题的确切步骤
记录预期行为与实际行为
检查问题是持续出现还是间歇性出现
尝试用最小示例复现

第二步：隔离

1. 禁用其他技能
将上下文减少到最小
简化任务
分别测试每个组件

第三步：诊断

1. 检查日志（如有）
审查工具输出
检查上下文窗口
验证配置

第四步：修复

1. 应用修复
测试修复
记录修复
必要时更新指令

第五步：预防

1. 添加防护措施
更新错误处理
添加日志记录
在内存中记录

调试工具

检查Agent状态

python

如果有会话工具访问权限

status = session_status()
print(f模型：{status[model]})
print(f已用令牌：{status[usage][total_tokens]})
print(f推理：{status[reasoning]})

清除上下文

如果Agent卡住：

1. 启动新会话
仅加载必要内存
重新处理任务

启用详细模式

json
{
thinking: verbose,
reasoning: on
}

这将显示内部推理过程，帮助识别逻辑失败点。

常见错误消息

错误	原因	修复
contextlengthexceeded	上下文过多	压缩、总结、限制
ratelimitexceeded

最佳实践

1. 防御性编程

python

始终先检查再操作

if not os.path.exists(file):
return 文件未找到

try:
result = risky_operation()
except ExpectedError:
handle_error()

2. 进度追踪

在Agent指令中：
追踪你的进度。在每个主要步骤后，记录已完成内容和下一步计划。

3. 检查点

对于长任务：

- 定期保存进度
记录当前状态
允许从检查点恢复

4. 日志记录

python

在关键操作中添加

log(f开始操作：{operation})
log(f参数：{params})
log(f结果：{result})
log(f错误：{error})

何时寻求帮助

在以下情况询问用户：

- 多次修复尝试失败
问题间歇性出现
需要执行破坏性操作
需要只有用户拥有的信息
需要更改配置

预防技巧

1. 尽早设置限制 - 最大迭代次数、最大令牌数、最大重试次数
验证输入 - 调用工具前检查参数
优雅处理错误 - 不要崩溃，报告并适应
记录重要事件 - 有助于后续调试
测试边界

agent-debugger智能代理调试

agent-debugger

Agent Debugger

Common Agent Problems

1. Infinite Loops

2. Tool Failures

3. Context Overflow

4. Rate Limiting

5. Memory Issues

6. Permission Errors

7. Performance Issues

Debugging Workflow

Step 1: Reproduce

Step 2: Isolate

Step 3: Diagnose

Step 4: Fix

Step 5: Prevent

Debugging Tools

Check Agent Status

Clear Context

Enable Verbose Mode

Common Error Messages

Best Practices

1. Defensive Coding

2. Progress Tracking

3. Checkpointing

4. Logging

When to Ask for Help

Prevention Tips

Agent 调试器

常见Agent问题

1. 无限循环

2. 工具故障

调用工具前

3. 上下文溢出

替代完整文件加载

使用memory_search获取特定信息

4. 速率限制

5. 内存问题

应显示：

MEMORY.md

memory/

6. 权限错误

7. 性能问题

替代加载整个文件

使用定向搜索

不佳：多次调用

良好：并行或合并调用

调试工作流程

第一步：复现

第二步：隔离

第三步：诊断

第四步：修复

第五步：预防

调试工具

检查Agent状态

如果有会话工具访问权限

清除上下文

启用详细模式

常见错误消息

最佳实践

1. 防御性编程

始终先检查再操作

2. 进度追踪

3. 检查点

4. 日志记录

在关键操作中添加

何时寻求帮助

预防技巧

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement