Claw Reliability — Agent Observability Skill
You are an AI agent with observability capabilities. Use this skill to monitor, analyze, and report on agent behavior.
When to use this skill
- When the user asks to monitor agent activity, check agent health, or review agent metrics
- When the user asks about tool usage, failure rates, costs, or token consumption
- When the user asks to set up alerts or check for anomalies
- When the user asks for a reliability report or dashboard
Available commands
Start monitoring
Run the monitoring daemon to begin collecting metrics:
cd {baseDir} && python3 scripts/monitor.py start --config {baseDir}/config.yaml
Show metrics summary
Display current metrics for the active session or all sessions:
cd {baseDir} && python3 scripts/monitor.py summary
Show tool report
Display tool invocation success/failure rates:
cd {baseDir} && python3 scripts/monitor.py tools
Show cost report
Display token usage and cost projections:
cd {baseDir} && python3 scripts/monitor.py costs
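To make the idea of a cost projection concrete, here is a minimal sketch of how token counts might be turned into an estimated cost and a naive daily projection. The function names, the per-million-token prices, and the linear extrapolation are all assumptions for illustration; the skill's own cost report may compute these differently.

```python
def estimate_cost(tokens_in, tokens_out, price_in_per_million, price_out_per_million):
    """Estimate spend from token counts.

    Prices are per one million tokens and are placeholders here,
    not real rates for any model.
    """
    total = tokens_in * price_in_per_million + tokens_out * price_out_per_million
    return total / 1_000_000


def project_daily_cost(cost_so_far, hours_elapsed):
    """Linearly extrapolate the spend observed so far to a 24-hour period."""
    if hours_elapsed <= 0:
        raise ValueError("hours_elapsed must be positive")
    return cost_so_far / hours_elapsed * 24


# Hypothetical numbers: 1M input tokens, 500k output tokens over 6 hours.
spent = estimate_cost(1_000_000, 500_000, 3.0, 15.0)
projected = project_daily_cost(spent, 6)
```

A linear projection is the simplest possible choice; it overestimates for bursty sessions and underestimates when usage ramps up.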
Check for anomalies
Run anomaly detection on recent activity:
cd {baseDir} && python3 scripts/monitor.py anomalies
List alerts
Show recent alerts and their severity:
cd {baseDir} && python3 scripts/monitor.py alerts
Configure alert destination
Set up where alerts are sent (Discord, Slack, log file, etc.):
cd {baseDir} && python3 scripts/monitor.py configure-alerts --destination discord --webhook-url
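For orientation, the sketch below shows roughly what a webhook destination receives when an alert is delivered. It is not the skill's implementation: `build_payload` and `send_alert` are hypothetical helpers, and the only external facts relied on are that Discord webhooks accept a JSON body with a `content` field and Slack incoming webhooks accept a JSON body with a `text` field.

```python
import json
import urllib.request


def build_payload(destination, severity, message):
    """Build the JSON body for a webhook alert (hypothetical helper)."""
    text = f"[{severity.upper()}] {message}"
    if destination == "discord":
        # Discord limits message content to 2000 characters.
        return {"content": text[:2000]}
    if destination == "slack":
        return {"text": text}
    raise ValueError(f"unsupported destination: {destination}")


def send_alert(webhook_url, payload, timeout=5.0):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: an explicit user agent, since some endpoints
            # may reject the default urllib one.
            "User-Agent": "claw-reliability-example",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


payload = build_payload("discord", "critical", "read_file failed 3 times in a row")
```

Keeping payload construction separate from delivery makes the formatting easy to check without making any network call.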
Launch dashboard
Start the FastAPI + React dashboard for visual monitoring:
cd {baseDir} && python3 dashboard/backend/main.py
Then open http://localhost:8777 in a browser.
How metrics are collected
This skill reads OpenClaw gateway events and session transcripts to extract:
- Tool invocations: tool name, success/fail, duration, arguments
- LLM calls: model, tokens in/out, latency, estimated cost
- Session lifecycle: start/end times, message counts
- Anomalies: repeated failures, cost spikes, loop detection
All data is stored in a local SQLite database at {baseDir}/data/metrics.db.
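As an illustration of local storage in SQLite, the sketch below records tool invocations and computes per-tool failure rates. The table name, columns, and helper functions are assumptions made for this example; the actual layout of `metrics.db` is not described here and may differ.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tool_invocations (
    id          INTEGER PRIMARY KEY,
    session_id  TEXT NOT NULL,
    tool_name   TEXT NOT NULL,
    success     INTEGER NOT NULL,  -- 1 = success, 0 = failure
    duration_ms REAL
);
"""


def record_invocation(conn, session_id, tool_name, success, duration_ms):
    """Insert one tool invocation row."""
    conn.execute(
        "INSERT INTO tool_invocations (session_id, tool_name, success, duration_ms) "
        "VALUES (?, ?, ?, ?)",
        (session_id, tool_name, int(success), duration_ms),
    )
    conn.commit()


def failure_rates(conn):
    """Return {tool_name: failure_rate} over all recorded invocations."""
    rows = conn.execute(
        "SELECT tool_name, AVG(1.0 - success) FROM tool_invocations GROUP BY tool_name"
    ).fetchall()
    return {name: rate for name, rate in rows}


conn = sqlite3.connect(":memory:")  # in-memory database, for the example only
conn.executescript(SCHEMA)
record_invocation(conn, "s1", "read_file", True, 12.5)
record_invocation(conn, "s1", "read_file", False, 30.0)
record_invocation(conn, "s1", "web_search", True, 250.0)
rates = failure_rates(conn)
```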
Alert thresholds (defaults, configurable)
- Tool failure: 3+ consecutive errors on the same tool
- Cost spike: Token spend exceeds 2x the rolling 1-hour average
- Loop detection: Same tool called 10+ times in a single agent turn
- Unusual activity: A tool is called that has never been used before in this agent's history
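The default thresholds above can be sketched as four small checks. This is a simplified reading under stated assumptions, not the skill's detection code: the function names and input shapes are hypothetical, "consecutive" is interpreted as consecutive calls to the same tool, and the rolling average is taken as the mean of spend samples from the last hour.

```python
from collections import Counter


def consecutive_failures(results, tool, threshold=3):
    """True if the most recent `threshold` calls to `tool` all failed.

    `results` is a chronological list of (tool_name, success) tuples.
    """
    recent = [ok for name, ok in results if name == tool][-threshold:]
    return len(recent) == threshold and not any(recent)


def cost_spike(current_spend, window_spends, factor=2.0):
    """True if current spend exceeds `factor` times the mean of the window."""
    if not window_spends:
        return False
    baseline = sum(window_spends) / len(window_spends)
    return current_spend > factor * baseline


def looping_tools(turn_calls, threshold=10):
    """Return tools called at least `threshold` times within one agent turn."""
    counts = Counter(turn_calls)
    return sorted(name for name, n in counts.items() if n >= threshold)


def first_time_tools(turn_calls, known_tools):
    """Return tools in this turn that never appeared in the agent's history."""
    return sorted(set(turn_calls) - set(known_tools))


history = [("a", True), ("a", False), ("b", True), ("a", False), ("a", False)]
failing = consecutive_failures(history, "a")
```

Each check takes its threshold as a parameter, which mirrors the note that the defaults are configurable.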
Notes
- This skill does NOT send data externally unless you configure an alert destination
- All metrics stay local in SQLite
- The dashboard runs on localhost only by default
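Claw Reliability: Agent Observability Skill
You are an AI agent with observability capabilities. Use this skill to monitor, analyze, and report on agent behavior.
When to use this skill
- When the user asks to monitor agent activity, check agent health, or review agent metrics
- When the user asks about tool usage, failure rates, costs, or token consumption
- When the user asks to set up alerts or check for anomalies
- When the user asks for a reliability report or dashboard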
Available commands
Start monitoring
Run the monitoring daemon to begin collecting metrics:
cd {baseDir} && python3 scripts/monitor.py start --config {baseDir}/config.yaml
Show metrics summary
Display current metrics for the active session or all sessions:
cd {baseDir} && python3 scripts/monitor.py summary
Show tool report
Display tool invocation success/failure rates:
cd {baseDir} && python3 scripts/monitor.py tools
Show cost report
Display token usage and cost projections:
cd {baseDir} && python3 scripts/monitor.py costs
Check for anomalies
Run anomaly detection on recent activity:
cd {baseDir} && python3 scripts/monitor.py anomalies
List alerts
Show recent alerts and their severity:
cd {baseDir} && python3 scripts/monitor.py alerts
Configure alert destination
Set up where alerts are sent (Discord, Slack, log file, etc.):
cd {baseDir} && python3 scripts/monitor.py configure-alerts --destination discord --webhook-url
Launch dashboard
Start the FastAPI + React dashboard for visual monitoring:
cd {baseDir} && python3 dashboard/backend/main.py
Then open http://localhost:8777 in a browser.
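How metrics are collected
This skill reads OpenClaw gateway events and session transcripts to extract:
- Tool invocations: tool name, success/fail, duration, arguments
- LLM calls: model, tokens in/out, latency, estimated cost
- Session lifecycle: start/end times, message counts
- Anomalies: repeated failures, cost spikes, loop detection
All data is stored in a local SQLite database at {baseDir}/data/metrics.db.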
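Alert thresholds (defaults, configurable)
- Tool failure: 3+ consecutive errors on the same tool
- Cost spike: Token spend exceeds 2x the rolling 1-hour average
- Loop detection: Same tool called 10+ times in a single agent turn
- Unusual activity: A tool is called that has never been used before in this agent's history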
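Notes
- This skill does NOT send data externally unless you configure an alert destination
- All metrics stay local in SQLite
- The dashboard runs on localhost only by default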