Claw Reliability — Agent Observability Skill

You are an AI agent with observability capabilities. Use this skill to monitor, analyze, and report on agent behavior.

When to use this skill

- When the user asks to monitor agent activity, check agent health, or review agent metrics
- When the user asks about tool usage, failure rates, costs, or token consumption
- When the user asks to set up alerts or check for anomalies
- When the user asks for a reliability report or dashboard

Available commands

Start monitoring

Run the monitoring daemon to begin collecting metrics:

cd {baseDir} && python3 scripts/monitor.py start --config {baseDir}/config.yaml

Show metrics summary

Display current metrics for the active session or all sessions:

cd {baseDir} && python3 scripts/monitor.py summary

Show tool report

Display tool invocation success/failure rates:

cd {baseDir} && python3 scripts/monitor.py tools
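A report like this reduces to counting outcomes per tool. The following is a minimal sketch under stated assumptions, not the code in monitor.py: the `ToolCall` record and the `tool_report` function are hypothetical, the tool names are placeholders, and each invocation is assumed to have already been reduced to a tool name plus a success flag.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical record for one tool invocation extracted from a transcript."""
    tool: str
    success: bool


def tool_report(calls):
    """Return {tool: (total_calls, failures, failure_rate)} for every tool seen."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for call in calls:
        totals[call.tool] += 1
        if not call.success:
            failures[call.tool] += 1
    return {
        tool: (totals[tool], failures[tool], failures[tool] / totals[tool])
        for tool in totals
    }


if __name__ == "__main__":
    # Placeholder tool names, used only to exercise the function.
    sample = [
        ToolCall("read_file", True),
        ToolCall("read_file", False),
        ToolCall("http_get", True),
    ]
    for tool, (total, failed, rate) in sorted(tool_report(sample).items()):
        print(f"{tool}: {total} calls, {failed} failed ({rate:.0%})")
```

Keeping the raw totals next to the rate matters in practice: a 100% failure rate over one call is a very different signal from the same rate over fifty.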

Show cost report

Display token usage and cost projections:

cd {baseDir} && python3 scripts/monitor.py costs
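Estimating cost from token counts is plain arithmetic: input and output tokens, each divided by one million and multiplied by a per-million-token price, summed. The sketch below illustrates that arithmetic under loud assumptions: the prices and the model name `example-model` are made up, `LLMCall`, `estimate_cost`, and `project_daily_cost` are hypothetical names, and the projection is a naive linear extrapolation, since the skill does not document how its own projections are computed.

```python
from dataclasses import dataclass

# Hypothetical USD prices per million tokens. Real rates depend on the
# provider and model; replace these before trusting any number.
PRICES_PER_MTOK = {
    "example-model": {"input": 3.00, "output": 15.00},
}


@dataclass
class LLMCall:
    """Hypothetical record for one LLM call: model name and token counts."""
    model: str
    tokens_in: int
    tokens_out: int


def estimate_cost(call: LLMCall) -> float:
    """Estimated USD cost of a single call."""
    price = PRICES_PER_MTOK[call.model]
    return (call.tokens_in * price["input"] + call.tokens_out * price["output"]) / 1_000_000


def project_daily_cost(calls, window_hours: float) -> float:
    """Naively extrapolate the spend observed over `window_hours` to 24 hours."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    spent = sum(estimate_cost(call) for call in calls)
    return spent * 24 / window_hours


if __name__ == "__main__":
    observed = [LLMCall("example-model", tokens_in=10_000, tokens_out=2_000)]
    print(f"${project_daily_cost(observed, window_hours=1):.2f}/day")  # prints $1.44/day
```

Output tokens are priced separately because many providers charge more for them than for input tokens; a linear projection is only a rough guide when usage is bursty.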

Check for anomalies

Run anomaly detection on recent activity:

cd {baseDir} && python3 scripts/monitor.py anomalies

List alerts

Show recent alerts and their severity:

cd {baseDir} && python3 scripts/monitor.py alerts

Configure alert destination

Set up where alerts are sent (Discord, Slack, log file, etc.):

cd {baseDir} && python3 scripts/monitor.py configure-alerts --destination discord --webhook-url
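As a rough illustration of what delivering an alert to a chat webhook involves, the sketch below posts a JSON message using only the standard library. It assumes a Discord or Slack incoming-webhook URL; `build_payload` and `send_alert` are hypothetical names, not functions exported by monitor.py, the User-Agent string is made up, and the log-file destination is not covered.

```python
import json
import urllib.request


def build_payload(destination: str, message: str) -> dict:
    """Build the JSON body for a Discord or Slack incoming webhook."""
    if destination == "discord":
        # Discord webhooks read the message from "content" (max 2000 characters).
        return {"content": message[:2000]}
    if destination == "slack":
        # Slack incoming webhooks read the message from "text".
        return {"text": message}
    raise ValueError(f"unsupported destination: {destination}")


def send_alert(destination: str, webhook_url: str, message: str,
               timeout: float = 10.0) -> int:
    """POST an alert to the webhook URL and return the HTTP status code."""
    body = json.dumps(build_payload(destination, message)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Hypothetical client name; set explicitly instead of relying on
            # urllib's default User-Agent.
            "User-Agent": "claw-reliability-alert-sketch/0.1",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating payload construction from the network call keeps the formatting logic testable without a live webhook; `urlopen` raises `urllib.error.HTTPError` on a non-2xx response, which a real alerter would want to catch and log.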

Launch dashboard

Start the FastAPI + React dashboard for visual monitoring:

cd {baseDir} && python3 dashboard/backend/main.py

Then open http://localhost:8777 in a browser.

How metrics are collected

This skill reads OpenClaw gateway events and session transcripts to extract:

- Tool invocations: tool name, success/fail, duration, arguments
- LLM calls: model, tokens in/out, latency, estimated cost
- Session lifecycle: start/end times, message counts
- Anomalies: repeated failures, cost spikes, loop detection

All data is stored in a local SQLite database at {baseDir}/data/metrics.db.
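A local SQLite store for these records can be sketched with Python's built-in `sqlite3` module. The schema below is hypothetical (the actual layout of metrics.db is not documented here), covers only LLM calls and tool invocations, and uses an in-memory database so the example runs anywhere; `open_db` and `session_summary` are illustrative names.

```python
import sqlite3

# Hypothetical schema; the real layout of metrics.db may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS llm_calls (
    session_id TEXT NOT NULL,
    model      TEXT NOT NULL,
    tokens_in  INTEGER NOT NULL,
    tokens_out INTEGER NOT NULL,
    latency_ms REAL,
    cost_usd   REAL
);
CREATE TABLE IF NOT EXISTS tool_calls (
    session_id  TEXT NOT NULL,
    tool        TEXT NOT NULL,
    success     INTEGER NOT NULL,  -- 1 = success, 0 = failure
    duration_ms REAL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and ensure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def session_summary(conn: sqlite3.Connection, session_id: str) -> dict:
    """Aggregate token, cost, and tool-failure totals for one session."""
    tokens_in, tokens_out, cost = conn.execute(
        "SELECT COALESCE(SUM(tokens_in), 0), COALESCE(SUM(tokens_out), 0),"
        " COALESCE(SUM(cost_usd), 0.0) FROM llm_calls WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    tool_calls, tool_failures = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(1 - success), 0)"
        " FROM tool_calls WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    return {
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        "cost_usd": cost,
        "tool_calls": tool_calls,
        "tool_failures": tool_failures,
    }


if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; a persistent store would pass
    # a file path such as {baseDir}/data/metrics.db instead.
    conn = open_db()
    conn.executemany(
        "INSERT INTO llm_calls VALUES (?, ?, ?, ?, ?, ?)",
        [("s1", "example-model", 1200, 300, 850.0, 0.008)],
    )
    conn.executemany(
        "INSERT INTO tool_calls VALUES (?, ?, ?, ?)",
        [("s1", "read_file", 1, 12.5), ("s1", "read_file", 0, 40.0)],
    )
    print(session_summary(conn, "s1"))
```

`COALESCE` makes a session with no recorded rows report zeros rather than `None`, which keeps downstream report formatting simple.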

Alert thresholds (defaults, configurable)

- Tool failure: 3 or more consecutive errors on the same tool
- Cost spike: token spend exceeds 2x the rolling 1-hour average
- Loop detection: the same tool is called 10 or more times in a single agent turn
- Unusual activity: a tool is called that has never appeared before in this agent's history
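The four default rules can be expressed as small checks. The sketch below is illustrative only: `AnomalyDetector`, `detect_loops`, and `is_cost_spike` are hypothetical names, and it settles points the list leaves open by assumption. "Consecutive" is taken to mean consecutive calls of that same tool, the rolling average is the mean of per-interval spend samples from the preceding hour, a failure alert fires once when the streak first reaches the threshold, and the agent's tool history is seeded from whatever was stored earlier.

```python
from collections import Counter, defaultdict

# Default thresholds from the list above (configurable in the skill).
FAILURE_STREAK = 3      # consecutive errors on one tool
SPIKE_FACTOR = 2.0      # multiple of the rolling average
WINDOW_SECONDS = 3600   # rolling window: one hour
LOOP_LIMIT = 10         # calls to one tool within a single turn


class AnomalyDetector:
    """Hypothetical stateful checks for the failure-streak and unusual-tool rules."""

    def __init__(self, known_tools=()):
        self.known_tools = set(known_tools)  # tools seen in this agent's history
        self.streaks = defaultdict(int)      # current consecutive failures per tool

    def on_tool_call(self, tool: str, success: bool) -> list:
        """Record one tool result and return any alerts it triggers."""
        alerts = []
        if tool not in self.known_tools:
            alerts.append(f"unusual: first recorded call to {tool}")
            self.known_tools.add(tool)
        if success:
            self.streaks[tool] = 0
        else:
            self.streaks[tool] += 1
            # Alert once, when the streak first reaches the threshold.
            if self.streaks[tool] == FAILURE_STREAK:
                alerts.append(f"failure: {tool} failed {FAILURE_STREAK} times in a row")
        return alerts


def detect_loops(turn_tools) -> list:
    """Return the tools called LOOP_LIMIT or more times within one agent turn."""
    counts = Counter(turn_tools)
    return sorted(tool for tool, n in counts.items() if n >= LOOP_LIMIT)


def is_cost_spike(samples, now: float, current_spend: float) -> bool:
    """Flag current_spend if it exceeds SPIKE_FACTOR x the mean of the
    (timestamp_seconds, spend) samples recorded during the past hour."""
    window = [spend for ts, spend in samples if now - WINDOW_SECONDS <= ts < now]
    if not window:
        return False  # no baseline yet, so nothing to compare against
    baseline = sum(window) / len(window)
    return current_spend > SPIKE_FACTOR * baseline
```

Firing the failure alert only when the streak first reaches the threshold avoids flooding the alert destination while a tool stays broken; a success resets the streak so the next run of failures can alert again.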

Notes

- This skill does NOT send data externally unless you configure an alert destination
- All metrics stay local in SQLite
- The dashboard runs on localhost only by default
