🛡️ OpenClaw Guardian
A battle-hardened watchdog that keeps your OpenClaw gateway running — and tells you when it can't.
What It Does
OpenClaw Guardian runs as a background service and continuously monitors the OpenClaw gateway using two independent health signals. When the gateway goes down, it works through an escalating repair sequence before entering a cooldown and waiting for manual help. Every significant event is logged and sent to your configured alert channel(s).
Health Check Strategy (graduated)
1. CLI check: `openclaw gateway status` (the authoritative signal)
2. HTTP fallback: `curl http://localhost:${OPENCLAW_PORT}/health` (5-second timeout)
3. Both checks must fail before the guardian considers the gateway truly down.
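The two-signal check can be sketched as one shell function. This is a minimal sketch, not the code in `scripts/guardian.sh`. It assumes `openclaw gateway status` exits non-zero when the gateway is down and that `/health` answers with a 2xx status when it is up. The function name `gateway_healthy` is hypothetical.

```bash
# Hypothetical helper: returns 0 if the gateway looks healthy, 1 if both signals fail.
# ASSUMPTION: `openclaw gateway status` exits non-zero when the gateway is down.
gateway_healthy() {
  # Signal 1: the CLI status command (authoritative)
  if openclaw gateway status >/dev/null 2>&1; then
    return 0
  fi

  # Signal 2: HTTP fallback with a 5-second timeout, skipped when the port is unknown.
  # -f turns HTTP 4xx/5xx responses into a non-zero exit status.
  if [ -n "${OPENCLAW_PORT:-}" ] &&
    curl -fsS --max-time 5 -o /dev/null "http://localhost:${OPENCLAW_PORT}/health"; then
    return 0
  fi

  # Both signals failed: treat the gateway as down
  return 1
}
```

A caller would typically run `gateway_healthy || start_repairs`, where `start_repairs` stands for whatever drives the escalation.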
Repair Strategy (escalating)
| Level | Action | Trigger |
|---|---|---|
| 1 (Restart) | `openclaw gateway restart` | First failure |
| 2 (Doctor Fix) | `openclaw doctor --fix` → `openclaw gateway start` | After Level 1 fails |
| 3 (Git Rollback) | Stash → reset to last stable commit → pop stash | After `GUARDIAN_MAX_REPAIR` failures, only if `GUARDIAN_ENABLE_ROLLBACK=true` |
| Cooldown | Sleep `GUARDIAN_COOLDOWN` seconds | After all levels are exhausted |
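The escalation order in the table can be expressed as a pure function of the attempt count. This sketch encodes one reading of the table: attempt 1 restarts, attempts 2 through `GUARDIAN_MAX_REPAIR` run the doctor fix, the next attempt rolls back only when enabled, and anything later cools down. The real script may count attempts differently, and `next_repair_action` is a hypothetical name.

```bash
# Hypothetical helper: next_repair_action <attempt-number>
# Prints one of: restart, doctor, rollback, cooldown.
next_repair_action() {
  local attempt=$1
  local max=${GUARDIAN_MAX_REPAIR:-3}

  if [ "$attempt" -le 1 ]; then
    echo restart            # Level 1
  elif [ "$attempt" -le "$max" ]; then
    echo doctor             # Level 2, repeated up to GUARDIAN_MAX_REPAIR attempts in total
  elif [ "$attempt" -eq $((max + 1)) ] && [ "${GUARDIAN_ENABLE_ROLLBACK:-false}" = true ]; then
    echo rollback           # Level 3, opt-in only
  else
    echo cooldown           # everything exhausted
  fi
}
```

A caller could `case` on the result to run `openclaw gateway restart`, `openclaw doctor --fix && openclaw gateway start`, the rollback, or `sleep "$GUARDIAN_COOLDOWN"`.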
Note: Level 3 rollback is off by default and requires explicit opt-in via GUARDIAN_ENABLE_ROLLBACK=true. Even then, it always stashes uncommitted work before resetting — your changes are never silently discarded.
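The stash → reset → pop sequence for Level 3 might look like the sketch below. It is not the guardian's own implementation. The caller must supply the "last stable commit", since how the guardian picks it is not described here. `safe_rollback` is a hypothetical name, and including untracked files in the stash is a choice made for this sketch. If `git stash pop` conflicts, git keeps the stash entry, so the work stays recoverable from `git stash list`.

```bash
# Hypothetical helper: safe_rollback <workspace> <stable-commit>
safe_rollback() {
  local ws=$1 target=$2 stashed=false rc=0

  # Stash only when there is uncommitted work (tracked or untracked, not ignored)
  if [ -n "$(git -C "$ws" status --porcelain)" ]; then
    git -C "$ws" stash push --include-untracked \
      -m "guardian: pre-rollback $(date +%F)" >/dev/null || return 1
    stashed=true
  fi

  git -C "$ws" reset --hard "$target" >/dev/null
  rc=$?

  # Restore the stashed work regardless of whether the reset succeeded
  if [ "$stashed" = true ] && ! git -C "$ws" stash pop >/dev/null; then
    echo "WARN: stash pop failed; changes are kept in 'git stash list'" >&2
  fi
  return "$rc"
}
```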
Alerting
Guardian supports both Telegram and Discord simultaneously. If neither is configured, it runs in log-only mode.
Alert events:
- Guardian started / stopped
- Gateway down detected
- Each repair attempt (with level)
- Repair success / failure
- Rollback triggered
- All repairs exhausted (cooldown entered)
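Both channels can be driven with `curl`. This sketch uses the Telegram Bot API `sendMessage` method and a Discord webhook POST with a JSON `content` field. `send_alert` is a hypothetical name. It strips an optional leading `bot` from the token in case the configured value already includes the prefix that Telegram URLs use. Its JSON escaping only handles quotes and backslashes, so it assumes single-line messages.

```bash
# Hypothetical helper: send_alert <message>
# Best effort: a failed delivery is reported on stderr but never stops the watchdog.
send_alert() {
  local msg=$1 token escaped

  if [ -n "${GUARDIAN_TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${GUARDIAN_TELEGRAM_CHAT_ID:-}" ]; then
    # Telegram URLs have the form .../bot<token>/<method>; avoid a doubled "bot" prefix
    token=${GUARDIAN_TELEGRAM_BOT_TOKEN#bot}
    curl -fsS --max-time 10 -o /dev/null \
      "https://api.telegram.org/bot${token}/sendMessage" \
      -d "chat_id=${GUARDIAN_TELEGRAM_CHAT_ID}" \
      --data-urlencode "text=${msg}" ||
      echo "WARN: Telegram alert failed" >&2
  fi

  if [ -n "${GUARDIAN_DISCORD_WEBHOOK_URL:-}" ]; then
    # Minimal JSON escaping: backslashes first, then double quotes
    escaped=$(printf '%s' "$msg" | sed 's/\\/\\\\/g; s/"/\\"/g')
    curl -fsS --max-time 10 -o /dev/null \
      -H "Content-Type: application/json" \
      -d "{\"content\":\"${escaped}\"}" \
      "$GUARDIAN_DISCORD_WEBHOOK_URL" ||
      echo "WARN: Discord alert failed" >&2
  fi

  # With neither channel configured this is a no-op (log-only mode)
  return 0
}
```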
Daily Snapshots
Once per calendar day, guardian runs git add -A && git commit in your workspace. It respects .gitignore, so secrets you've excluded stay excluded. Commit message format: guardian: daily snapshot YYYY-MM-DD.
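The once-per-day rule can be sketched with a stamp file that records the date of the last successful snapshot. This is an illustration, not the guardian's code: `daily_snapshot` is a hypothetical name, and treating a day with nothing to commit as "done" is an assumption of this sketch.

```bash
# Hypothetical helper: daily_snapshot <workspace> <stamp-file>
daily_snapshot() {
  local ws=$1 stamp=$2 today
  today=$(date +%F)   # YYYY-MM-DD

  # Already snapshotted today: nothing to do
  if [ "$(cat "$stamp" 2>/dev/null)" = "$today" ]; then
    return 0
  fi

  # Stage everything; paths matched by .gitignore are skipped by git itself
  git -C "$ws" add -A || return 1

  # Commit only when something is staged (git commit exits non-zero otherwise)
  if ! git -C "$ws" diff --cached --quiet; then
    git -C "$ws" commit -q -m "guardian: daily snapshot ${today}" || return 1
  fi

  # Record the date so later checks today skip the snapshot
  echo "$today" > "$stamp"
}
```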
Quick Start
1. Configure environment variables
Create ~/.openclaw/guardian.env (or export in your shell profile):
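```bash
# Required for alerts: set at least one channel
export GUARDIAN_TELEGRAM_BOT_TOKEN=bot123456:ABC...
export GUARDIAN_TELEGRAM_CHAT_ID=-1001234567890
# or
export GUARDIAN_DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/...

# Optional tuning
export GUARDIAN_CHECK_INTERVAL=30
export GUARDIAN_MAX_REPAIR=3
export GUARDIAN_COOLDOWN=600
export GUARDIAN_ENABLE_ROLLBACK=false   # set to true to enable git rollback
export GUARDIAN_WORKSPACE=$HOME/.openclaw/workspace
export GUARDIAN_LOG=/tmp/openclaw-guardian.log
export OPENCLAW_PORT=3578
```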
2. Install as a system service
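```bash
# macOS or Linux (auto-detected)
./scripts/install-guardian.sh

# With a custom log path
GUARDIAN_LOG=/var/log/openclaw-guardian.log ./scripts/install-guardian.sh
```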
3. Verify it's running
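```bash
# macOS
launchctl list | grep openclaw

# Linux
systemctl --user status openclaw-guardian

# Either platform
tail -f /tmp/openclaw-guardian.log
```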
4. Run manually (testing / foreground)
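```bash
# Load the config first
source ~/.openclaw/guardian.env

# Run the guardian in the foreground (Ctrl-C to stop)
./scripts/guardian.sh
```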
5. Uninstall
```bash
./scripts/uninstall-guardian.sh
```
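The install script auto-detects the platform. A check along these lines is one way to do that; it is a sketch, not the contents of `install-guardian.sh`, and `detect_service_manager` and `verify_command` are hypothetical names. It maps the `uname -s` kernel name to a service manager and to the matching status command.

```bash
# Hypothetical helper: detect_service_manager [kernel-name]
# Defaults to `uname -s`; an explicit argument makes the mapping easy to test.
detect_service_manager() {
  case "${1:-$(uname -s)}" in
    Darwin) echo launchd ;;   # macOS
    Linux) echo systemd ;;    # user-level systemd service
    *) echo unsupported; return 1 ;;
  esac
}

# Hypothetical helper: print the status command for the detected platform
verify_command() {
  case "$(detect_service_manager "$@")" in
    launchd) echo "launchctl list | grep openclaw" ;;
    systemd) echo "systemctl --user status openclaw-guardian" ;;
    *) return 1 ;;
  esac
}
```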
Environment Variable Reference
| Variable | Default | Description |
|---|---|---|
| `GUARDIAN_CHECK_INTERVAL` | `30` | Seconds between health checks |
| `GUARDIAN_MAX_REPAIR` | `3` | Max Level 1 + 2 attempts before Level 3 |
| `GUARDIAN_COOLDOWN` | `600` | Cooldown sleep (seconds) after all repairs fail |
| `GUARDIAN_ENABLE_ROLLBACK` | `false` | Enable Level 3 git rollback (off by default) |
| `GUARDIAN_LOG` | `/tmp/openclaw-guardian.log` | Log file path (rotates at 1 MB) |
| `GUARDIAN_WORKSPACE` | `$HOME/.openclaw/workspace` | Path to the OpenClaw workspace git repo |
| `GUARDIAN_TELEGRAM_BOT_TOKEN` | *(unset)* | Telegram Bot API token |
| `GUARDIAN_TELEGRAM_CHAT_ID` | *(unset)* | Telegram chat or channel ID |
| `GUARDIAN_DISCORD_WEBHOOK_URL` | *(unset)* | Discord incoming webhook URL |
| `OPENCLAW_PORT` | *(auto-detected)* | Gateway HTTP port; parsed from `openclaw gateway status` if not set |
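The defaults in the table can be applied with POSIX `${VAR:=default}` expansions after sourcing the env file. This sketch is not the guardian's own loader, and `load_guardian_config` is a hypothetical name. The exact output of `openclaw gateway status` is not documented here, so the port parsing assumes the output contains text like `port 3578`; adjust the pattern to the real output.

```bash
# Hypothetical helper: load_guardian_config [env-file]
load_guardian_config() {
  local env_file=${1:-$HOME/.openclaw/guardian.env}

  # Values from the env file (if present) are set before the defaults are applied
  if [ -f "$env_file" ]; then
    # shellcheck source=/dev/null
    . "$env_file"
  fi

  # Fill in the documented defaults only for variables that are unset or empty
  : "${GUARDIAN_CHECK_INTERVAL:=30}"
  : "${GUARDIAN_MAX_REPAIR:=3}"
  : "${GUARDIAN_COOLDOWN:=600}"
  : "${GUARDIAN_ENABLE_ROLLBACK:=false}"
  : "${GUARDIAN_LOG:=/tmp/openclaw-guardian.log}"
  : "${GUARDIAN_WORKSPACE:=$HOME/.openclaw/workspace}"

  # Auto-detect the port. ASSUMPTION: the status output mentions "port <number>".
  if [ -z "${OPENCLAW_PORT:-}" ]; then
    OPENCLAW_PORT=$(openclaw gateway status 2>/dev/null |
      grep -Eio 'port[ :=]+[0-9]+' | grep -Eo '[0-9]+' | head -n 1)
  fi
}
```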
File Layout
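```
skills/openclaw-guardian/
├── SKILL.md                  ← this file
└── scripts/
    ├── guardian.sh           ← main watchdog (runs continuously)
    ├── install-guardian.sh   ← sets up the launchd / systemd service
    └── uninstall-guardian.sh ← clean uninstall
```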
Runtime files (created automatically, not committed):
| File | Purpose |
|---|---|
| `/tmp/openclaw-guardian.lock` | Single-instance lockfile containing the PID |
| `/tmp/openclaw-guardian-last-snapshot` | Date of the last successful daily snapshot |
| `/tmp/openclaw-guardian.log` | Current log (rotated to `.log.1` at 1 MB) |
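A single-instance lock with stale-lock detection can be built from the shell's `noclobber` option (`set -C`), which makes a `>` redirection fail when the file already exists, plus `kill -0` to test whether the recorded PID is still alive. This is a sketch under those assumptions, not the guardian's exact code; `acquire_lock` is a hypothetical name. Note that `kill -0` also fails for a live process owned by another user, which this sketch would treat as stale.

```bash
# Hypothetical helper: acquire_lock <lockfile>; returns 1 if another live instance holds it
acquire_lock() {
  local lock=$1 old_pid

  # noclobber (set -C) makes '>' fail when the file already exists
  if (set -C; echo "$$" > "$lock") 2>/dev/null; then
    return 0
  fi

  # Lock exists: is the PID recorded in it still running?
  old_pid=$(cat "$lock" 2>/dev/null)
  if [ -n "$old_pid" ] && kill -0 "$old_pid" 2>/dev/null; then
    echo "guardian already running (PID $old_pid)" >&2
    return 1
  fi

  # Stale lock left by a dead process: replace it
  rm -f "$lock"
  (set -C; echo "$$" > "$lock") 2>/dev/null
}
```

Typical use is `acquire_lock /tmp/openclaw-guardian.lock || exit 1` followed by `trap 'rm -f /tmp/openclaw-guardian.lock' EXIT`, so a clean exit removes the lock.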
How It Improves on myclaw-guardian
| Issue in myclaw-guardian | Fix in openclaw-guardian |
|---|---|
| `git reset --hard` without stashing could silently destroy uncommitted work | Always `git stash` before any reset; `git stash pop` to restore regardless of outcome |
| Process detection via `pgrep` is fragile and can match the wrong process | Uses `openclaw gateway status` (the actual CLI) as the primary check, with an HTTP fallback |
| No lockfile, so multiple instances could run simultaneously | `/tmp/openclaw-guardian.lock` with the PID written; stale-lock detection on startup |
| Only Discord alerts | Supports Telegram and Discord simultaneously; log-only if neither is configured |
| Level 3 rollback always enabled (a risky default) | Level 3 off by default (`GUARDIAN_ENABLE_ROLLBACK=false`); explicit opt-in required |
| No graduated health checking | Two independent checks (CLI → HTTP); both must fail before the gateway is declared down |
| No cooldown after exhausting repairs | Configurable cooldown (`GUARDIAN_COOLDOWN`) before monitoring resumes |
Logging
Logs are timestamped and structured:
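```
[2026-03-05 11:30:00] [INFO] OpenClaw Guardian started (PID 12345)
[2026-03-05 11:30:30] [INFO] Gateway healthy
[2026-03-05 11:31:00] [WARN] CLI status check failed, trying HTTP health endpoint
[2026-03-05 11:31:05] [WARN] Gateway health check failed
[2026-03-05 11:31:05] [INFO] Alert: 🔴 Gateway is down, starting repair sequence
[2026-03-05 11:31:05] [INFO] Repair level 1: restarting gateway
[2026-03-05 11:31:35] [INFO] Level 1 repair succeeded
```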
Log rotates automatically when it exceeds 1 MB (one backup: .log.1).
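The timestamp format and the single-backup rotation can be sketched as one logging function. This is an illustration, not the guardian's exact logger: `log` is a hypothetical name, it treats 1 MB as 1,048,576 bytes, and it checks the size before each write.

```bash
# Hypothetical helper: log <LEVEL> <message...>
log() {
  local level=$1 logfile=${GUARDIAN_LOG:-/tmp/openclaw-guardian.log} size
  shift

  # Rotate once the file exceeds 1 MB, keeping a single backup (.log.1)
  if [ -f "$logfile" ]; then
    size=$(wc -c < "$logfile" | tr -d ' ')   # BSD wc pads its output with spaces
    if [ "$size" -gt 1048576 ]; then
      mv -f "$logfile" "${logfile}.1"
    fi
  fi

  # Format: [YYYY-MM-DD HH:MM:SS] [LEVEL] message
  printf '[%s] [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$level" "$*" >> "$logfile"
}
```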
Security Notes
- No secrets in git: daily snapshots use `git add -A`, which respects `.gitignore`. Ensure your `.gitignore` excludes `.env`, `*.key`, etc.
- Level 3 rollback is destructive by nature: only enable it if you understand `git reset` semantics and have tested your `.gitignore` coverage.
- Alert tokens belong in the environment only: never put `GUARDIAN_TELEGRAM_BOT_TOKEN` or webhook URLs in files that get committed.
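Before relying on daily snapshots, you can confirm that typical secret paths are ignored with `git check-ignore`, which exits 0 when a path matches an ignore rule (the path does not need to exist). The sample paths are examples only, and `check_secret_ignores` is a hypothetical name.

```bash
# Hypothetical helper: check_secret_ignores <workspace>
# Returns 1 if any sample secret path would be picked up by `git add -A`.
check_secret_ignores() {
  local ws=$1 path missing=0

  # Example secret paths; extend the list for your workspace
  for path in .env guardian.env secrets.key; do
    if ! git -C "$ws" check-ignore -q "$path"; then
      echo "WARN: $path is not covered by .gitignore in $ws" >&2
      missing=$((missing + 1))
    fi
  done

  [ "$missing" -eq 0 ]
}
```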