OpenClaw Doctor
Comprehensive health check for OpenClaw installations. Outputs a structured diagnostic report with severity levels and actionable fixes.
Language
Respond in the same language the user used to invoke this skill. If invoked via slash command with no additional text, infer the preferred language from context: check recent conversation history, workspace file content (e.g., CJK content in AGENTS.md or cron job payloads), and system locale. Fall back to English only if no language signal is found.
Prerequisites
CODEBLOCK0
Paths
Auto-detect all paths at runtime. Do NOT hardcode platform-specific locations.
CODEBLOCK1
If any path doesn't exist, note it and skip that check section.
Diagnostic Sections
Run ALL sections below sequentially. For each finding, assign a severity:
- CRITICAL: broken functionality, data loss risk
- WARNING: suboptimal config, potential issues
- INFO: informational, optimization opportunity
1. Installation & Version
Use the built-in status command as the primary data source:
CODEBLOCK2
Report: version, gateway running status, LaunchAgent status, channel health.
2. Config Consistency
Read $OPENCLAW_CONFIG and check:
1. Default model validity: Is agents.defaults.model.primary a known model? Cross-check with agents.defaults.models entries.
2. Fallback models: Are all models in agents.defaults.model.fallbacks defined in the models list?
3. Legacy config files: Check if clawdbot.json or other legacy files exist in $OPENCLAW_HOME/.
4. Backup file accumulation: Count *.bak* files in $OPENCLAW_HOME/. More than 2 is WARNING.
5. Channel config:
   - Telegram: Check the requireMention setting per group. false = WARNING (bot responds to all messages).
   - Feishu: Check groupPolicy. "open" = WARNING (any group can interact).
3. Session Maintenance Config
Check openclaw.json for session.maintenance settings:
1. Maintenance mode: Missing or "warn" = WARNING (stale sessions accumulate without cleanup). Should be "enforce".
2. pruneAfter: Missing or > 30d = INFO. Recommended: "7d" to "14d".
3. maxEntries: Missing or > 200 = INFO. Default is 500; a reasonable value for personal use is 50-100.
4. maxDiskBytes: Missing = INFO. Recommended: set a cap like "100mb".
4. Compaction Config
Check agents.defaults.compaction in openclaw.json:
1. mode: Should be "safeguard" (default, safe). Note if missing.
2. reserveTokensFloor: Missing = WARNING. Without this buffer, context can overflow before compaction triggers. Recommended: 20000.
3. keepRecentTokens: Missing = INFO. Controls how much recent conversation is preserved verbatim during compaction. Recommended: 8000.
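The maintenance and compaction checks from Sections 3 and 4 could be scripted roughly as follows. This is a minimal sketch rather than OpenClaw's own tooling: the function name check_session_limits is hypothetical, and it assumes the maintenance mode is stored at session.maintenance.mode, which the text above does not spell out. The pruneAfter check is omitted because it needs duration parsing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print findings for maintenance and compaction settings.
# Assumption: the maintenance mode lives at session.maintenance.mode.
check_session_limits() {
  jq -r '
    (.session.maintenance // {}) as $m
    | (.agents.defaults.compaction // {}) as $c
    | (if $m.mode == null or $m.mode == "warn"
       then "WARNING: maintenance mode is \($m.mode // "missing"), should be \"enforce\""
       else empty end),
      (if $m.maxEntries == null or $m.maxEntries > 200
       then "INFO: maxEntries is \($m.maxEntries // "missing"), consider 50-100"
       else empty end),
      (if $c.reserveTokensFloor == null
       then "WARNING: reserveTokensFloor missing, recommended 20000"
       else empty end),
      (if $c.keepRecentTokens == null
       then "INFO: keepRecentTokens missing, recommended 8000"
       else empty end)
  ' "$1"
}
```

Calling check_session_limits "$OPENCLAW_CONFIG" prints one line per finding and nothing when all four settings are within the recommended ranges.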
5. Model Alignment
Use the built-in sessions list, then cross-reference with config:
CODEBLOCK3
Also read sessions.json programmatically to check:
1. Session model drift: List any sessions whose model field differs from the configured default. Check channel sessions in particular (telegram:, feishu:).
2. contextTokens vs model contextWindow: Compare each session's contextTokens against its model's actual contextWindow (from models.json or the built-in registry). Mismatch = WARNING (e.g., 272k contextTokens on a 200k model can cause overflow).
3. Forward-compat patches: Check whether dist files have been locally patched by searching $OPENCLAW_DIST/*.js for non-standard constants (e.g., model IDs not in the official XHIGH_MODEL_REFS, or custom resolveForwardCompatModel additions).
4. Thinking config: Read the thinking config file (find it via grep -rl "XHIGH_MODEL_REFS" $OPENCLAW_DIST/) and verify that the current default model is included in XHIGH_MODEL_REFS if it should support xhigh thinking.
5. models.json override: Read $MODELS_JSON and check whether inline model definitions are consistent with openclaw.json.
6. Session Health
Use the built-in cleanup dry-run as primary data source:
CODEBLOCK4
Then supplement with filesystem checks:
1. Orphan JSONL files: Files in the sessions directory that are not referenced in sessions.json. Calculate total size.
2. Zombie session entries: Entries in sessions.json pointing to non-existent JSONL files.
3. Empty JSONL files: Referenced files that are 0 bytes.
4. Deleted file accumulation: *.deleted.* files that can be cleaned up. Calculate total size.
5. Cron session accumulation: Count sessions with :cron: in their key. Separate parent jobs from :run: sub-sessions. Large numbers (>20) indicate cleanup isn't working.
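Three of the filesystem checks above can be sketched without knowing the exact sessions.json schema. The function names are hypothetical, and the orphan check rests on a loose assumption: a transcript counts as referenced if its file stem appears anywhere in the text of sessions.json. A schema-aware check would be more precise.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print *.jsonl files whose stem never appears in sessions.json.
# Assumption: sessions.json mentions each transcript by file name or by an ID equal to the stem.
orphan_jsonl() {
  local dir="$1" f
  for f in "$dir"/*.jsonl; do
    [ -e "$f" ] || continue
    grep -qF -- "$(basename "$f" .jsonl)" "$dir/sessions.json" 2>/dev/null || echo "$f"
  done
}

# Hypothetical helper: print *.jsonl files that are 0 bytes.
empty_jsonl() {
  find "$1" -maxdepth 1 -type f -name '*.jsonl' -size 0
}

# Hypothetical helper: total size in bytes of *.deleted.* files.
deleted_bytes() {
  find "$1" -maxdepth 1 -type f -name '*.deleted.*' -exec cat {} + | wc -c | tr -d ' '
}
```

All three take the sessions directory as their only argument and only read from it, so they are safe to run before the user has approved any fix.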
7. Cron Health
Read $CRON_DIR/jobs.json and check:
1. Duplicate enabled jobs: Jobs with identical name + schedule + enabled: true. Flag as WARNING with a dedup suggestion.
2. Disabled job accumulation: Count enabled: false jobs. More than 10 = INFO (suggest cleanup if the user confirms they're not needed).
3. Tmp file accumulation: Count jobs.json.*.tmp files in $CRON_DIR. These are abandoned atomic-write artifacts. Any that no process holds open (check with lsof) are safe to delete.
4. Cron runs directory: Check $CRON_DIR/runs/ for accumulated run logs. Count and total size.
5. Stale enabled jobs: Enabled jobs whose state.lastRunAtMs is older than expected based on their schedule (e.g., a daily job that hasn't run in 3+ days).
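The duplicate and tmp-file checks might look like the sketch below. The layout of jobs.json is an assumption: the filter accepts either a top-level array of jobs or an object with a jobs array. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report groups of enabled jobs sharing name and schedule.
# Assumption: jobs.json is either an array of jobs or an object with a "jobs" array.
duplicate_jobs() {
  jq -r '
    (if type == "array" then . else (.jobs // []) end)
    | map(select(.enabled == true))
    | group_by([.name, .schedule])
    | .[]
    | select(length > 1)
    | "WARNING: \(length) enabled jobs share name \(.[0].name | tojson) and schedule \(.[0].schedule | tojson)"
  ' "$1"
}

# Hypothetical helper: count abandoned jobs.json.*.tmp files in the cron directory.
count_cron_tmp() {
  find "$1" -maxdepth 1 -type f -name 'jobs.json.*.tmp' | wc -l | tr -d ' '
}
```

Grouping on the pair [.name, .schedule] means two jobs with the same name but different schedules are not reported, which matches the rule as stated above.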
8. Security Audit
Check openclaw.json for:
1. Feishu groupPolicy: "open" means any Feishu group can interact = CRITICAL.
2. Feishu/Telegram allowFrom: ["*"] means no restriction = WARNING.
3. Telegram requireMention: false on groups = WARNING (bot responds to every message).
4. Gateway auth mode: Read gateway.auth.mode from config. "token" is good, "none" = CRITICAL.
5. Exposed secrets in non-gitignored files: Check if $OPENCLAW_HOME/ contains any files that might be accidentally synced (e.g., check for a .git directory in $OPENCLAW_HOME/).
6. API keys in models.json: Note if API keys are stored in plaintext in models.json (this is expected but worth noting).
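Two of the audit items above lend themselves to a short script. This is a sketch only: the function names are hypothetical, the handling of auth modes other than "token" and "none" is an assumption, and the WARNING level for a .git directory is also an assumption, since the list above does not assign one.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify gateway.auth.mode from the config file.
check_gateway_auth() {
  local mode
  mode=$(jq -r '.gateway.auth.mode // "missing"' "$1")
  case "$mode" in
    token) echo "OK: gateway auth mode is token" ;;
    none)  echo "CRITICAL: gateway auth mode is none" ;;
    # Assumption: any other value is surfaced for manual review.
    *)     echo "WARNING: gateway auth mode is $mode; review manually" ;;
  esac
}

# Hypothetical helper: flag a .git directory inside the OpenClaw home.
# Assumption: WARNING severity; the source list does not specify one.
check_git_sync() {
  if [ -d "$1/.git" ]; then
    echo "WARNING: $1 contains a .git directory; secrets may be synced"
  fi
}
```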
9. Resource Usage
CODEBLOCK5
Flag:
- Browser cache > 200MB = WARNING
- Logs > 50MB = WARNING
- Any single JSONL > 10MB = INFO
- Total $OPENCLAW_HOME/ > 1GB = WARNING
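The thresholds above can be applied with a small helper once sizes are known. This is an illustrative sketch: dir_kb and flag_if_over are hypothetical names, and sizes are measured in kilobytes with du -sk so the comparison stays in integer arithmetic.

```bash
#!/usr/bin/env bash
# Hypothetical helper: size of a directory in kilobytes (empty output if it is missing).
dir_kb() {
  du -sk "$1" 2>/dev/null | awk '{print $1}'
}

# Hypothetical helper: print a finding when a size exceeds its limit.
# Usage: flag_if_over <label> <size-kb> <limit-mb> <severity>
flag_if_over() {
  local label="$1" kb="$2" limit_mb="$3" severity="$4"
  if [ "$kb" -gt $((limit_mb * 1024)) ]; then
    echo "$severity: $label is $((kb / 1024))MB (limit ${limit_mb}MB)"
  fi
}
```

For example, flag_if_over "Browser cache" "$(dir_kb "$OPENCLAW_HOME/browser")" 200 WARNING prints a line only when the cache is above 200MB, assuming the directory exists.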
10. Gateway & Process Health
Read gateway.port from openclaw.json to determine the correct port (do NOT hardcode).
CODEBLOCK6
Flag:
- Multiple gateway processes = CRITICAL
- Gateway not listening on configured port = CRITICAL
- Recent errors in gateway.err.log = WARNING (show last 5 errors)
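Pulling the last five errors for the report could be as simple as the sketch below. The function name is hypothetical, and matching on the word "error" in any letter case is an assumption about the log format, which is not described here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show the last 5 log lines that mention "error".
# Assumption: error lines contain the word "error" in any letter case.
recent_errors() {
  [ -f "$1" ] || return 0
  grep -i 'error' "$1" | tail -n 5
}
```

Remember to pass the output through the redaction rules before including it in the report, since log lines can contain tokens.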
11. System Instruction Health
Measures static system instruction token footprint. Complements Section 5 (runtime session pressure) — together they form the Context Budget picture.
Step A: Data Collection (deterministic script)
Locate and run the bundled collector script from the skill directory:
CODEBLOCK7
The script auto-detects workspace directories and outputs structured JSON. See scripts/sysinstruction-check.sh for details.
Output schema:
CODEBLOCK8
Step B: LLM Analysis
Analyze the JSON output along these dimensions:
1. Context budget ratio (complements Section 5's runtime check):
   - pct_of_context < 2% = INFO, healthy
   - 2-5% = INFO, acceptable but worth reviewing
   - 5-10% = WARNING, consider trimming
   - 10-15% = WARNING (strong), actively optimize
   - > 15% = CRITICAL only if paired with runtime truncation evidence
   - If context_window_source is "default_fallback", note that thresholds may be inaccurate
2. Tool description bloat:
   - tool_bloat_files non-empty = WARNING, list files and estimated token cost
   - These are auto-injected by OpenClaw (e.g., Feishu bitable tools); check if all are necessary
3. Reclaimable space (read empty_template_files and bootstrap_still_present):
   - bootstrap_still_present = true = INFO, can be deleted after initial setup
   - empty_template_files non-empty = INFO, list files and estimated token savings
4. Per-file analysis:
   - Single file > 40% of total tokens = WARNING, consider splitting
   - Single file > 2% of context window = WARNING, review individually
   - AGENTS.md is typically the largest (memory/group-chat/heartbeat rules), but > 5000 tokens warrants review
5. Benchmarks (dynamic, based on actual context window):
CODEBLOCK9
6. Cross-reference integrity:
   - Scan the largest workspace file (typically AGENTS.md) for references to other .md filenames (e.g., HEARTBEAT.md, BOOTSTRAP.md, MEMORY.md).
   - For each referenced file: check if it exists in any workspace-*/ directory and whether it is an empty template.
   - Referenced but missing = WARNING: the agent has instructions that depend on a file that doesn't exist. Offer to generate a practical version based on the user's actual config (cron jobs, channels, heartbeat rules in AGENTS.md).
   - Referenced but empty template = WARNING: the file exists but has no real content, so the agent's instructions referencing it are ineffective. Offer to populate it with useful content or remove the dead reference.
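The context budget bands from item 1 can be expressed as a small classifier. This is a sketch with one assumption made explicit: the ranges above share their endpoints (2, 5, 10, 15), so the code treats each band as including its lower bound and excluding its upper bound, except that exactly 15 stays in the 10-15 band. The function name classify_budget is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map pct_of_context (may be fractional) to a severity label.
# Assumption: bands are half-open, [2,5), [5,10), [10,15], then > 15.
classify_budget() {
  awk -v p="$1" 'BEGIN {
    p = p + 0  # force numeric comparison
    if (p < 2)        print "INFO: healthy"
    else if (p < 5)   print "INFO: acceptable but worth reviewing"
    else if (p < 10)  print "WARNING: consider trimming"
    else if (p <= 15) print "WARNING (strong): actively optimize"
    else              print "CRITICAL (only if paired with runtime truncation evidence)"
  }'
}
```

The last band deliberately repeats the condition from the list, because the script alone cannot tell whether truncation evidence exists; that judgment stays with the analysis step.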
Output (integrated into main report)
Add a System Instruction section to Findings:
CODEBLOCK10
Output Format
Present results as a structured diagnostic report:
CODEBLOCK11
Interactive Mode
After presenting the report, ask the user (in their language):
"Would you like me to fix these issues? I can address them one by one, or batch-fix all WARNING-level and below. CRITICAL issues will be confirmed individually."
For fixes:
- Orphan/deleted files: Offer to delete with size summary.
- Model drift / contextTokens mismatch: Offer to align all sessions to default model and correct contextWindow.
- Config issues: Show the specific config change needed, confirm before applying.
- Cron dedup / tmp cleanup: Show what will be removed, confirm before applying.
- Maintenance config: Suggest optimal values, confirm before applying.
- Resource cleanup: Offer to clear browser cache, rotate logs.
- Security issues: Show the config change and explain the tradeoff before applying.
- System instruction bloat: Offer to archive BOOTSTRAP.md (rename to .bak), clean empty template files, flag tool description injection.
- Referenced but missing/empty workspace files: Generate a practical version based on the user's actual config. For HEARTBEAT.md specifically, analyze cron jobs, channels, and heartbeat rules in AGENTS.md to produce a useful checklist (not an empty template).
Secret Redaction (MANDATORY)
When displaying config excerpts in the report, always redact sensitive values:
- API keys / tokens: show only first 8 chars + ... (e.g., 8263689670:A...)
- Passwords / secrets: show INLINECODE100
- Never include full botToken, appSecret, auth token, or API key values in the report output.
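The first-8-characters rule for keys and tokens could be implemented as below. This is a minimal sketch: the function name redact is hypothetical, and masking values of 8 characters or fewer entirely is an added assumption so that short secrets are never shown in full.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep the first 8 characters of a secret and elide the rest.
# Assumption: values of 8 characters or fewer are masked completely.
redact() {
  if [ "${#1}" -le 8 ]; then
    printf '***\n'
  else
    printf '%.8s...\n' "$1"
  fi
}
```

For example, redact 'sk-abcdefghijklmnop' prints sk-abcde... and redact 'short' prints ***.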
Security & Privacy
This skill operates locally and makes no network requests. However, diagnostic output becomes part of the LLM conversation context.
Files read (read-only unless user approves a fix):
- $OPENCLAW_HOME/openclaw.json: main config
- INLINECODE102: model definitions
- INLINECODE103: session index
- INLINECODE104: system instruction files
- INLINECODE105: cron job definitions
- INLINECODE106: recent gateway errors
- INLINECODE107: installed dist files (for patch detection)
Files modified (only with explicit user confirmation):
- Session JSONL files (orphan cleanup)
- Config JSON files (optimization)
- Workspace .md files (BOOTSTRAP.md archival)
- Cron tmp files (cleanup)
Environment variables read: OPENCLAW_HOME (optional override).
No secrets are transmitted or logged. The skill may display config excerpts in the diagnostic report shown to the user.
OpenClaw Doctor
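Comprehensive health check for OpenClaw installations. Outputs a structured diagnostic report with severity levels and actionable fixes.
Language
Respond in the same language the user used to invoke this skill. If invoked via slash command with no additional text, infer the preferred language from context: check recent conversation history, workspace file content (e.g., CJK content in AGENTS.md or cron job payloads), and system locale. Fall back to English only if no language signal is found.
Prerequisites
```bash
# Both tools are required; report CRITICAL if either is missing from PATH.
command -v openclaw >/dev/null || echo "CRITICAL: openclaw not found in PATH"
command -v jq >/dev/null || echo "CRITICAL: jq not found - install with: brew install jq (macOS) or apt install jq (Linux)"
```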
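Paths
Auto-detect all paths at runtime. Do NOT hardcode platform-specific locations.
```bash
# Base directory; set OPENCLAW_HOME in the environment to override it.
OPENCLAW_HOME="${OPENCLAW_HOME:-$HOME/.openclaw}"
OPENCLAW_CONFIG="$OPENCLAW_HOME/openclaw.json"
# Resolve the installed dist directory from the openclaw binary, if present.
OPENCLAW_DIST=""
if command -v openclaw &>/dev/null; then
  OPENCLAW_DIST="$(dirname "$(readlink -f "$(command -v openclaw)")")/../lib/node_modules/openclaw/dist"
  [ -d "$OPENCLAW_DIST" ] || OPENCLAW_DIST=""
fi
SESSIONS_DIR="$OPENCLAW_HOME/agents/main/sessions"
SESSIONS_INDEX="$SESSIONS_DIR/sessions.json"
MODELS_JSON="$OPENCLAW_HOME/agents/main/agent/models.json"
# Kept as a pattern; expand it unquoted where the workspaces are listed.
WORKSPACE_GLOB="$OPENCLAW_HOME/workspace-*"
LOGS_DIR="$OPENCLAW_HOME/logs"
BROWSER_CACHE="$OPENCLAW_HOME/browser"
CRON_DIR="$OPENCLAW_HOME/cron"
```
If any path doesn't exist, note it and skip that check section.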
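Diagnostic Sections
Run ALL sections below sequentially. For each finding, assign a severity:
- CRITICAL: broken functionality, data loss risk
- WARNING: suboptimal config, potential issues
- INFO: informational, optimization opportunity
1. Installation & Version
Use the built-in status command as the primary data source:
```bash
openclaw status --all 2>&1
openclaw --version 2>/dev/null
```
Report: version, gateway running status, LaunchAgent status, channel health.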
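2. Config Consistency
Read $OPENCLAW_CONFIG and check:
1. Default model validity: Is agents.defaults.model.primary a known model? Cross-check with agents.defaults.models entries.
2. Fallback models: Are all models in agents.defaults.model.fallbacks defined in the models list?
3. Legacy config files: Check if clawdbot.json or other legacy files exist in $OPENCLAW_HOME/.
4. Backup file accumulation: Count *.bak* files in $OPENCLAW_HOME/. More than 2 is WARNING.
5. Channel config:
   - Telegram: Check the requireMention setting per group. false = WARNING (bot responds to all messages).
   - Feishu: Check groupPolicy. "open" = WARNING (any group can interact).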
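3. Session Maintenance Config
Check openclaw.json for session.maintenance settings:
1. Maintenance mode: Missing or "warn" = WARNING (stale sessions accumulate without cleanup). Should be "enforce".
2. pruneAfter: Missing or > 30d = INFO. Recommended: "7d" to "14d".
3. maxEntries: Missing or > 200 = INFO. Default is 500; a reasonable value for personal use is 50-100.
4. maxDiskBytes: Missing = INFO. Recommended: set a cap like "100mb".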
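4. Compaction Config
Check agents.defaults.compaction in openclaw.json:
1. mode: Should be "safeguard" (default, safe). Note if missing.
2. reserveTokensFloor: Missing = WARNING. Without this buffer, context can overflow before compaction triggers. Recommended: 20000.
3. keepRecentTokens: Missing = INFO. Controls how much recent conversation is preserved verbatim during compaction. Recommended: 8000.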
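5. Model Alignment
Use the built-in sessions list, then cross-reference with config:
```bash
openclaw sessions 2>&1
```
Also read sessions.json programmatically to check:
1. Session model drift: List any sessions whose model field differs from the configured default. Check channel sessions in particular (telegram:, feishu:).
2. contextTokens vs model contextWindow: Compare each session's contextTokens against its model's actual contextWindow (from models.json or the built-in registry). Mismatch = WARNING (e.g., 272k contextTokens on a 200k model can cause overflow).
3. Forward-compat patches: Check whether dist files have been locally patched by searching $OPENCLAW_DIST/*.js for non-standard constants (e.g., model IDs not in the official XHIGH_MODEL_REFS, or custom resolveForwardCompatModel additions).
4. Thinking config: Read the thinking config file (find it via grep -rl "XHIGH_MODEL_REFS" $OPENCLAW_DIST/) and verify that the current default model is included in XHIGH_MODEL_REFS if it should support xhigh thinking.
5. models.json override: Read $MODELS_JSON and check whether inline model definitions are consistent with openclaw.json.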
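6. Session Health
Use the built-in cleanup dry-run as primary data source:
```bash
openclaw sessions cleanup --dry-run --fix-missing 2>&1
```
Then supplement with filesystem checks:
1. Orphan JSONL files: Files in the sessions directory that are not referenced in sessions.json. Calculate total size.
2. Zombie session entries: Entries in sessions.json pointing to non-existent JSONL files.
3. Empty JSONL files: Referenced files that are 0 bytes.
4. Deleted file accumulation: *.deleted.* files that can be cleaned up. Calculate total size.
5. Cron session accumulation: Count sessions with :cron: in their key. Separate parent jobs from :run: sub-sessions. Large numbers (>20) indicate cleanup isn't working.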
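7. Cron Health
Read $CRON_DIR/jobs.json and check:
1. Duplicate enabled jobs: Jobs with identical name + schedule + enabled: true. Flag as WARNING with a dedup suggestion.
2. Disabled job accumulation: Count enabled: false jobs. More than 10 = INFO (suggest cleanup if the user confirms they're not needed).
3. Tmp file accumulation: Count jobs.json.*.tmp files in $CRON_DIR. These are abandoned atomic-write artifacts. Any that no process holds open (check with lsof) are safe to delete.
4. Cron runs directory: Check $CRON_DIR/runs/ for accumulated run logs. Count and total size.
5. Stale enabled jobs: Enabled jobs whose state.lastRunAtMs is older than expected based on their schedule (e.g., a daily job that hasn't run in 3+ days).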
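8. Security Audit
Check openclaw.json for:
1. Feishu groupPolicy: "open" means any Feishu group can interact = CRITICAL.
2. Feishu/Telegram allowFrom: ["*"] means no restriction = WARNING.
3. Telegram requireMention: false on groups = WARNING (bot responds to every message).
4. Gateway auth mode: Read gateway.auth.mode from config. "token" is good, "none" = CRITICAL.
5. Exposed secrets in non-gitignored files: Check if $OPENCLAW_HOME/ contains any files that might be accidentally synced (e.g., check for a .git directory in $OPENCLAW_HOME/).
6. API keys in models.json: Note if API keys are stored in plaintext in models.json (this is expected but worth noting).
9. Resource Usage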
bash