Restart Task Recovery
Use this workflow to maximize successful recovery after an OpenClaw restart.
1) Pre-restart checkpoint (required)
Before any gateway.config.patch, gateway.config.apply, gateway.update.run, or gateway.restart:
1. List active sessions that may be impacted (sessions_list).
2. For each active work session, capture the latest context (sessions_history, limit 20-50).
3. Write a compact checkpoint file at:
- memory/restart-checkpoints/<YYYY-MM-DD>/<HHmmss>.md
4. Include per session:
- sessionKey / label / agent
- goal
- last completed step
- next exact step
- blocked dependencies (if any)
- a ready-to-send resume message (1-2 lines)
Keep the checkpoint concise and executable.
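A minimal sketch of how the checkpoint path above could be built from a timestamp. The helper name checkpoint_path and its root parameter are hypothetical and are not part of any OpenClaw script.

```python
from datetime import datetime
from pathlib import Path


def checkpoint_path(now: datetime, root: str = "memory/restart-checkpoints") -> Path:
    """Build <root>/<YYYY-MM-DD>/<HHmmss>.md for the given timestamp.

    Hypothetical helper: only the path layout comes from the workflow text.
    """
    day = now.strftime("%Y-%m-%d")   # directory per calendar day
    stamp = now.strftime("%H%M%S")   # file name from the time of day
    return Path(root) / day / f"{stamp}.md"


if __name__ == "__main__":
    path = checkpoint_path(datetime.now())
    # Create the day directory before writing the checkpoint into it.
    print(f"would write checkpoint to: {path.as_posix()}")
```

Taking the timestamp as an argument instead of reading the clock inside the function keeps the helper easy to test.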
2) Restart with explicit recovery intent
When calling a gateway restart or config change, set note to state the recovery intent, e.g.:
- “Config updated and restarted; interrupted tasks will be resumed from the checkpoint.”
3) Post-restart recovery sweep
After restart:
1. Re-list sessions (sessions_list) and compare against the checkpoint.
2. For each interrupted/idle target session, send a resume message via sessions_send:
- “Continue where you left off. Last completed: <...>. Next: <...>. If previous tool call failed, retry from <...>.”
3. Do not poll in tight loops. Check on-demand only.
4. Summarize recovery status to the user:
- recovered sessions
- still blocked sessions
- manual follow-up needed
4) Idempotent task design rules
When resuming tasks, enforce:
1. Re-run-safe steps (idempotency key / upsert / duplicate-safe writes).
2. Small step boundaries with explicit “done markers”.
3. External writes batched, not one-by-one loops.
4. On uncertainty, verify state first, then continue.
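One way the "done marker" rule could look in practice is sketched below. The marker-file layout and the run_step_once helper are assumptions made for illustration, not something the workflow prescribes.

```python
import json
import tempfile
from pathlib import Path


def run_step_once(marker_dir: Path, step_id: str, action) -> bool:
    """Run `action` only if no done marker exists for `step_id`.

    Returns True if the action ran, False if it was skipped.
    Hypothetical helper; the ".done" file convention is an assumption.
    """
    marker = marker_dir / f"{step_id}.done"
    if marker.exists():  # verify state first, then continue
        return False
    action()
    # Record completion only after the action succeeded.
    marker_dir.mkdir(parents=True, exist_ok=True)
    marker.write_text(json.dumps({"step": step_id}), encoding="utf-8")
    return True


if __name__ == "__main__":
    calls = []
    with tempfile.TemporaryDirectory() as tmp:
        markers = Path(tmp) / "markers"
        first = run_step_once(markers, "publish-summary", lambda: calls.append("sent"))
        second = run_step_once(markers, "publish-summary", lambda: calls.append("sent"))
    print(first, second, calls)  # True False ['sent']
```

If the action fails partway, no marker is written, so a resumed session retries the step. That is only safe when the step itself is duplicate-safe, which is why rule 1 comes first.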
5) V2 automation helper
Use script: scripts/build_checkpoint.py to generate checkpoint markdown from structured JSON.
Example:
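```bash
cat session-snapshot.json | python3 scripts/build_checkpoint.py memory/restart-checkpoints/$(date +%F)/$(date +%H%M%S).md
```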
Expected stdin JSON shape:
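```json
{
  "sessions": [
    {
      "sessionKey": "agent:engineer:main",
      "agentId": "engineer",
      "goal": "Complete regression verification",
      "lastDone": "401 / idempotency / timezone / retention cases passed",
      "nextStep": "Publish final acceptance summary",
      "blockers": "None"
    }
  ]
}
```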
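To make the JSON-to-checkpoint step concrete, here is a rough sketch of rendering a snapshot with sessionKey, agentId, goal, lastDone, nextStep and blockers fields into Markdown. It is not the actual scripts/build_checkpoint.py; the section layout and the resume-message wording are assumptions.

```python
import json
import sys

# (JSON key, label shown in the checkpoint); labels follow the per-session list in step 1.
FIELDS = [
    ("goal", "goal"),
    ("lastDone", "last completed step"),
    ("nextStep", "next exact step"),
    ("blockers", "blocked dependencies"),
]


def render_checkpoint(snapshot: dict) -> str:
    """Render one Markdown section per session (assumed layout)."""
    lines = ["# Restart checkpoint", ""]
    for session in snapshot.get("sessions", []):
        lines.append(f"## {session['sessionKey']}")
        lines.append(f"- agent: {session.get('agentId', '')}")
        for key, label in FIELDS:
            lines.append(f"- {label}: {session.get(key, '')}")
        resume = (
            "Continue where you left off. "
            f"Last completed: {session.get('lastDone', '')}. "
            f"Next: {session.get('nextStep', '')}."
        )
        lines.append(f"- resume message: {resume}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Reads the snapshot JSON from stdin and prints the Markdown.
    print(render_checkpoint(json.load(sys.stdin)))
```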
6) V3 resume-plan automation
Use script: scripts/generate_resume_plan.py to parse the latest checkpoint and produce a structured resume plan.
Example:
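```bash
python3 scripts/generate_resume_plan.py memory/restart-checkpoints/2026-03-09/162200.md /tmp/resume-plan.json
```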
Then send each items[].resumeMessage to items[].sessionKey via sessions_send.
Rules:
- Send once per session (no loop polling).
- If a session is already active and progressing, skip resend.
- After sends, post one concise recovery summary to the user.
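The first two rules can be sketched as a filter over the plan's items[]. The progressing_keys argument and the "message" key in the output are hypothetical; only sessionKey and resumeMessage come from the plan described above.

```python
def select_sends(items, progressing_keys):
    """Pick at most one resume message per session, skipping busy sessions.

    items: plan entries with "sessionKey" and "resumeMessage".
    progressing_keys: session keys known to be active and progressing
    (hypothetical input, e.g. derived from a single sessions_list call).
    """
    seen = set()
    sends = []
    for item in items:
        key = item["sessionKey"]
        if key in seen or key in progressing_keys:
            continue  # already queued once, or no resend needed
        seen.add(key)
        sends.append({"sessionKey": key, "message": item["resumeMessage"]})
    return sends


if __name__ == "__main__":
    plan_items = [
        {"sessionKey": "agent:engineer:main", "resumeMessage": "Continue: publish summary."},
        {"sessionKey": "agent:engineer:main", "resumeMessage": "Duplicate entry."},
        {"sessionKey": "agent:writer:main", "resumeMessage": "Continue: draft notes."},
    ]
    for send in select_sends(plan_items, progressing_keys={"agent:writer:main"}):
        print(send["sessionKey"], "->", send["message"])
```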
7) V4 one-click recovery payload generator
Use script: scripts/recover_from_latest_checkpoint.py.
It auto-selects the latest checkpoint file and emits a ready JSON payload list for sessions_send calls.
Examples:
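```bash
# Auto-use the latest checkpoint
python3 scripts/recover_from_latest_checkpoint.py > /tmp/recover-actions.json

# Use a specific checkpoint
python3 scripts/recover_from_latest_checkpoint.py memory/restart-checkpoints/2026-03-09/162200.md > /tmp/recover-actions.json
```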
Execution guidance:
- Read /tmp/recover-actions.json
- Execute each actions[] item with sessions_send
- Post one concise summary to the user
8) V5 pre-resume verifier + manual confirmation gate
Use script: scripts/pre_resume_verify.py to score resume actions before sending.
Examples:
```bash
python3 scripts/pre_resume_verify.py /tmp/recover-actions.json /tmp/recover-verified.json
```
Behavior:
- Marks each action as risk=normal|high
- high risk actions are set to decision=hold and requiresManualConfirm=true
- Only send decision=send automatically
- Ask for user confirmation before executing held actions
Recommended execution flow:
1. Generate actions with V4
2. Verify with V5
3. Send all decision=send
4. Present the decision=hold list to the user for explicit confirmation
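The gate itself might look roughly like the sketch below. How scripts/pre_resume_verify.py actually scores risk is not stated here, so the keyword heuristic and the "message" field are placeholders; only the risk, decision and requiresManualConfirm fields follow the behavior listed above.

```python
# Hypothetical hints; the real verifier's scoring rules are not documented here.
HIGH_RISK_HINTS = ("delete", "deploy", "payment")


def score_risk(message: str) -> str:
    """Return "high" if the message mentions a risky operation, else "normal"."""
    text = message.lower()
    return "high" if any(hint in text for hint in HIGH_RISK_HINTS) else "normal"


def verify(actions):
    """Annotate each action with risk, decision and requiresManualConfirm."""
    verified = []
    for action in actions:
        risk = score_risk(action.get("message", ""))
        hold = risk == "high"
        verified.append({
            **action,
            "risk": risk,
            "decision": "hold" if hold else "send",
            "requiresManualConfirm": hold,
        })
    return verified


if __name__ == "__main__":
    sample = [
        {"sessionKey": "agent:ops:main", "message": "Next: deploy the release."},
        {"sessionKey": "agent:engineer:main", "message": "Next: publish the summary."},
    ]
    for item in verify(sample):
        print(item["sessionKey"], item["risk"], item["decision"])
```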
9) V6 execution-plan generator (auto-send safe items)
Use script: scripts/execute_verified_recovery.py with V5 output.
Example:
```bash
python3 scripts/execute_verified_recovery.py /tmp/recover-verified.json > /tmp/recover-exec.json
```
Behavior:
- Emits sendActions[] for auto-safe resumes (decision=send)
- Emits holdForManualConfirm[] for risky resumes (decision=hold)
Execution:
1. Execute all sendActions[] with sessions_send
2. Ask the user to confirm holdForManualConfirm[]
3. Execute confirmed held items
4. Post a concise summary
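A sketch of the split into sendActions[] and holdForManualConfirm[], assuming each verified action carries a decision field. This illustrates the idea and is not the code of scripts/execute_verified_recovery.py.

```python
def build_execution_plan(verified):
    """Partition verified actions by decision.

    Anything not explicitly marked decision=send is held. That conservative
    default is a design choice of this sketch, not documented behavior.
    """
    plan = {"sendActions": [], "holdForManualConfirm": []}
    for action in verified:
        if action.get("decision") == "send":
            plan["sendActions"].append(action)
        else:
            plan["holdForManualConfirm"].append(action)
    return plan


if __name__ == "__main__":
    verified = [
        {"sessionKey": "agent:engineer:main", "decision": "send"},
        {"sessionKey": "agent:ops:main", "decision": "hold"},
    ]
    plan = build_execution_plan(verified)
    print(len(plan["sendActions"]), "to send,",
          len(plan["holdForManualConfirm"]), "held for confirmation")
```

Holding by default means a malformed or missing decision never results in an automatic send.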
10) Message templates
Read and use: references/templates.md
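Restart Task Recovery
Use this workflow to maximize successful recovery after an OpenClaw restart.
1) Pre-restart checkpoint (required)
Before any gateway.config.patch, gateway.config.apply, gateway.update.run, or gateway.restart:
1. List active sessions that may be impacted (sessions_list).
2. For each active work session, capture the latest context (sessions_history, limit 20-50).
3. Write a compact checkpoint file at:
- memory/restart-checkpoints/<YYYY-MM-DD>/<HHmmss>.md
4. Include per session:
- sessionKey / label / agent
- goal
- last completed step
- next exact step
- blocked dependencies (if any)
- a ready-to-send resume message (1-2 lines)
Keep the checkpoint concise and executable.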
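2) Restart with explicit recovery intent
When calling a gateway restart or config change, set note to state the recovery intent, e.g.:
- “Config updated and restarted; interrupted tasks will be resumed from the checkpoint.”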
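3) Post-restart recovery sweep
After restart:
1. Re-list sessions (sessions_list) and compare against the checkpoint.
2. For each interrupted/idle target session, send a resume message via sessions_send:
- “Continue where you left off. Last completed: <...>. Next: <...>. If previous tool call failed, retry from <...>.”
3. Do not poll in tight loops. Check on-demand only.
4. Summarize recovery status to the user:
- recovered sessions
- still blocked sessions
- manual follow-up needed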
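4) Idempotent task design rules
When resuming tasks, enforce:
1. Re-run-safe steps (idempotency key / upsert / duplicate-safe writes).
2. Small step boundaries with explicit “done markers”.
3. External writes batched, not one-by-one loops.
4. On uncertainty, verify state first, then continue.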
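5) V2 automation helper
Use script: scripts/build_checkpoint.py to generate checkpoint markdown from structured JSON.
Example:
```bash
cat session-snapshot.json | python3 scripts/build_checkpoint.py memory/restart-checkpoints/$(date +%F)/$(date +%H%M%S).md
```
Expected stdin JSON shape:
```json
{
  "sessions": [
    {
      "sessionKey": "agent:engineer:main",
      "agentId": "engineer",
      "goal": "Complete regression verification",
      "lastDone": "401 / idempotency / timezone / retention cases passed",
      "nextStep": "Publish final acceptance summary",
      "blockers": "None"
    }
  ]
}
```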
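6) V3 resume-plan automation
Use script: scripts/generate_resume_plan.py to parse the latest checkpoint and produce a structured resume plan.
Example:
```bash
python3 scripts/generate_resume_plan.py memory/restart-checkpoints/2026-03-09/162200.md /tmp/resume-plan.json
```
Then send each items[].resumeMessage to items[].sessionKey via sessions_send.
Rules:
- Send once per session (no loop polling).
- If a session is already active and progressing, skip resend.
- After sends, post one concise recovery summary to the user.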
7) V4 one-click recovery payload generator
Use script: scripts/recover_from_latest_checkpoint.py.
It auto-selects the latest checkpoint file and emits a ready JSON payload list for sessions_send calls.
Examples:
```bash
# Auto-use the latest checkpoint
python3 scripts/recover_from_latest_checkpoint.py > /tmp/recover-actions.json

# Use a specific checkpoint
python3 scripts/recover_from_latest_checkpoint.py memory/restart-checkpoints/2026-03-09/162200.md > /tmp/recover-actions.json
```
Execution guidance:
- Read /tmp/recover-actions.json
- Execute each actions[] item with sessions_send
- Post one concise summary to the user
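8) V5 pre-resume verifier + manual confirmation gate
Use script: scripts/pre_resume_verify.py to score resume actions before sending.
Examples:
```bash
python3 scripts/pre_resume_verify.py /tmp/recover-actions.json /tmp/recover-verified.json
```
Behavior:
- Marks each action as risk=normal|high
- high risk actions are set to decision=hold and requiresManualConfirm=true
- Only send decision=send automatically
- Ask for user confirmation before executing held actions
Recommended execution flow:
1. Generate actions with V4
2. Verify with V5
3. Send all decision=send
4. Present the decision=hold list to the user for explicit confirmation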
9) V6 execution-plan generator (auto-send safe items)
Use script: scripts/execute_verified_recovery.py with V5 output.
Example:
```bash
python3 scripts/execute_verified_recovery.py /tmp/recover-verified.json > /tmp/recover-exec.json
```
Behavior:
- Emits sendActions[] for auto-safe resumes (decision=send)
- Emits holdForManualConfirm[] for risky resumes (decision=hold)
Execution:
1. Execute all sendActions[] with sessions_send
2. Ask the user to confirm holdForManualConfirm[]
3. Execute confirmed held items
4. Post a concise summary
10) Message templates
Read and use: references/templates.md