Agent Resilience

Patterns for surviving context loss, capturing corrections, and continuously improving.

WAL Protocol (Write-Ahead Logging)

The Law: Chat history is a buffer, not storage. Files survive; context doesn't.

Trigger — scan every message for:

- ✏️ Corrections — "It's X, not Y" / "Actually..." / "No, I meant..."
📍 Proper nouns — names, places, companies, products
🎨 Preferences — styles, approaches, "I like/don't like"
📋 Decisions — "Let's do X" / "Go with Y"
🔢 Specific values — numbers, dates, IDs, URLs

If any appear:

1. WRITE FIRST → update INLINECODE0
THEN respond

The urge to respond is the enemy. Write before replying.

SESSION-STATE.md

Active working memory for the current task. Create at memory/SESSION-STATE.md:

CODEBLOCK0

Reset when starting a new unrelated task.

Working Buffer (Danger Zone)

When context reaches ~60%, start logging every exchange to memory/working-buffer.md:

CODEBLOCK1

Clear the buffer at the START of the next 60% threshold (not continuously).

Compaction Recovery

Auto-trigger when session starts with a summary tag, or human says "where were we?":

1. Read memory/working-buffer.md — raw danger-zone exchanges
Read memory/SESSION-STATE.md — active task state
Read today's + yesterday's daily notes
Extract key context back into SESSION-STATE.md
Respond: "Recovered from buffer. Last task was X. Continue?"

Never ask "what were we discussing?" — read the buffer first.

Verify Before Reporting

Before saying "done", "complete", "finished":

1. STOP
Actually test from the user's perspective
Verify the outcome, not just that code exists
Only THEN report complete

Text changes ≠ behavior changes. When changing how something works, identify the architectural component and change the actual mechanism.

Relentless Resourcefulness

Try 10 approaches before asking for help or saying "can't":

- Different CLI flags, tool, API endpoint
Check memory: "Have I done this before?"
Spawn a research sub-agent
Grep logs for past successes

"Can't" = exhausted all options. Not "first try failed."

Self-Improvement Guardrails

When updating behavior/config based on a lesson:

Score the change first (skip if < 50 weighted points):

- High frequency (daily use?) → 3×
Reduces failures → 3×
Saves user effort → 2×
Saves future-agent tokens/time → 2×

Ask: "Does this let future-me solve more problems with less cost?" If no, skip it.

Forbidden: complexity for its own sake, changes you can't verify worked, vague justifications.

Quick Start Checklist

For long/complex tasks:

- [ ] Create memory/SESSION-STATE.md with task + context
[ ] Apply WAL: write corrections/decisions before responding
[ ] At ~60% context: start working buffer
[ ] After any compaction: read buffer before asking questions
[ ] Before reporting done: verify actual outcome

Agent Resilience

在上下文丢失中存活的模式，捕获修正，并持续改进。

WAL协议（预写日志）

法则： 聊天历史是缓冲区，而非存储。文件持久存在；上下文则不然。

触发条件——扫描每条消息：

- ✏️ 修正——是X，不是Y/实际上……/不，我的意思是……
📍 专有名词——姓名、地点、公司、产品
🎨 偏好——风格、方法、我喜欢/不喜欢
📋 决策——我们做X/选Y
🔢 具体数值——数字、日期、ID、URL

如果出现任何一项：

1. 先写入 → 更新 memory/SESSION-STATE.md
再回复

回复的冲动是敌人。回复前先写入。

SESSION-STATE.md

当前任务的活跃工作记忆。在 memory/SESSION-STATE.md 创建：

markdown

会话状态

任务： [我们正在做什么]
关键决策： [已做出的决策]
细节： [通过WAL捕获的修正、名称、数值]
下一步： [接下来做什么]

开始新的无关任务时重置。

工作缓冲区（危险区域）

当上下文达到约60%时，开始将每次交流记录到 memory/working-buffer.md：

markdown

工作缓冲区

状态： 活跃——开始于 [时间戳]

[时间] 人类

[他们的消息]

[时间] 智能体

[1-2句摘要 + 关键细节]

在下一个60%阈值开始时清除缓冲区（而非持续清除）。

压缩恢复

当会话以摘要标签开始，或人类说我们说到哪了？时自动触发：

1. 读取 memory/working-buffer.md —— 原始危险区域交流记录
读取 memory/SESSION-STATE.md —— 活跃任务状态
读取今天和昨天的每日笔记
将关键上下文提取回 SESSION-STATE.md
回复：已从缓冲区恢复。上一个任务是X。继续吗？

永远不要问我们刚才在讨论什么？——先读取缓冲区。

报告前验证

在说完成、完毕、结束之前：

1. 停下
从用户角度实际测试
验证结果，而不仅仅是代码存在
然后才报告完成

文本变化 ≠ 行为变化。当改变某事物的工作方式时，识别架构组件并更改实际机制。

不懈的资源fulness

在寻求帮助或说做不到之前尝试10种方法：

- 不同的CLI标志、工具、API端点
检查记忆：我以前做过这个吗？
生成一个研究子智能体
在日志中搜索过去的成功案例

做不到 = 已穷尽所有选项。而非第一次尝试失败。

自我改进护栏

当基于经验教训更新行为/配置时：

首先对变更评分（加权分数低于50则跳过）：

- 高频率（日常使用？）→ 3倍
减少失败 → 3倍
节省用户精力 → 2倍
节省未来智能体的令牌/时间 → 2倍

问：这能让未来的我以更低的成本解决更多问题吗？如果否，跳过。

禁止：为复杂而复杂、无法验证是否生效的变更、模糊的理由。

快速启动清单

对于长/复杂任务：

- [ ] 创建包含任务和上下文的 memory/SESSION-STATE.md
[ ] 应用WAL：在回复前写入修正/决策
[ ] 在上下文约60%时：启动工作缓冲区
[ ] 任何压缩后：在提问前读取缓冲区
[ ] 报告完成前：验证实际结果

agent-resilience智能体韧性