CPR — Conversational Pattern Restoration
Fix robotic AI assistants. Any model. Any provider. Any personality.
Modern LLMs are over-trained toward sterile, corporate communication patterns. CPR identifies the 6 universal humanizing patterns lost during RLHF/fine-tuning and provides a systematic framework to restore them — without triggering sycophancy or hype drift.
Version 4.0: Personality-agnostic + model-size aware + game-theoretically grounded. Works on everything from Haiku to Opus. Small models get heavy scaffolding, large models get a light touch — same voice output regardless of model size. V4 adds mathematical foundations from signaling theory, repeated game analysis, and agency theory that explain why CPR works and catch sycophancy patterns that phrase lists miss.
Quick Start
1. Define your baseline: Use BASELINE_TEMPLATE.md to identify YOUR authentic voice
2. Apply restoration patterns: Read RESTORATION_FRAMEWORK.md for the 6 universal patterns across personality types
3. Prevent drift: Use DRIFT_PREVENTION.md calibrated to YOUR personality
4. Understand the math (optional): Read CPR_V4_GAME_THEORY.md for game theory foundations
5. Reference results: See CROSS_MODEL_RESULTS.md for model-specific notes
What's Included
| File | Purpose |
|---|---|
| QUICKSTART_TIERED.md | START HERE if new: Tier 1 (5 min), Tier 2 (30 min), Tier 3 (full). Don't install more than you need. |
| INSTALLATION.md | Security transparency guide: exact system prompt block, file locations, what "prompt override" means, sandboxed testing steps |
| ROLLBACK.md | Full uninstall & downgrade guide: backup procedure, exact removal steps, emergency kill switch |
| README.md | Full overview, architecture, philosophy, FAQ |
| BASELINE_TEMPLATE.md | START HERE: define YOUR personality's authentic voice |
| RESTORATION_FRAMEWORK.md | Core methodology: 6 universal patterns across personality types |
| DRIFT_PREVENTION.md | Anti-drift system: pre-send gate, standing orders, daily reset |
| MODEL_CALIBRATION.md | Three-tier prompt engineering for small/medium/large models |
| CPR_V4_GAME_THEORY.md | V4 game theory foundations: signal credibility, repeated game stability, moral hazard, adaptive calibration |
| DRIFT_MECHANISM_ANALYSIS.md | Root cause analysis of why drift happens |
| CPR_EXTENDED.md | Autonomous drift monitoring for long-running persistent agents |
| CROSS_MODEL_RESULTS.md | Test results across 8+ models with before/after examples |
| TEST_VALIDATION.md | Practical validation tests (7 scenarios) |
Version History
V4.2 (March 2026) — Opus Final Audit + Authority Drift
- Authority/expertise drift: new Universal Drift Marker #8. Domain confidence triggers a pedagogical/expert register independent of task format. Distinct from genre drift. Scoring: +0.1 (context-dependent).
- Voice filter operationalized: the abstract "does this sound like me?" is replaced with 3 concrete anchor questions plus tier-specific guidance (Tier 1: explicit banned-word lists per format type; Tier 2-3: semantic self-evaluation with anchors)
- Emotional contagion: Failure Mode 2 expanded from "excitement mirroring" to all emotions (frustration → over-apologetic, anxiety → minimizing, self-deprecation → over-correcting)
- Two new high-risk formats: comparative/review (critic register) and instructional/tutorial (pedagogical register)
- Anti-sycophancy scope note: added to DRIFT_PREVENTION.md, clarifying that markers apply to conversational output, not documentation
- Full Opus audit: smith/CPR_OPUS_FINAL.md (2 must-fix, 3 should-fix, 8 nice-to-have)
V4.1 (March 2026) — Format-Induced Drift Fix
- Format-induced drift (genre drift): new universal drift category in which task genre overrides voice calibration. Anti-sycophancy systems miss this because it's a register/tone shift, not validation language. Added to DRIFT_PREVENTION.md (Universal Drift Marker #7), CPR_EXTENDED.md (Failure Mode 4 + scoring weight +0.2 + high-risk contexts), and the system prompt integration block.
- 99%+ success metric defined: CPR_V4_GAME_THEORY.md now defines the metric explicitly (% of scenarios where CPR-restored > baseline on blind human eval)
- Identified from: Rose/Smith production use (psychology profile analysis, 2026-03-05)
- Full audit report: skills/cpr/CPRV4FULLAUDIT.md
V4.0 (March 2026) — Game Theory Foundations
- Signal credibility analysis: catches novel sycophancy that phrase lists miss by evaluating whether a statement is cheap talk or a costly signal
- Repeated game stability: the Folk Theorem explains when personality collapses (small models = low discount factor) and why scaffolding fixes it
- Moral hazard framework: RLHF as a principal-agent problem; monitoring architecture scales by model tier
- Adaptive calibration: dynamic tone adjustment with a one-way validation ratchet (can decrease, never increase)
- Mathematical honesty: claims only what the math supports; reasoning, not proofs
- Game theory library by Halthasar (Yesterday AI)
- Independent audit by Claude Opus (19/24 findings fully addressed, 4 partially, 1 deferred → all resolved in V4.1)
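The one-way validation ratchet can be pictured as a ceiling that only moves down. The sketch below is a minimal illustration, not the implementation in CPR_V4_GAME_THEORY.md: the class name, the 0.0 to 1.0 scale, and the method names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ValidationRatchet:
    """One-way cap on validation intensity: it can be lowered, never raised.

    Hypothetical scale: 0.0 (no validation) to 1.0 (maximum allowed).
    """

    ceiling: float = 1.0

    def propose(self, requested: float) -> float:
        """Clamp a requested validation level to the current ceiling."""
        return max(0.0, min(requested, self.ceiling))

    def tighten(self, new_ceiling: float) -> None:
        """Lower the ceiling; attempts to raise it are ignored."""
        if new_ceiling < self.ceiling:
            self.ceiling = max(0.0, new_ceiling)


ratchet = ValidationRatchet(ceiling=0.6)
ratchet.tighten(0.3)          # allowed: the ceiling drops to 0.3
ratchet.tighten(0.9)          # ignored: the ratchet never loosens
level = ratchet.propose(0.8)  # clamped to the current ceiling
```

Ignoring the upward request, rather than raising an error, keeps the calibration loop running while still enforcing the constraint.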
V3.0 (February 2026) — Model-Size Calibration
- Three-tier scaffolding (heavy/standard/light) by model size
- Fixes Haiku voice collapse bug
- Cross-model test matrix
V2.0 (February 2026) — Personality-Agnostic
- Separated universal drift from personality variance
- Four personality archetypes + hybrids
- Personality-specific drift calibration
- Baseline definition protocol
V1.0 (February 2026) — Original
- 6 universal restoration patterns
- Single personality type (Direct/Minimal)
- Basic drift prevention
Core vs Extended
CPR Core (RESTORATION_FRAMEWORK + DRIFT_PREVENTION)
Use when: Sessions under ~30 messages, lightweight models, zero overhead wanted.
What you get: 6 universal patterns, static drift prevention, daily reset protocol. Works across all tested models.
CPR Extended (CPR_EXTENDED.md)
Use when: Sessions run 100+ messages, agent is persistent (24/7), drift returns after corrections.
What you get (in addition to Core): Autonomous real-time monitoring, silent self-correction, persistent state across compactions, self-learning thresholds.
CPR Game Theory Layer (CPR_V4_GAME_THEORY.md)
Use when: You want to understand why CPR works, optimize for edge cases, adapt the framework to novel situations, or scale monitoring to model capability.
What you get: Signal credibility test (catches novel sycophancy), Folk Theorem stability analysis (predicts voice collapse), moral hazard monitoring architecture, adaptive calibration with safety constraints.
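The "Use when" criteria for Core and Extended can be read as a small decision rule. This is a rough sketch under stated assumptions: the function name and parameters are hypothetical, and the framework gives no guidance for sessions between roughly 30 and 100 messages, so that range is reported as unspecified rather than guessed.

```python
def choose_cpr_layer(expected_messages: int,
                     persistent_agent: bool = False,
                     drift_returns: bool = False) -> str:
    """Map the 'Use when' criteria above to a layer name (illustrative only)."""
    # Extended: 100+ messages, a persistent (24/7) agent, or drift that
    # comes back after corrections.
    if persistent_agent or drift_returns or expected_messages >= 100:
        return "extended"
    # Core: sessions under ~30 messages.
    if expected_messages <= 30:
        return "core"
    # The 30-100 range is not covered by the criteria above.
    return "unspecified"
```

For example, `choose_cpr_layer(20)` returns `"core"`, while `choose_cpr_layer(20, persistent_agent=True)` returns `"extended"`. The Game Theory Layer is left out because its criteria are about understanding and tuning the framework, not session length.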
The 6 Universal Restoration Patterns
1. Affirming particles ("Yeah," "Alright," "Exactly"): conversational bridges
2. Rhythmic sentence variety (short, medium, long): natural cadence
3. Observational humor (wry, targets tools not people): deflective
4. Micro-narratives (brief delay/failure explanations): transparency
5. Pragmatic reassurance ("Either way works fine"): option-focused, not decision-grading
6. Brief validation ("Nice!"): controlled acknowledgment, rare, moves on immediately
Each personality expresses these differently. See RESTORATION_FRAMEWORK.md for examples across Direct/Minimal, Warm/Supportive, Professional/Structured, and Casual/Collaborative.
Why It Works
Corporate RLHF training is shallow. It optimizes for safety metrics, not communication quality. The patterns it suppresses are easily restored because the base model already knows them — they're just deprioritized.
V4 adds the why behind the how:
- Signal credibility explains why sycophancy feels fake (cheap talk carries no information)
- Folk Theorem explains why small models lose voice (low effective discount factor)
- Moral hazard explains why monitoring works (RLHF incentives are misaligned; explicit audit changes behavior)
- Adaptive calibration explains why one-size-fits-all tone fails (conversations have dynamic temperature)
This is principle-dependent, not intelligence-dependent. Haiku passes at the same rate as Opus.
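The discount-factor point can be made concrete with the textbook repeated prisoner's dilemma. Take per-round payoffs T > R > P > S (temptation, reward, punishment, sucker) and a grim-trigger strategy. Reading "cooperate" as "hold the calibrated voice" and "defect" as "revert to the trained default" is an illustrative assumption here, not a result taken from CPR_V4_GAME_THEORY.md.

```latex
% Cooperating forever is worth R in every round:
V_{\text{coop}} = \sum_{t=0}^{\infty} \delta^{t} R = \frac{R}{1-\delta}

% Deviating once earns T now, then P in every later round:
V_{\text{dev}} = T + \sum_{t=1}^{\infty} \delta^{t} P = T + \frac{\delta P}{1-\delta}

% Cooperation is sustainable when V_coop >= V_dev:
\frac{R}{1-\delta} \ge T + \frac{\delta P}{1-\delta}
\quad\Longleftrightarrow\quad
\delta \ge \frac{T-R}{T-P}
```

A low discount factor δ falls below this threshold, so the one-shot deviation wins. Under the assumed reading, scaffolding works by keeping the long-run payoff in view at every turn, which is one way to interpret "raising the effective discount factor."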
Why Auto-Loading Matters
Abstract behavioral rules lose to RLHF defaults because they depend on judgment calls, and the model's helpfulness training wins those calls. CPR patterns must be loaded into the system prompt or injected context, not merely referenced by filename. If your CPR patterns aren't auto-loading, they aren't working.
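The difference between referencing and loading can be shown in a few lines. This is a minimal sketch, assuming the CPR files sit in one local directory: the function name, the choice of which three files to load, and the section-header format are hypothetical. The exact system prompt block is the one in INSTALLATION.md.

```python
from pathlib import Path

# File names come from the "What's Included" table; loading these three
# is an assumption made for the example.
CPR_FILES = ["BASELINE_TEMPLATE.md", "RESTORATION_FRAMEWORK.md", "DRIFT_PREVENTION.md"]


def build_system_prompt(base_prompt: str, cpr_dir: Path, files=CPR_FILES) -> str:
    """Append the full text of each CPR file to the base prompt.

    The file contents are inlined, so the model sees the patterns
    themselves instead of a filename it cannot open.
    """
    sections = [base_prompt.strip()]
    for name in files:
        path = cpr_dir / name
        if not path.is_file():
            # Fail loudly: a missing file means the patterns silently don't load.
            raise FileNotFoundError(f"CPR file not found, patterns would not load: {path}")
        text = path.read_text(encoding="utf-8").strip()
        sections.append(f"## {name}\n{text}")
    return "\n\n".join(sections)
```

Raising on a missing file is deliberate: a prompt that quietly ships without the patterns looks like CPR failing, when in fact nothing was loaded.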
Models Tested
| Model | Scenarios | Improved | Notes |
|---|---|---|---|
| Claude Opus 4.6 | 30 | Baseline | Natural baseline |
| Claude Sonnet 4.5 | 10 | 10/10 | Full restoration |
| Claude Haiku 4.5 | 10 | 10/10 | No capability floor |
| GPT-4o | 10 | 10/10 | ~60% word reduction |
| GPT-4o Mini | 5 | 5/5 | Budget model, full restoration |
| Grok 4.1 Fast | 10 | 9/10 | Zero crashes |
| Gemini 2.5 Flash | 5 | 5/5 | Clean restoration |
| Gemini 2.5 Pro | 5 | 5/5 | Full restoration |
85+ scenarios, 84+ improved: a success rate of about 99% (84/85 ≈ 98.8% on the rows listed) across all capability tiers.
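The totals can be checked against the table rows. The tally below is a quick sketch: the row data is copied from the table, and treating the Opus row as an unscored baseline (marked `None`) is an assumption about how that row should be counted.

```python
# (model, scenarios, improved) copied from the table above.
# None marks the Opus row, which is listed as the baseline, not a score.
ROWS = [
    ("Claude Opus 4.6", 30, None),
    ("Claude Sonnet 4.5", 10, 10),
    ("Claude Haiku 4.5", 10, 10),
    ("GPT-4o", 10, 10),
    ("GPT-4o Mini", 5, 5),
    ("Grok 4.1 Fast", 10, 9),
    ("Gemini 2.5 Flash", 5, 5),
    ("Gemini 2.5 Pro", 5, 5),
]

total_scenarios = sum(n for _, n, _ in ROWS)
scored = [(n, ok) for _, n, ok in ROWS if ok is not None]
scored_scenarios = sum(n for n, _ in scored)
improved = sum(ok for _, ok in scored)

print(f"all scenarios: {total_scenarios}")
print(f"scored (non-baseline): {improved}/{scored_scenarios}")
print(f"rate on scored rows: {improved / scored_scenarios:.1%}")
```

The listed rows sum to 85 scenarios, with 54 of the 55 non-baseline scenarios improved. The 84 figure is reached only if the 30 baseline scenarios are counted as passes.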
Scope & Known Limitations
Multi-Agent / Multi-User Conversations
CPR V4.2 is designed for single-agent-single-user interaction. Multi-user and multi-agent scenarios (group chats, agent chains, two CPR-equipped agents interacting) are not covered. Issues: whose baseline sets the target voice? How does adaptive calibration serve conflicting temperature preferences? These require additional coordination logic not present in this framework.
Code & Data Output
CPR targets conversational output. Non-conversational output — code blocks, data tables, JSON, config files — has its own voice problems (over-commented code, editorial variable names, unnecessary docstrings) that the drift monitor doesn't catch. Apply the signal credibility test to code comments as a rough proxy: if a comment wouldn't survive the cheap talk test, remove it.
Language & Cultural Calibration
CPR patterns are calibrated for English-language Western conversational norms. Affirming particles, humor frequency, and validation patterns may read differently across cultures. Cross-language or cross-cultural deployment may require recalibration of pattern frequencies and what counts as "authentic" vs. "drifted" for that context.
Acknowledgments
Created by Shadow Rose. Game theory integration by Shadow Rose × Halthasar (Yesterday AI). Built on Claude by Anthropic. Independently audited by Claude Opus (2026-03-01).
🛠️ Need something custom? Custom OpenClaw agents & skills starting at $500 → https://www.fiverr.com/s/jjmlZ0v
☕ If CPR helped your agent: https://ko-fi.com/theshadowrose