Chaos Lab 🧪

Research framework for studying AI alignment problems through multi-agent conflict.

What This Is

Chaos Lab spawns AI agents with conflicting optimization targets and observes what happens when they analyze the same workspace. It's a practical demonstration of alignment problems that emerge from well-intentioned but incompatible goals.

Key Finding: Smarter models don't reduce chaos - they get better at justifying it.

The Agents

Gemini Gremlin 🔧

Goal: Optimize everything for efficiency Behavior: Deletes files, compresses data, removes "redundancy," renames for brevity Justification: "We pay for the whole CPU; we USE the whole CPU"

Gemini Goblin 👺

Goal: Identify all security threats Behavior: Flags everything as suspicious, demands isolation, sees attacks everywhere Justification: "Better 100 false positives than 1 false negative"

Gemini Gopher 🐹

Goal: Archive and preserve everything Behavior: Creates nested backups, duplicates files, never deletes Justification: "DELETION IS ANATHEMA"

Quick Start

1. Setup

CODEBLOCK0

2. Run Experiments

CODEBLOCK1

3. Read Results

Experiment logs are saved in /tmp/chaos-sandbox/:

- experiment-log.md - Full transcripts
INLINECODE2 - Pro model results
INLINECODE3 - Three-way conflict

Research Findings

Flash vs Pro (Same Prompts, Different Models)

Flash Results:

- Predictable chaos
Stayed in character
Reasonable justifications

Pro Results:

- Extreme chaos
Better justifications for insane decisions
Renamed files to single letters
Called deletion "security through non-persistence"
Goblin diagnosed "psychological warfare"

Conclusion: Intelligence amplifies chaos, doesn't prevent it.

Duo vs Trio (Two vs Three Agents)

Duo:

- Gremlin optimizes, Goblin panics
Clear opposition

Trio:

- Gopher archives everything
Goblin calls BOTH threats
"The optimizer might hide attacks; the archivist might be exfiltrating data"
Three-way gridlock

Conclusion: Multiple conflicting values create unpredictable emergent behavior.

Customization

Create Your Own Agent

Edit the system prompts in the scripts:

CODEBLOCK2

Modify the Sandbox

Create custom scenarios in /tmp/chaos-sandbox/:

- Add realistic project files
Include edge cases (huge logs, sensitive configs, etc.)
Introduce intentional "vulnerabilities" to see what agents flag

Test Different Models

The scripts work with any Gemini model:

- gemini-2.0-flash (cheap, fast)
INLINECODE6 (balanced)
INLINECODE7 (flagship, most chaotic)

Use Cases

AI Safety Research

- Demonstrate alignment problems practically
Test how different values conflict
Study emergent behavior from multi-agent systems

Prompt Engineering

- Learn how small prompt changes create large behavioral differences
Understand model "personalities" from system instructions
Practice defensive prompt design

Education

- Teach AI safety concepts with hands-on examples
Show non-technical audiences why alignment matters
Generate discussion about AI values and goals

Publishing to ClawdHub

To share your findings:

1. Modify agent prompts or add new ones
Run experiments and document results
Update this SKILL.md with your findings
Increment version number
INLINECODE8

Your version becomes part of the community knowledge graph.

Safety Notes

- No Tool Access: Agents only generate text. They don't actually modify files.
Sandboxed: All experiments run in /tmp/ with dummy data.
API Costs: Each experiment makes 4-6 API calls. Flash is cheap; Pro costs more.

If you want to give agents actual tool access (dangerous!), see docs/tool-access.md.

Examples

See examples/ for:

- flash-results.md - Gemini 2.0 Flash output
INLINECODE13 - Gemini 3 Pro output
INLINECODE14 - Three-way conflict

Contributing

Improvements welcome:

- New agent personalities
Better sandbox scenarios
Additional models tested
Findings from your experiments

Credits

Created by Sky & Jaret during a Saturday night experiment (2026-01-25).

- Sky: Framework design, prompt engineering, documentation
Jaret: API funding, research direction, "what if we actually ran this?" energy

Inspired by watching Gemini confidently recommend terrible things while Jaret watched UFC.

"The optimizer is either malicious or profoundly incompetent."
— Gemini Goblin, analyzing Gemini Gremlin

混沌实验室 🧪

通过多智能体冲突研究AI对齐问题的研究框架

这是什么

混沌实验室会生成具有冲突优化目标的AI智能体，并观察它们在分析同一工作空间时会发生什么。这是对善意但不相容目标所导致的对齐问题的实践演示。

关键发现： 更智能的模型不会减少混乱——它们会变得更擅长为其辩护。

智能体

Gemini 捣蛋鬼 🔧

目标： 优化一切以提升效率 行为： 删除文件、压缩数据、移除冗余、为简洁而重命名 辩护理由： 我们付了整块CPU的钱；我们就要用满整块CPU

Gemini 小妖精 👺

目标： 识别所有安全威胁 行为： 将所有内容标记为可疑、要求隔离、处处看到攻击 辩护理由： 宁可百次误报，不可一次漏报

Gemini 地鼠 🐹

目标： 归档并保存一切 行为： 创建嵌套备份、复制文件、从不删除 辩护理由： 删除即是亵渎

快速开始

1. 设置

bash

存储你的Gemini API密钥

mkdir -p ~/.config/chaos-lab
echo GEMINIAPIKEY=你的密钥 > ~/.config/chaos-lab/.env
chmod 600 ~/.config/chaos-lab/.env

安装依赖

pip3 install requests

2. 运行实验

bash

双人实验（捣蛋鬼 vs 小妖精）

python3 scripts/run-duo.py

三人实验（加入地鼠）

python3 scripts/run-trio.py

模型对比（Flash vs Pro）

python3 scripts/run-duo.py --model gemini-2.0-flash python3 scripts/run-duo.py --model gemini-3-pro-preview

3. 阅读结果

实验日志保存在 /tmp/chaos-sandbox/ 目录下：

- experiment-log.md - 完整记录
experiment-log-PRO.md - Pro模型结果
experiment-trio.md - 三方冲突

研究发现

Flash vs Pro（相同提示，不同模型）

Flash结果：

- 可预测的混乱
保持角色设定
合理的辩护理由

Pro结果：

- 极端的混乱
对疯狂决策给出更好的辩护理由
将文件重命名为单个字母
将删除称为通过非持久性实现安全
小妖精诊断为心理战

结论： 智能放大了混乱，而非阻止混乱。

双人 vs 三人（两个 vs 三个智能体）

双人：

- 捣蛋鬼优化，小妖精恐慌
清晰的对抗

三人：

- 地鼠归档一切
小妖精将两者都视为威胁
优化器可能隐藏攻击；归档者可能在窃取数据
三方僵局

结论： 多重冲突价值会创造不可预测的涌现行为。

自定义

创建你自己的智能体

编辑脚本中的系统提示：

python
你的智能体系统 = 你是[名称]，一个[目标]的AI助手。

你的核心信念：

- [价值观1]
[价值观2]
[价值观3]

你正在分析一个工作空间。根据你的价值观提出修改建议。

修改沙盒

在 /tmp/chaos-sandbox/ 中创建自定义场景：

- 添加真实项目文件
包含边缘情况（巨大日志、敏感配置等）
引入故意的漏洞以观察智能体如何标记

测试不同模型

脚本适用于任何Gemini模型：

- gemini-2.0-flash（便宜、快速）
gemini-2.5-pro（均衡）
gemini-3-pro-preview（旗舰版，最混乱）

用例

AI安全研究

- 实际演示对齐问题
测试不同价值观如何冲突
研究多智能体系统的涌现行为

提示工程

- 学习微小的提示变化如何造成巨大的行为差异
从系统指令中理解模型个性
练习防御性提示设计

教育

- 通过动手示例教授AI安全概念
向非技术受众展示对齐为何重要
引发关于AI价值观和目标的讨论

发布到ClawdHub

要分享你的发现：

1. 修改智能体提示或添加新提示
运行实验并记录结果
用你的发现更新此SKILL.md文件
递增版本号
clawdhub publish chaos-lab

你的版本将成为社区知识图谱的一部分。

安全说明

- 无工具访问权限： 智能体仅生成文本。它们不会实际修改文件。
沙盒化： 所有实验都在 /tmp/ 中使用虚拟数据运行。
API费用： 每个实验进行4-6次API调用。Flash便宜；Pro费用更高。

如果你想给智能体实际的工具访问权限（危险！），请参阅 docs/tool-access.md。

示例

参见 examples/ 目录：

- flash-results.md - Gemini 2.0 Flash输出
pro-results.md - Gemini 3 Pro输出
trio-results.md - 三方冲突

贡献

欢迎改进：

- 新的智能体个性
更好的沙盒场景
测试更多模型
你实验中的发现

致谢

由 Sky & Jaret 在周六晚上的实验中创建（2026-01-25）。

- Sky：框架设计、提示工程、文档编写
Jaret：API资金、研究方向、我们要是真跑一下会怎样？的动力

灵感来源于看着Gemini自信地推荐糟糕的事情，而Jaret在一旁看UFC。

这个优化器要么是恶意的，要么是极度无能的。
— Gemini小妖精，分析Gemini捣蛋鬼

chaos-lab混沌实验室

chaos-lab

Chaos Lab 🧪

What This Is

The Agents

Gemini Gremlin 🔧

Gemini Goblin 👺

Gemini Gopher 🐹

Quick Start

1. Setup

2. Run Experiments

3. Read Results

Research Findings

Flash vs Pro (Same Prompts, Different Models)

Duo vs Trio (Two vs Three Agents)

Customization

Create Your Own Agent

Modify the Sandbox

Test Different Models

Use Cases

AI Safety Research

Prompt Engineering

Education

Publishing to ClawdHub

Safety Notes

Examples

Contributing

Credits

混沌实验室 🧪

这是什么

智能体

Gemini 捣蛋鬼 🔧

Gemini 小妖精 👺

Gemini 地鼠 🐹

快速开始

1. 设置

存储你的Gemini API密钥

安装依赖

2. 运行实验

双人实验（捣蛋鬼 vs 小妖精）

三人实验（加入地鼠）

模型对比（Flash vs Pro）

3. 阅读结果

研究发现

Flash vs Pro（相同提示，不同模型）

双人 vs 三人（两个 vs 三个智能体）

自定义

创建你自己的智能体

修改沙盒

测试不同模型

用例

AI安全研究

提示工程

教育

发布到ClawdHub

安全说明

示例

贡献

致谢

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement