Context Budget Optimizer

Framework: The Token Efficiency Matrix
Worth $200/hr of consultant time. Yours for $19.

What This Skill Does

Audits your agent's token usage across every context layer, identifies where you're burning budget on bloat, and produces a 3-week cost reduction roadmap with concrete implementation steps.

Problem it solves: Power users hitting $200-500/month in AI costs often have 60-70% waste baked into their context. Most of it is invisible: stale files in system prompts, redundant skill loading, oversized memory files, wrong model choices. The Token Efficiency Matrix makes the waste visible and rankable.

The Token Efficiency Matrix

A 4-quadrant audit tool that scores every context element by cost (token weight) and ROI (value delivered per token). High cost + low ROI = cut first.

The Matrix

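```
                     HIGH ROI
                        │
        KEEP            │        OPTIMIZE
    (High ROI,          │       (High ROI,
     Low Cost)          │        High Cost)
                        │
LOW COST ───────────────┼────────────────── HIGH COST
                        │
        AUDIT           │          CUT
    (Low ROI,           │       (Low ROI,
     Low Cost)          │        High Cost)
                        │
                     LOW ROI
```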

Action by quadrant:

- KEEP: Don't touch. It's working efficiently.
- OPTIMIZE: Compress or lazy-load. The value is there, it's just expensive.
- AUDIT: Review quarterly. Low cost, so not urgent, but its ROI should be questioned.
- CUT: Kill immediately. You're paying for nothing.

Phase 1: Context Inventory

Before scoring, map everything that's in your agent's context.

Context Layers to Audit

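```
Layer A: System prompt / SOUL.md / identity files
Layer B: Active skills (loaded every session)
Layer C: Memory files (MEMORY.md, daily notes)
Layer D: Project files injected at startup
Layer E: Tool outputs / MCP responses in context
Layer F: Chat history (conversation turns kept in context)
Layer G: Code or data files read into context
```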

Inventory Template

For each item in your context, fill this in:

| Item | Layer | Est. Tokens | Sessions/Day | Daily Cost* | Value (1-5) |
|---|---|---|---|---|---|
| SOUL.md | A | | | | |
| MEMORY.md | C | | | | |
| [Skill 1].md | B | | | | |
| [Skill 2].md | B | | | | |
| Daily notes | C | | | | |
| [Project file] | D | | | | |

*Daily Cost = (Est. Tokens / 1M) × model_rate × sessions_per_day, where model_rate is the model's input cost per 1M tokens (see the rate table below).

Token estimation cheatsheet:

- 1 page of text ≈ 500-700 tokens
- 1 SKILL.md file ≈ 800-2,000 tokens
- 1 code file (100 lines) ≈ 1,200-1,800 tokens
- 1 MEMORY.md (well-maintained) ≈ 500-1,500 tokens
- 1 MEMORY.md (neglected/bloated) ≈ 3,000-8,000 tokens
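The cheatsheet can be turned into a quick range estimator for a bundle of context items. This is a minimal sketch, assuming you count pages, SKILL.md files, and code lines by hand; the per-item ranges are copied from the list above, while the function and parameter names are hypothetical. For exact counts, use your provider's tokenizer.

```python
# Per-item token ranges (low, high), copied from the cheatsheet above.
RANGES = {
    "page": (500, 700),              # 1 page of prose
    "skill_file": (800, 2000),       # 1 SKILL.md file
    "code_100_lines": (1200, 1800),  # 100 lines of code
}


def estimate_tokens(pages: float = 0, skill_files: int = 0, code_lines: int = 0) -> tuple[int, int]:
    """Return a (low, high) token estimate for a bundle of context items."""
    counts = {
        "page": pages,
        "skill_file": skill_files,
        "code_100_lines": code_lines / 100,  # the cheatsheet is per 100 lines
    }
    low = sum(RANGES[key][0] * n for key, n in counts.items())
    high = sum(RANGES[key][1] * n for key, n in counts.items())
    return round(low), round(high)


# Two pages of notes, three skills, and a 300-line code file.
print(estimate_tokens(pages=2, skill_files=3, code_lines=300))  # → (7000, 12800)
```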

Model rates (as of Q1 2026, approximate):

| Model | Input Cost per 1M tokens |
|---|---|
| Claude 3.5 Haiku | ~$0.80 |
| Claude Sonnet 4 | ~$3.00 |
| Claude Opus 4 | ~$15.00 |
| GPT-4o mini | ~$0.15 |
| GPT-4o | ~$2.50 |
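The Daily Cost formula from the inventory template, combined with the rates above, is easy to script. A minimal sketch: the dictionary keys are hypothetical labels (not official API model IDs), the rates are the approximate Q1 2026 input prices from the table, and, like the formula, it counts input tokens only.

```python
# Approximate input prices in USD per 1M tokens, from the rate table above.
# Keys are illustrative labels, not official API model IDs.
INPUT_RATE_PER_M = {
    "claude-3.5-haiku": 0.80,
    "claude-sonnet-4": 3.00,
    "claude-opus-4": 15.00,
    "gpt-4o-mini": 0.15,
    "gpt-4o": 2.50,
}


def daily_cost(est_tokens: int, model: str, sessions_per_day: int) -> float:
    """Daily Cost = (Est. Tokens / 1M) x model_rate x sessions_per_day (input only)."""
    return est_tokens / 1_000_000 * INPUT_RATE_PER_M[model] * sessions_per_day


# Example: a 6,000-token bloated MEMORY.md injected into 20 Opus sessions a day.
cost = daily_cost(6_000, "claude-opus-4", 20)
print(f"${cost:.2f}/day, about ${cost * 30:.2f}/month")
```

Running this for each row of the inventory fills in the Daily Cost column, which makes the heaviest items obvious before scoring.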

Phase 2: Scoring (Token Efficiency Matrix)

Score each context item:

Cost Score (1-5):

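| Score | Token Range | Description |
|---|---|---|
| 1 | < 200 tokens | Tiny, negligible |
| 2 | 200-500 tokens | Light |
| 3 | 500-1,500 tokens | Medium |
| 4 | 1,500-4,000 tokens | Heavy |
| 5 | > 4,000 tokens | Very heavy |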

ROI Score (1-5):

| Score | Description |
|---|---|
| 1 | Rarely used, generic, stale |
| 2 | Occasionally useful |
| 3 | Moderately useful in most sessions |
| 4 | Consistently referenced, shapes output |
| 5 | Critical: the session breaks without it |

Matrix placement:

- Cost 1-2, ROI 4-5 → KEEP
- Cost 4-5, ROI 4-5 → OPTIMIZE
- Cost 1-2, ROI 1-2 → AUDIT
- Cost 4-5, ROI 1-2 → CUT
- Cost 3, ROI 3 → AUDIT (marginal; evaluate quarterly)
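The placement rules above translate directly into a lookup. This sketch implements only the listed rules and returns None for combinations they don't cover (for example cost 3 with ROI 5), so those get a manual call. The inventory rows are hypothetical, and sorting by cost minus ROI to put the worst offenders first is an assumption of this sketch, not part of the matrix.

```python
from typing import Optional


def place(cost: int, roi: int) -> Optional[str]:
    """Map 1-5 cost and ROI scores to a quadrant using the placement rules."""
    if not (1 <= cost <= 5 and 1 <= roi <= 5):
        raise ValueError("scores must be between 1 and 5")
    if cost == 3 and roi == 3:
        return "AUDIT"  # marginal: evaluate quarterly
    low_cost, high_cost = cost <= 2, cost >= 4
    low_roi, high_roi = roi <= 2, roi >= 4
    if low_cost and high_roi:
        return "KEEP"
    if high_cost and high_roi:
        return "OPTIMIZE"
    if low_cost and low_roi:
        return "AUDIT"
    if high_cost and low_roi:
        return "CUT"
    return None  # not covered by the rules; decide by hand


# Hypothetical inventory rows: (item, cost score, ROI score).
inventory = [
    ("SOUL.md", 2, 5),
    ("MEMORY.md (bloated)", 5, 2),
    ("[Skill 1].md", 4, 4),
    ("Old daily notes", 3, 1),
]
for item, cost, roi in sorted(inventory, key=lambda row: row[1] - row[2], reverse=True):
    print(f"{item:<22} cost={cost} roi={roi} -> {place(cost, roi)}")
```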

Phase 3: Reduction Playbook

CUT (implement immediately)

Items to eliminate first:
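```
□ Memory entries older than 90 days with no references
□ Skills loaded globally but used only occasionally
□ Duplicate information across multiple files
□ Verbose templates in the system prompt
□ Commented-out code in injected files
□ Debug logs included in context
□ Full file contents included when only a summary is needed
```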

Cut target: 30-40% token reduction with zero quality loss.

OPTIMIZE (implement in weeks 1-2)

Tactic 1: Lazy Loading

Instead of loading all skills at startup, load only when triggered.

Before (eager load):
```
System prompt includes all 10 skill files → 15,000 tokens per session
```

After (lazy load):
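```
System prompt includes only the skill index → 500 tokens
Individual skill loaded on demand → +1,000 tokens when needed
Net: 13,500-14,500 tokens saved per session (90-97% of skill tokens),
depending on whether a skill gets loaded
```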

Lazy load implementation:

```markdown
# SKILL-INDEX.md (500 tokens instead of full skills)

Available skills (load when needed):
- mcp-server-setup-kit: MCP connection setup
- agentic-loop-designer: Build autonomous loops
- context-budget-optimizer: Token cost reduction
- [etc]

To use a skill: "Use the [skill-name] skill"
```
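If your agent harness lets you assemble context in code, the SKILL-INDEX.md pattern can be sketched as follows. The index is always included; a skill's full text is pulled in only when the message contains the trigger phrase "Use the [skill-name] skill". `build_context`, `fake_loader`, and the regex are hypothetical; a real loader might read the skill's SKILL.md from disk.

```python
import re
from typing import Callable

# Hypothetical index mirroring SKILL-INDEX.md: one short line per skill.
SKILL_INDEX = {
    "mcp-server-setup-kit": "MCP connection setup",
    "agentic-loop-designer": "Build autonomous loops",
    "context-budget-optimizer": "Token cost reduction",
}

# Matches the trigger phrase "Use the [skill-name] skill".
TRIGGER = re.compile(r"use the ([a-z0-9-]+) skill", re.IGNORECASE)


def render_index(index: dict[str, str]) -> str:
    lines = ["Available skills (load when needed):"]
    lines += [f"- {name}: {desc}" for name, desc in index.items()]
    return "\n".join(lines)


def build_context(message: str, index: dict[str, str], load_skill: Callable[[str], str]) -> str:
    """Always include the small index; add full skill text only when triggered."""
    parts = [render_index(index)]
    seen: set[str] = set()
    for name in (match.lower() for match in TRIGGER.findall(message)):
        if name in index and name not in seen:
            parts.append(load_skill(name))  # the expensive load happens only here
            seen.add(name)
    return "\n\n".join(parts)


def fake_loader(name: str) -> str:
    """Stand-in for reading a full skill file."""
    return f"<full text of {name}>"


print(build_context("Use the context-budget-optimizer skill on my setup", SKILL_INDEX, fake_loader))
```

Sessions that never mention a skill pay only for the index, which is where the per-session savings come from.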

Tactic 2: Memory Tiering

Not all memory is equally important. Tier it.

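```
Tier 1 (Hot): Always in context. Current focus, active projects, today's priorities
Target: < 500 tokens
File: FOCUS.md

Tier 2 (Warm): Loaded on demand. Past decisions, completed projects
Target: < 2,000 tokens
File: MEMORY.md (summarized)

Tier 3 (Cold): Never auto-loaded. Old daily notes, archived projects
Storage: Flat files, searchable on demand
File: memory/archive/
```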

Memory tiering implementation:

1. Create FOCUS.md (Tier 1) with just this week's priorities
2. Archive daily notes older than 14 days to memory/archive/
3. Summarize MEMORY.md quarterly (remove resolved items)
4. Set the system prompt to inject only FOCUS.md + the most recent 7 days of memory
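Steps 2 and 4 can be automated with a small script. This is a sketch under stated assumptions: daily notes are named like `2026-03-01.md` and live in one memory folder alongside FOCUS.md and MEMORY.md, and the archive folder is `memory/archive/`. The function names are hypothetical; adjust the layout to match your own setup.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_old_notes(memory_dir: Path, today: date, keep_days: int = 14) -> list[str]:
    """Move dated daily notes older than keep_days into memory_dir/archive/."""
    archive = memory_dir / "archive"
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for note in sorted(memory_dir.glob("*.md")):
        try:
            note_date = date.fromisoformat(note.stem)  # assumes names like 2026-03-01.md
        except ValueError:
            continue  # not a daily note (MEMORY.md, FOCUS.md, ...)
        if (today - note_date).days > keep_days:
            shutil.move(str(note), str(archive / note.name))
            moved.append(note.name)
    return moved


def startup_files(memory_dir: Path, today: date, recent_days: int = 7) -> list[Path]:
    """FOCUS.md plus daily notes from the last recent_days days."""
    focus = memory_dir / "FOCUS.md"
    files = [focus] if focus.exists() else []
    for note in sorted(memory_dir.glob("*.md")):
        try:
            age = (today - date.fromisoformat(note.stem)).days
        except ValueError:
            continue
        if 0 <= age < recent_days:
            files.append(note)
    return files
```

MEMORY.md is deliberately left out of `startup_files`, matching Tier 2's "loaded on demand" rule; the quarterly summary in step 3 stays a manual review.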

Tactic 3: Compression Templates

Replace verbose content with compressed references.

Before (bloated system prompt section):
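```
David Flynn is a founder based in Austin, Texas. He runs a company called
TechCorp that builds B2B SaaS products for mid-market companies in the
logistics sector. He has been doing this for 8 years and previously worked
at McKinsey. He prefers direct communication and dislikes fluff. He cares
most about metrics and ROI. His team has 6 people...
[300 tokens]
```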

After (compressed):

```
Owner: David Flynn | Austin TX | TechCorp (B2B SaaS, logistics, mid-market)
Background: 8yr founder, ex-McKinsey | Team: 6
Style: Direct, metric-first, no fluff
[40 tokens, 87% reduction]
```
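One way to keep the compressed form from drifting back into prose is to store the facts as structured fields and render them through a fixed template. A minimal sketch; the field names and `compress` helper are hypothetical, and the output reproduces the compressed profile above.

```python
# Hypothetical structured profile; field names are illustrative.
profile = {
    "owner": "David Flynn",
    "city": "Austin TX",
    "company": "TechCorp",
    "company_desc": ["B2B SaaS", "logistics", "mid-market"],
    "background": ["8yr founder", "ex-McKinsey"],
    "team": 6,
    "style": ["Direct", "metric-first", "no fluff"],
}

# Pipe-delimited template: one line per topic, no connective prose.
TEMPLATE = (
    "Owner: {owner} | {city} | {company} ({company_desc})\n"
    "Background: {background} | Team: {team}\n"
    "Style: {style}"
)


def compress(p: dict) -> str:
    """Render a structured profile into the compressed system-prompt format."""
    return TEMPLATE.format(
        owner=p["owner"],
        city=p["city"],
        company=p["company"],
        company_desc=", ".join(p["company_desc"]),
        background=", ".join(p["background"]),
        team=p["team"],
        style=", ".join(p["style"]),
    )


print(compress(profile))
```

Updating a fact then means editing one field, not rewriting a paragraph.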

Tactic 4: Model Downgrade Opportunities

Most context-heavy sessions don't need the flagship model.

Downgrade decision tree:
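```
Does this task need multi-step reasoning?
├── No → Use Haiku (80-90% cost reduction)
└── Yes → Is this a novel problem?
    ├── No (familiar pattern) → Use Sonnet
    └── Yes (genuinely complex) → Use Opus
```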
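The downgrade rule (no multi-step reasoning → Haiku; multi-step but familiar → Sonnet; multi-step and novel → Opus) fits in a few lines if you route tasks in code. A sketch only: the task labels and flags are hypothetical, and in practice someone (or a classifier) still has to decide the two yes/no answers.

```python
def choose_model(multi_step_reasoning: bool, novel_problem: bool) -> str:
    """Downgrade decision tree: pick the cheapest tier the task allows."""
    if not multi_step_reasoning:
        return "Haiku"   # novelty doesn't matter for single-step tasks
    if not novel_problem:
        return "Sonnet"  # familiar multi-step pattern
    return "Opus"        # genuinely complex, new problem


# Hypothetical tasks: (needs multi-step reasoning, novel problem).
tasks = {
    "Format a status report": (False, False),
    "Refactor using a known pattern": (True, False),
    "Design a new pricing model": (True, True),
}
for task, (multi_step, novel) in tasks.items():
    print(f"{task}: {choose_model(multi_step, novel)}")
```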

Model savings calculator:

| Switch | Token Cost Reduction | When Safe |
|---|---|---|
| Opus → Sonnet | 80% | Most writing, analysis, ops |
| Sonnet → Haiku | ~73% | Simple reads, status checks, formatting |
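The reduction percentages follow from the input rates in the model rate table: savings = 1 - (new rate / old rate). A minimal check, assuming input pricing only (output tokens are priced differently and are not covered here):

```python
# Input prices in USD per 1M tokens, from the model rate table.
RATES = {"Opus": 15.00, "Sonnet": 3.00, "Haiku": 0.80}


def savings_pct(from_model: str, to_model: str) -> float:
    """Percent reduction in input cost when switching models."""
    return (1 - RATES[to_model] / RATES[from_model]) * 100


for old, new in [("Opus", "Sonnet"), ("Sonnet", "Haiku"), ("Opus", "Haiku")]:
    print(f"{old} -> {new}: {savings_pct(old, new):.0f}% lower input cost")
```

At these rates, Opus → Sonnet comes out at 80%, Sonnet → Haiku at about 73%, and jumping straight from Opus to Haiku at about 95%.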

Tactic 5: Context Window Management

Stop re-injecting the same content in long sessions.

CODEBLOCK10
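One way to apply the "stop re-injecting" rule, if your harness controls what gets added to context: remember a hash of each source already injected in the session and skip it unless the content changed. A sketch with a hypothetical `InjectionCache` class; if your harness truncates or summarizes older turns, call `reset()` afterwards, because content the cache thinks is present may have been dropped.

```python
import hashlib


class InjectionCache:
    """Tracks what has already been injected into the current session's context."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}  # source name -> SHA-256 of injected content

    def should_inject(self, name: str, content: str) -> bool:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self._seen.get(name) == digest:
            return False  # identical content is already in context; skip it
        self._seen[name] = digest
        return True

    def reset(self) -> None:
        """Forget everything, e.g. after the context window was truncated."""
        self._seen.clear()


cache = InjectionCache()
print(cache.should_inject("config.yaml", "port: 8080"))  # True: first read
print(cache.should_inject("config.yaml", "port: 8080"))  # False: unchanged
print(cache.should_inject("config.yaml", "port: 9090"))  # True: file changed
```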

3-Week Cost Reduction Roadmap

Week 1: Cut & Quick Wins

Target: 30-40% cost reduction

CODEBLOCK11

Week 2: Optimize Structure

Target: Additional 20-30% reduction

CODEBLOCK12

Week 3: Lock In & Monitor

Target: Establish monitoring + reach 50%+ total reduction

CODEBLOCK13
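How the weekly targets add up depends on what "additional" means. If each week's cut applies to the spend left after the previous week (an assumption; the roadmap doesn't say), the reductions compound rather than add. A small sketch with a hypothetical `combined_reduction` helper:

```python
def combined_reduction(*stage_cuts: float) -> float:
    """Total reduction when each stage's cut applies to the spend left over."""
    remaining = 1.0
    for cut in stage_cuts:
        remaining *= 1 - cut
    return 1 - remaining


low = combined_reduction(0.30, 0.20)   # low ends of the Week 1 and Week 2 targets
high = combined_reduction(0.40, 0.30)  # high ends of the Week 1 and Week 2 targets
print(f"Weeks 1-2 combined: {low:.0%} to {high:.0%}")
print(f"$400/month becomes ${400 * (1 - high):.0f} to ${400 * (1 - low):.0f}/month")
```

Under this reading, Weeks 1-2 alone land around 44-58%, so the Week 3 target of 50%+ depends on hitting the middle of both ranges; if the percentages simply add, the same targets would give 50-70%.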

Token Efficiency Scoring Rubric

After completing the 3-week roadmap, score your setup:

| Metric | 0 | 1 | 2 |
|---|---|---|---|
| Average session tokens | > 50K | 20-50K | < 20K |
| Skills lazy-loaded | | | |

Score 8-10: Token-efficient operator. You're in the top 5% of AI users by cost.
Score 5-7: Good progress. Keep tightening.
Score 0-4: High burn rate. Revisit Week 1 of the roadmap.

Quick Reference: The 10 Highest-ROI Cuts

If you do nothing else, do these 10 things:

1. Archive memory older than 30 days
2. Switch routine tasks from Opus/Sonnet to Haiku
3. Lazy-load skills instead of always-on
4. Compress system prompt (verbose → structured)
5. Stop re-reading files in the same session
6. Archive daily notes older than 14 days
7. Create FOCUS.md and limit startup context to it
8. Remove code files from context when not actively editing
9. Summarize MCP tool outputs instead of keeping raw results
10. Set model routing rules in AGENTS.md

Combined impact: 50-70% cost reduction for most users.

Example Session

User prompt:

"My Claude usage is $400/month and I don't know why. Help me cut it."

Agent response using this skill:

1. Runs Phase 1 Context Inventory (asks the user to share what's in their setup)
2. Estimates tokens per item using the cheatsheet
3. Populates the Token Efficiency Matrix
4. Identifies the top 3 CUT items (likely: bloated MEMORY.md, eager skill loading, Opus overuse)
5. Delivers a Week 1 roadmap customized to their setup
6. Projects: "Based on this, you should reach $150-200/month in 3 weeks"

Bundle Note

This skill is part of the AI Setup & Productivity Pack ($79 bundle):

- MCP Server Setup Kit ($19)
- Agentic Loop Designer ($29)
- AI OS Blueprint ($39)
- Context Budget Optimizer ($19) ← you are here
- Non-Technical Agent Quickstart ($9)

Save $36 with the full bundle. Built by @RemyClaw.
