Context Budget Optimizer
Framework: The Token Efficiency Matrix
Worth $200/hr consultant time. Yours for $19.
What This Skill Does
Audits your agent's token usage across every context layer, identifies where you're burning budget on bloat, and produces a 3-week cost reduction roadmap with concrete implementation steps.
Problem it solves: Power users hitting $200-500/month in AI costs often have 60-70% waste baked into their context. Most of it is invisible: stale files in system prompts, redundant skill loading, oversized memory files, wrong model choices. The Token Efficiency Matrix makes the waste visible and rankable.
The Token Efficiency Matrix
A 4-quadrant audit tool that scores every context element by cost (token weight) and ROI (value delivered per token). High cost + low ROI = cut first.
The Matrix
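                       High ROI
                          │
          KEEP            │          OPTIMIZE
       (high ROI,         │         (high ROI,
        low cost)         │          high cost)
                          │
Low cost ─────────────────┼───────────────── High cost
                          │
          AUDIT           │          CUT
       (low ROI,          │         (low ROI,
        low cost)         │          high cost)
                          │
                       Low ROI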
Action by quadrant:
- KEEP: Don't touch. It's working efficiently.
- OPTIMIZE: Compress or lazy-load. Value is there, just expensive.
- AUDIT: Review quarterly. Low cost so not urgent, but ROI should be questioned.
- CUT: Kill immediately. You're paying for nothing.
Phase 1: Context Inventory
Before scoring, map everything that's in your agent's context.
Context Layers to Audit
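Layer A: System prompt / SOUL.md / identity files
Layer B: Active skills (loaded every session)
Layer C: Memory files (MEMORY.md, daily notes)
Layer D: Project files injected at startup
Layer E: Tool outputs / MCP responses in context
Layer F: Chat history (conversation turns kept in context)
Layer G: Code or data files read into context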
Inventory Template
For each item in your context, fill this in:
| Item | Layer | Est. Tokens | Sessions/Day | Daily Cost* | Value (1-5) |
|---|---|---|---|---|---|
| SOUL.md | A | | | | |
| MEMORY.md | C | | | | |
| [Skill 1].md | B | | | | |
| [Skill 2].md | B | | | | |
| Daily notes | C | | | | |
| [Project file] | D | | | | |
*Daily Cost = (Est. Tokens / 1M) × model_rate × sessions_per_day
Token estimation cheatsheet:
- 1 page of text ≈ 500-700 tokens
- 1 SKILL.md file ≈ 800-2,000 tokens
- 1 code file (100 lines) ≈ 1,200-1,800 tokens
- 1 MEMORY.md (well-maintained) ≈ 500-1,500 tokens
- 1 MEMORY.md (neglected/bloated) ≈ 3,000-8,000 tokens
Model rates (as of Q1 2026, approximate):
| Model | Input Cost per 1M tokens |
|---|---|
| Claude Haiku 3.5 | ~$0.80 |
| Claude Sonnet 4 | ~$3.00 |
| Claude Opus 4 | ~$15.00 |
| GPT-4o mini | ~$0.15 |
| GPT-4o | ~$2.50 |
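To see what a single context item costs you, the Daily Cost formula and the rate table can be combined in a few lines. This is a minimal sketch: the function name `daily_cost` and the example numbers (an 8,000-token MEMORY.md loaded 20 times a day) are illustrative assumptions, and the rates are the approximate figures from the table, not live pricing.

```python
# Approximate input rates in USD per 1M tokens, copied from the table above.
RATES_PER_MILLION = {
    "Claude Haiku 3.5": 0.80,
    "Claude Sonnet 4": 3.00,
    "Claude Opus 4": 15.00,
    "GPT-4o mini": 0.15,
    "GPT-4o": 2.50,
}


def daily_cost(est_tokens: int, model: str, sessions_per_day: int) -> float:
    """Daily Cost = (Est. Tokens / 1M) x model_rate x sessions_per_day."""
    return est_tokens / 1_000_000 * RATES_PER_MILLION[model] * sessions_per_day


if __name__ == "__main__":
    # Hypothetical item: a bloated 8,000-token MEMORY.md loaded 20 times a day.
    for model in ("Claude Opus 4", "Claude Haiku 3.5"):
        print(f"{model}: ${daily_cost(8_000, model, 20):.2f}/day")
```

On these assumed numbers the same file costs about $2.40/day on Opus and about $0.13/day on Haiku, which is why the Daily Cost column matters more than raw token counts.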
Phase 2: Scoring (Token Efficiency Matrix)
Score each context item:
Cost Score (1-5):
| Score | Token Range | Description |
|---|---|---|
| 1 | < 200 tokens | Tiny, negligible |
| 2 | 200-500 tokens | Light |
| 3 | 500-1,500 tokens | Medium |
| 4 | 1,500-4,000 tokens | Heavy |
| 5 | > 4,000 tokens | Very heavy |
ROI Score (1-5):
| Score | Description |
|---|---|
| 1 | Rarely used, generic, stale |
| 2 | Occasionally useful |
| 3 | Moderately useful most sessions |
| 4 | Consistently referenced, shapes output |
| 5 | Critical: session breaks without it |
Matrix placement:
- Cost 1-2, ROI 4-5 → KEEP
- Cost 4-5, ROI 4-5 → OPTIMIZE
- Cost 1-2, ROI 1-2 → AUDIT
- Cost 4-5, ROI 1-2 → CUT
- Cost 3, ROI 3 → AUDIT (marginal — evaluate quarterly)
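The scoring and placement rules can be sketched as two small functions. Two things here are assumptions, not part of the rubric: where a range boundary is shared (for example exactly 500 tokens), the item gets the higher score, and any score pair the rules above do not list (such as Cost 3, ROI 5) falls back to AUDIT so it gets a manual look.

```python
def cost_score(tokens: int) -> int:
    """Map a token count to the 1-5 Cost Score from the table."""
    if tokens < 200:
        return 1
    if tokens < 500:
        return 2
    if tokens < 1_500:
        return 3
    if tokens <= 4_000:
        return 4
    return 5


def quadrant(cost: int, roi: int) -> str:
    """Place an item in the Token Efficiency Matrix."""
    if not (1 <= cost <= 5 and 1 <= roi <= 5):
        raise ValueError("scores must be between 1 and 5")
    low_cost, high_cost = cost <= 2, cost >= 4
    low_roi, high_roi = roi <= 2, roi >= 4
    if low_cost and high_roi:
        return "KEEP"
    if high_cost and high_roi:
        return "OPTIMIZE"
    if high_cost and low_roi:
        return "CUT"
    # Covers low cost + low ROI, the marginal 3/3 case, and (assumption)
    # every other combination the placement rules leave unspecified.
    return "AUDIT"
```

For example, a neglected 6,000-token MEMORY.md with an ROI of 2 scores `quadrant(cost_score(6_000), 2)`, which lands in CUT.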
Phase 3: Reduction Playbook
CUT (implement immediately)
Items to eliminate first:
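□ Memory entries older than 90 days with no references
□ Skills loaded globally but used only occasionally
□ Duplicate information across multiple files
□ Verbose templates in the system prompt
□ Commented-out code in injected files
□ Debug logs included in context
□ Full file contents included where a summary would do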
Cut target: 30-40% token reduction with zero quality loss.
OPTIMIZE (implement week 1-2)
Tactic 1: Lazy Loading
Instead of loading all skills at startup, load only when triggered.
Before (eager load):
System prompt includes all 10 skill files → 15,000 tokens per session
After (lazy load):
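System prompt includes only the skill index → 500 tokens
Load individual skills on demand → 1,000 tokens when needed
Net: 13,500 to 14,500 fewer tokens per session (90-97% savings on skills), depending on whether a skill is loaded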
Lazy load implementation:
# SKILL-INDEX.md (500 tokens instead of full skills)
Available skills — load when needed:
- mcp-server-setup-kit: MCP connection setup
- agentic-loop-designer: Build autonomous loops
- context-budget-optimizer: Token cost reduction
- [etc]
To use a skill: "Use the [skill-name] skill"
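If your agent harness lets you control what gets injected, the index-plus-on-demand pattern might look like the sketch below. The class name `LazySkillLoader` and the `skills/<name>/SKILL.md` directory layout are assumptions for illustration; adapt them to however your setup stores skills.

```python
from pathlib import Path


class LazySkillLoader:
    """Keeps a short index in context and reads full skill files only on demand."""

    def __init__(self, skills_dir: Path, index: dict):
        self.skills_dir = skills_dir
        self.index = index      # skill name -> one-line description
        self._loaded = {}       # skills already read during this session

    def index_text(self) -> str:
        """The only skill text injected at startup."""
        lines = ["Available skills (load when needed):"]
        lines += [f"- {name}: {desc}" for name, desc in self.index.items()]
        return "\n".join(lines)

    def load(self, name: str) -> str:
        """Read one skill file the first time it is requested, then reuse it."""
        if name not in self.index:
            raise KeyError(f"unknown skill: {name}")
        if name not in self._loaded:
            # Hypothetical layout: <skills_dir>/<skill-name>/SKILL.md
            path = self.skills_dir / name / "SKILL.md"
            self._loaded[name] = path.read_text(encoding="utf-8")
        return self._loaded[name]
```

Startup injects only `index_text()`; the full file is read when the user says "Use the [skill-name] skill", and the cache keeps it from being read twice in one session.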
Tactic 2: Memory Tiering
Not all memory is equally important. Tier it.
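Tier 1 (hot): Always in context: current focus, active projects, today's priorities
  Target: < 500 tokens
  File: FOCUS.md
Tier 2 (warm): Loaded on demand: historical decisions, completed projects
  Target: < 2,000 tokens
  File: MEMORY.md (summarized)
Tier 3 (cold): Never auto-loaded: old daily notes, archived projects
  Storage: flat files, searchable on demand
  File: memory/archive/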
Memory tiering implementation:
1. Create FOCUS.md (Tier 1) with just this week's priorities
2. Archive daily notes older than 14 days to memory/archive/
3. Summarize MEMORY.md quarterly (remove resolved items)
4. Set the system prompt to inject only FOCUS.md plus the most recent 7 days of memory
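The archiving step is easy to automate. A minimal sketch, assuming daily notes live in one directory and are named `YYYY-MM-DD.md` (that naming scheme and the function name `archive_old_notes` are assumptions; files that do not match, such as MEMORY.md, are left alone):

```python
from datetime import date, datetime, timedelta
from pathlib import Path
import shutil


def archive_old_notes(memory_dir: Path, today: date, max_age_days: int = 14) -> list:
    """Move daily notes older than max_age_days into memory_dir/archive/."""
    archive_dir = memory_dir / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)
    cutoff = today - timedelta(days=max_age_days)
    moved = []
    for note in sorted(memory_dir.glob("*.md")):
        try:
            # Assumed naming scheme: YYYY-MM-DD.md; anything else is skipped.
            note_date = datetime.strptime(note.stem, "%Y-%m-%d").date()
        except ValueError:
            continue
        if note_date < cutoff:
            shutil.move(str(note), str(archive_dir / note.name))
            moved.append(note.name)
    return moved
```

Run it once a day, for example `archive_old_notes(Path("memory"), date.today())`, and the warm tier stays at two weeks of notes without manual cleanup.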
Tactic 3: Compression Templates
Replace verbose content with compressed references.
Before (bloated system prompt section):
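David Flynn is a founder based in Austin, Texas. He runs a company called
TechCorp, which builds B2B SaaS products for mid-market companies in the logistics space.
He has been doing this for 8 years and previously worked at McKinsey. He likes direct
communication and dislikes fluff. He cares most about metrics and ROI. His team has 6 people...
[300 tokens]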
After (compressed):
Owner: David Flynn | Austin TX | TechCorp (B2B SaaS, logistics, mid-market)
Background: 8yr founder, ex-McKinsey | Team: 6
Style: Direct, metric-first, no fluff
[40 tokens — 87% reduction]
Tactic 4: Model Downgrade Opportunities
Most context-heavy sessions don't need the flagship model.
Downgrade decision tree:
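Does this task require multi-step reasoning?
├── No → Use Haiku (80-90% cost reduction)
└── Yes → Is this a novel problem?
    ├── No (familiar pattern) → Use Sonnet
    └── Yes (genuinely complex) → Use Opus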
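The decision tree reduces to two yes/no questions: tasks without multi-step reasoning go to Haiku, familiar multi-step patterns go to Sonnet, and novel problems go to Opus. As a rough sketch, with the function name `pick_model` and the lowercase tier labels as illustrative assumptions (they are not API model identifiers):

```python
def pick_model(needs_multistep_reasoning: bool, is_novel_problem: bool = False) -> str:
    """Route a task to the cheapest tier the decision tree allows."""
    if not needs_multistep_reasoning:
        return "haiku"
    # Multi-step work: only novel problems justify the flagship tier.
    return "opus" if is_novel_problem else "sonnet"
```

How you answer the two questions is still a judgment call; the point of writing the rule down is that routing stops being ad hoc.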
Model savings calculator:
| Switch | Token Cost Reduction | When Safe |
|---|---|---|
| Opus → Sonnet | 80% | Most writing, analysis, ops |
| Sonnet → Haiku | 75% | Simple reads, status checks, formatting |
| Opus → Haiku | 95% | Very simple tasks only |
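You can sanity-check the percentages against the approximate input rates listed in Phase 1. A quick sketch, assuming the same lowercase tier labels and input-token pricing only (output tokens are priced differently and are not covered here):

```python
# Approximate input rates in USD per 1M tokens, from the Phase 1 rate table.
RATES = {"opus": 15.00, "sonnet": 3.00, "haiku": 0.80}


def savings_pct(from_model: str, to_model: str) -> float:
    """Percent reduction in input-token cost when switching models."""
    return (1 - RATES[to_model] / RATES[from_model]) * 100


if __name__ == "__main__":
    for src, dst in [("opus", "sonnet"), ("sonnet", "haiku"), ("opus", "haiku")]:
        print(f"{src} -> {dst}: {savings_pct(src, dst):.0f}%")
```

With these rates the switches come out at roughly 80%, 73%, and 95%, so the table figures are within a couple of points of what the listed prices imply.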
Tactic 5: Context Window Management
Stop re-injecting the same content in long sessions.
CODEBLOCK10
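One way to avoid re-injecting unchanged content is to remember a hash of what each source looked like the last time it entered the session. This is only a sketch under assumptions: the class name `SessionContext` and the idea that your harness calls `inject` before adding anything to context are hypothetical, not a feature of any particular agent framework.

```python
import hashlib
from typing import Optional


class SessionContext:
    """Tracks what has already been injected so repeats can be skipped."""

    def __init__(self):
        self._seen = {}  # source name -> SHA-256 of the last injected content

    def inject(self, name: str, content: str) -> Optional[str]:
        """Return content if it is new or changed, otherwise None."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self._seen.get(name) == digest:
            return None  # unchanged since the last injection: skip it
        self._seen[name] = digest
        return content
```

A file that is re-read mid-session only costs tokens again if it actually changed.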
3-Week Cost Reduction Roadmap
Week 1: Cut & Quick Wins
Target: 30-40% cost reduction
CODEBLOCK11
Week 2: Optimize Structure
Target: Additional 20-30% reduction
CODEBLOCK12
Week 3: Lock In & Monitor
Target: Establish monitoring + reach 50%+ total reduction
CODEBLOCK13
Token Efficiency Scoring Rubric
After completing the 3-week roadmap, score your setup:
| Metric | 0 | 1 | 2 |
|---|---|---|---|
| Average session tokens | > 50K | 20-50K | < 20K |
| Skills lazy-loaded | None | Some | All |
| Memory tiered correctly | No | Partially | Yes |
| Model routing applied | No | Ad hoc | Systematic |
| Context reviewed quarterly | No | Annually | Quarterly |
Score 8-10: Token-efficient operator. You're in the top 5% of AI users by cost.
Score 5-7: Good progress. Keep tightening.
Score 0-4: High burn rate. Revisit Week 1 of the roadmap.
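The rubric is five metrics at 0-2 points each, so totals run from 0 to 10. A small sketch for tallying it, assuming you read the last four metrics off the table yourself and pass them in as points (the function names are illustrative, and exactly 20K or 50K average tokens is treated as the middle band):

```python
def tokens_points(avg_session_tokens: int) -> int:
    """Points for the 'Average session tokens' row."""
    if avg_session_tokens > 50_000:
        return 0
    if avg_session_tokens >= 20_000:
        return 1
    return 2


def rubric_total(avg_session_tokens: int, lazy_loading: int, memory_tiering: int,
                 model_routing: int, review_cadence: int) -> int:
    """Sum the rubric; the last four arguments are 0/1/2 points from the table."""
    points = [lazy_loading, memory_tiering, model_routing, review_cadence]
    if any(p not in (0, 1, 2) for p in points):
        raise ValueError("each metric must score 0, 1, or 2")
    return tokens_points(avg_session_tokens) + sum(points)


def band(total: int) -> str:
    """Translate a total into the three result bands."""
    if total >= 8:
        return "Token-efficient operator"
    if total >= 5:
        return "Good progress"
    return "High burn rate"
```

For instance, an 18K-token average with everything in place except ad hoc model routing gives `rubric_total(18_000, 2, 2, 1, 2)`, a 9, which falls in the top band.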
Quick Reference: The 10 Highest-ROI Cuts
If you do nothing else, do these 10 things:
1. Archive memory older than 30 days
2. Switch routine tasks from Opus/Sonnet to Haiku
3. Lazy-load skills instead of always-on
4. Compress system prompt (verbose → structured)
5. Stop re-reading files in the same session
6. Archive daily notes older than 14 days
7. Create FOCUS.md and limit startup context to it
8. Remove code files from context when not actively editing
9. Summarize MCP tool outputs instead of keeping raw results
10. Set model routing rules in AGENTS.md
Combined impact: 50-70% cost reduction for most users.
Example Session
User prompt:
"My Claude usage is $400/month and I don't know why. Help me cut it."
Agent response using this skill:
1. Runs Phase 1 Context Inventory (asks user to share what's in their setup)
2. Estimates tokens per item using the cheatsheet
3. Populates the Token Efficiency Matrix
4. Identifies top 3 CUT items (likely: bloated MEMORY.md, eager skill loading, Opus overuse)
5. Delivers Week 1 roadmap customized to their setup
6. Projects: "Based on this, you should reach $150-200/month in 3 weeks"
Bundle Note
This skill is part of the AI Setup & Productivity Pack ($79 bundle):
- MCP Server Setup Kit ($19)
- Agentic Loop Designer ($29)
- AI OS Blueprint ($39)
- Context Budget Optimizer ($19) — you are here
- Non-Technical Agent Quickstart ($9)
Save $36 with the full bundle. Built by @RemyClaw.
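Context Budget Optimizer
Framework: The Token Efficiency Matrix
Worth $200/hr consultant time. Yours for $19.
What This Skill Does
Audits your agent's token usage across every context layer, identifies where you're burning budget on bloat, and produces a 3-week cost reduction roadmap with concrete implementation steps.
Problem it solves: Power users hitting $200-500/month in AI costs often have 60-70% waste baked into their context. Most of it is invisible: stale files in system prompts, redundant skill loading, oversized memory files, wrong model choices. The Token Efficiency Matrix makes the waste visible and rankable.
The Token Efficiency Matrix
A 4-quadrant audit tool that scores every context element by cost (token weight) and ROI (value delivered per token). High cost + low ROI = cut first.
The Matrix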
                       High ROI
                          │
          KEEP            │          OPTIMIZE
       (high ROI,         │         (high ROI,
        low cost)         │          high cost)
                          │
Low cost ─────────────────┼───────────────── High cost
                          │
          AUDIT           │          CUT
       (low ROI,          │         (low ROI,
        low cost)         │          high cost)
                          │
                       Low ROI
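Action by quadrant:
- KEEP: Don't touch. It's working efficiently.
- OPTIMIZE: Compress or lazy-load. Value is there, just expensive.
- AUDIT: Review quarterly. Low cost so not urgent, but ROI should be questioned.
- CUT: Kill immediately. You're paying for nothing.
Phase 1: Context Inventory
Before scoring, map everything that's in your agent's context.
Context Layers to Audit
Layer A: System prompt / SOUL.md / identity files
Layer B: Active skills (loaded every session)
Layer C: Memory files (MEMORY.md, daily notes)
Layer D: Project files injected at startup
Layer E: Tool outputs / MCP responses in context
Layer F: Chat history (conversation turns kept in context)
Layer G: Code or data files read into context
Inventory Template
For each item in your context, fill this in: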
| Item | Layer | Est. Tokens | Sessions/Day | Daily Cost* | Value (1-5) |
|---|---|---|---|---|---|
| SOUL.md | A | | | | |
| MEMORY.md | C | | | | |
| [Skill 1].md | B | | | | |
| [Skill 2].md | B | | | | |
| Daily notes | C | | | | |
| [Project file] | D | | | | |
*Daily Cost = (Est. Tokens / 1M) × model_rate × sessions_per_day
Token estimation cheatsheet:
- 1 page of text ≈ 500-700 tokens
- 1 SKILL.md file ≈ 800-2,000 tokens
- 1 code file (100 lines) ≈ 1,200-1,800 tokens
- 1 MEMORY.md (well-maintained) ≈ 500-1,500 tokens
- 1 MEMORY.md (neglected/bloated) ≈ 3,000-8,000 tokens
Model rates (as of Q1 2026, approximate):
| Model | Input Cost per 1M tokens |
|---|---|
| Claude Haiku 3.5 | ~$0.80 |
| Claude Sonnet 4 | ~$3.00 |
| Claude Opus 4 | ~$15.00 |
| GPT-4o mini | ~$0.15 |
| GPT-4o | ~$2.50 |
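Phase 2: Scoring (Token Efficiency Matrix)
Score each context item:
Cost Score (1-5):
| Score | Token Range | Description |
|---|---|---|
| 1 | < 200 tokens | Tiny, negligible |
| 2 | 200-500 tokens | Light |
| 3 | 500-1,500 tokens | Medium |
| 4 | 1,500-4,000 tokens | Heavy |
| 5 | > 4,000 tokens | Very heavy |
ROI Score (1-5):
| Score | Description |
|---|---|
| 1 | Rarely used, generic, stale |
| 2 | Occasionally useful |
| 3 | Moderately useful most sessions |
| 4 | Consistently referenced, shapes output |
| 5 | Critical: session breaks without it |
Matrix placement:
- Cost 1-2, ROI 4-5 → KEEP
- Cost 4-5, ROI 4-5 → OPTIMIZE
- Cost 1-2, ROI 1-2 → AUDIT
- Cost 4-5, ROI 1-2 → CUT
- Cost 3, ROI 3 → AUDIT (marginal, evaluate quarterly)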
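Phase 3: Reduction Playbook
CUT (implement immediately)
Items to eliminate first:
□ Memory entries older than 90 days with no references
□ Skills loaded globally but used only occasionally
□ Duplicate information across multiple files
□ Verbose templates in the system prompt
□ Commented-out code in injected files
□ Debug logs included in context
□ Full file contents included where a summary would do
Cut target: 30-40% token reduction with zero quality loss.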
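OPTIMIZE (implement week 1-2)
Tactic 1: Lazy Loading
Instead of loading all skills at startup, load only when triggered.
Before (eager load):
System prompt includes all 10 skill files → 15,000 tokens per session
After (lazy load):
System prompt includes only the skill index → 500 tokens
Load individual skills on demand → 1,000 tokens when needed
Net: 13,500 to 14,500 fewer tokens per session (90-97% savings on skills), depending on whether a skill is loaded
Lazy load implementation:
# SKILL-INDEX.md (500 tokens instead of full skills)
Available skills (load when needed):
- mcp-server-setup-kit: MCP connection setup
- agentic-loop-designer: Build autonomous loops
- context-budget-optimizer: Token cost reduction
- [etc]
To use a skill: "Use the [skill-name] skill"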
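Tactic 2: Memory Tiering
Not all memory is equally important. Tier it.
Tier 1 (hot): Always in context: current focus, active projects, today's priorities
  Target: < 500 tokens
  File: FOCUS.md
Tier 2 (warm): Loaded on demand: historical decisions, completed projects
  Target: < 2,000 tokens
  File: MEMORY.md (summarized)
Tier 3 (cold): Never auto-loaded: old daily notes, archived projects
  Storage: flat files, searchable on demand
  File: memory/archive/
Memory tiering implementation:
1. Create FOCUS.md (Tier 1) with just this week's priorities
2. Archive daily notes older than 14 days to memory/archive/
3. Summarize MEMORY.md quarterly (remove resolved items)
4. Set the system prompt to inject only FOCUS.md plus the most recent 7 days of memory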
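Tactic 3: Compression Templates
Replace verbose content with compressed references.
Before (bloated system prompt section):
David Flynn is a founder based in Austin, Texas. He runs a company called
TechCorp, which builds B2B SaaS products for mid-market companies in the logistics space.
He has been doing this for 8 years and previously worked at McKinsey. He likes direct
communication and dislikes fluff. He cares most about metrics and ROI. His team has 6 people...
[300 tokens]
After (compressed):
Owner: David Flynn | Austin TX | TechCorp (B2B SaaS, logistics, mid-market)
Background: 8yr founder, ex-McKinsey | Team: 6
Style: Direct, metric-first, no fluff
[40 tokens, 87% reduction]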
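Tactic 4: Model Downgrade Opportunities
Most context-heavy sessions don't need the flagship model.
Downgrade decision tree:
Does this task require multi-step reasoning?
├── No → Use Haiku (80-90% cost reduction)
└── Yes → Is this a novel problem?
    ├── No (familiar pattern) → Use Sonnet
    └── Yes (genuinely complex) → Use Opus
Model savings calculator:
| Switch | Token Cost Reduction | When Safe |
|---|---|---|
| Opus → Sonnet | 80% | Most writing, analysis, ops |
| Sonnet → Haiku | 75% | Simple reads, status checks, formatting |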
| Op