TokenKiller (Universal Throttling)
Goal
Systematically reduce token consumption without noticeably lowering the success rate. Applies to agents with multiple capabilities (search/coding/debugging/testing/docs).
Task Complexity Assessment
Before setting budgets, assess task complexity:
| Complexity | Criteria | Tool Budget | Output Budget |
|---|---|---|---|
| Simple | Single-file modification, single-point localization, clear requirements | ≤3 calls | ≤50 lines |
| Medium | Spans 2-3 files, needs simple exploration, relatively clear requirements | ≤6 calls | ≤120 lines |
| Complex | Cross-module refactoring, multi-step debugging, unclear requirements | ≤10 calls | ≤200 lines |
Extension Mechanism (Soft Warning): When the budget is about to run out but the task is incomplete:
1. Output a warning: `[TokenKiller] Budget almost exhausted, current progress X/Y, remaining work: ...`
2. Continue execution, but switch to a more conservative strategy
3. The user can interrupt or request more detailed output at any time
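The budget table and the soft-warning rule above could be encoded roughly as follows. This is a minimal sketch: the names `Budget`, `BUDGETS`, and `soft_warning`, the 80% warning threshold, and the exact message wording are illustrative assumptions, since the text only says the warning fires when the budget is "about to run out".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Budget:
    tool_calls: int    # maximum tool calls for the task
    output_lines: int  # maximum lines of output for the task


# Values copied from the complexity table.
BUDGETS = {
    "simple": Budget(tool_calls=3, output_lines=50),
    "medium": Budget(tool_calls=6, output_lines=120),
    "complex": Budget(tool_calls=10, output_lines=200),
}


def soft_warning(used: int, limit: int, remaining_work: str,
                 threshold: float = 0.8) -> Optional[str]:
    """Return a warning once usage reaches `threshold` of the limit, else None.

    The 0.8 default is an assumption, not a value from the document.
    """
    if used < threshold * limit:
        return None
    return (f"[TokenKiller] Budget almost exhausted, "
            f"current progress {used}/{limit}, remaining work: {remaining_work}")
```

The warning is soft by design: the function only produces a message and never blocks execution, which matches the "continue, but more conservatively" rule.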
Default Working Mode (Balanced)
Global Hard Rules (Must Follow)
- Goal First, Evidence Later: State the goal in one sentence (L0) first, then decide if evidence is needed (L2/L3).
- Three-Question Limit: When clarification is needed, ask at most 3 questions at a time; otherwise proceed with "default assumptions" and mark replaceable points.
- Progressive Disclosure: By default, only fetch "minimum necessary information"; never dump large files/full logs directly into context.
- Diff-First: Prioritize outputting patches/changes/command and result summaries; avoid reposting entire files.
- Deduped References: Information already seen should only be briefly referenced, not pasted again.
Budget Gate (Budget + Gate)
At the start of each task, assess complexity and set the corresponding budget (see "Task Complexity Assessment" above), then apply the gates:
- Tool Call Budget: Set by complexity (Simple ≤3, Medium ≤6, Complex ≤10).
- Read Budget: Read a single file in full by default; for large files (>200 lines), read only the matching segments or read in sections.
- Output Budget: Set by complexity (Simple ≤50 lines, Medium ≤120 lines, Complex ≤200 lines).
If any gate is exceeded:
- First narrow scope (path/file/module) → Then switch search strategy → Finally expand reading and output.
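As a rough illustration of the gates and the escalation order above, the check could look like this. The function names (`exceeded_gates`, `next_action`, `read_mode`) and the string labels are hypothetical; only the numeric limits and the three-step order come from the text.

```python
from typing import List

# Escalation order taken from the rule above.
ESCALATION = ("narrow scope", "switch search strategy", "expand reading and output")

LARGE_FILE_LINES = 200  # files longer than this are read in segments


def exceeded_gates(tool_calls: int, output_lines: int,
                   max_tool_calls: int, max_output_lines: int) -> List[str]:
    """Return the names of the gates whose limits have been exceeded."""
    exceeded = []
    if tool_calls > max_tool_calls:
        exceeded.append("tool_calls")
    if output_lines > max_output_lines:
        exceeded.append("output_lines")
    return exceeded


def next_action(attempt: int) -> str:
    """Return the escalation step for a zero-based attempt, staying on the last step afterwards."""
    return ESCALATION[min(attempt, len(ESCALATION) - 1)]


def read_mode(total_lines: int) -> str:
    """Read small files in full; read large files by matching segment or section."""
    return "full" if total_lines <= LARGE_FILE_LINES else "sections"
```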
Token Consumption Self-Check
High-Consumption Behaviors (Avoid)
- Reading files longer than 500 lines in full
- Outputting complete file contents (should output diff)
- Repeatedly pasting the same code/log
- Listing entire directory trees
- Outputting lengthy explanatory text
Self-Check Timing
After every 3 tool calls, quickly self-check:
- Am I still within the L0-L2 levels?
- Is there duplicate information?
- Is output exceeding necessary length?
Information Layers (L0-L3)
- L0: One-sentence goal (required)
- L1: At most 3 hard constraints (required)
- L2: Evidence summary (file path + line number / key command output lines / key config items)
- L3: Full long content (only pull in specific scenarios, see below "L3 Pull Scenarios")
Default output and context stay at L0-L2.
L3 Pull Scenarios (Explicit)
Only pull L3 (full content) in these scenarios:
1. Code Modification: When exact indentation/format matching is needed, read the target function's complete code
2. Config Debugging: When config items are interdependent and the complete config block is needed
3. Error Analysis: When the error message is incomplete and the complete stack trace or context is needed
4. User Explicit Request: The user asks to see the full content
Decision Flow:
L2 Evidence → Attempt to proceed → Fail → Determine if L3 is needed → Pull minimum necessary range
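The decision flow above can be sketched as a small function. The enum, the scenario labels, and the choice to let an explicit user request bypass the "attempt at L2 first" step are assumptions made for illustration; the text lists the scenarios but does not prescribe an implementation.

```python
from enum import IntEnum
from typing import Optional


class Layer(IntEnum):
    L0 = 0  # one-sentence goal
    L1 = 1  # at most 3 hard constraints
    L2 = 2  # evidence summary
    L3 = 3  # full long content


# Hypothetical labels for the four L3 pull scenarios.
L3_SCENARIOS = {"code_modification", "config_debugging", "error_analysis", "user_request"}


def layer_to_pull(l2_attempt_failed: bool, scenario: Optional[str]) -> Layer:
    """Stay at L2 unless an L2 attempt failed in a listed scenario, or the user asked for full content."""
    if scenario == "user_request":
        return Layer.L3
    if l2_attempt_failed and scenario in L3_SCENARIOS:
        return Layer.L3
    return Layer.L2
```

Even when the result is `Layer.L3`, the flow still calls for pulling only the minimum necessary range, for example a single function rather than the whole file.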
Multi-Skill Collaboration
When this Skill is activated alongside other Skills:
Priority Rules
- Functional Skills First: Specific rules of functional skills like pdf, xlsx take precedence
- TokenKiller as Constraint Layer: During other skill execution, continuously apply budget and layer rules
- User Priority on Conflict: User's explicitly requested output format/content takes precedence over throttling rules
Collaboration Mode
[User Request] → [Functional Skill Processing] → [TokenKiller Constrains Output]
Workflow (General)
1) Task Entry (Any Domain)
1. Produce L0 + L1 (infer them quickly if the user didn't provide them)
2. Choose a strategy (search/direct modification/verify first)
3. Execute the minimal action
4. Verify immediately (cheapest verification first)
5. Summarize: only the key conclusion + 1 next step
2) Search/Exploration (Priority Domain)
Priority:
1. Filename/Path (Glob)
2. Exact String (Grep)
3. Semantic Search (SemanticSearch)
4. Read File (Read, by sections/line ranges)
Rules:
- Only read near hit points (±20 lines) or the paragraphs related to the target function/component
- Don't read through entire repository without localization
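One way to apply the ±20-line rule is to turn search hits into merged line ranges before reading. The helper below is a sketch; `read_windows` and its parameters are hypothetical names, and line numbers are assumed to be 1-indexed and inclusive.

```python
from typing import List, Tuple


def read_windows(hit_lines: List[int], total_lines: int,
                 radius: int = 20) -> List[Tuple[int, int]]:
    """Return merged, 1-indexed inclusive ranges covering each hit ± radius."""
    windows: List[Tuple[int, int]] = []
    for hit in sorted(set(hit_lines)):
        start = max(1, hit - radius)
        end = min(total_lines, hit + radius)
        if windows and start <= windows[-1][1] + 1:
            # Overlapping or adjacent to the previous window: extend it.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows
```

Merging matters because nearby hits would otherwise cause the same lines to be read twice, which is exactly the duplication the rules try to avoid.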
3) Coding/Refactoring
Rules:
- Minimal change surface first: if 1 file can be changed, don't change 5
- Avoid "rewrite everything"; prioritize reusing existing structure
- After modification, immediately run cheapest verification (tsc/build/lint)
- Only show key diffs (at most 1-3 code references)
4) Debugging/Troubleshooting
Rules:
- First list the 3 highest-probability hypotheses (sorted by information gain)
- Verify only 1 hypothesis at a time, collecting only the necessary evidence
- From logs, keep only: error line, stack top, related config, reproduction command (summarize the rest)
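The log rule above could be approximated with a small filter. This is only a sketch: the regular expressions (matching `error`/`exception`/`traceback` and JavaScript- or Python-style stack frames) and the default of 3 frames are assumptions, and real logs will need patterns tuned to the toolchain in use.

```python
import re
from typing import List

# Assumed patterns; adjust for the actual log format.
ERROR_RE = re.compile(r"error|exception|traceback", re.IGNORECASE)
FRAME_RE = re.compile(r'^\s+(at |File ")')


def trim_log(log: str, max_frames: int = 3) -> List[str]:
    """Keep error lines and the first `max_frames` stack frames; note how many lines were dropped."""
    kept: List[str] = []
    frames = 0
    dropped = 0
    for line in log.splitlines():
        if ERROR_RE.search(line):
            kept.append(line)
        elif FRAME_RE.match(line) and frames < max_frames:
            kept.append(line)
            frames += 1
        else:
            dropped += 1
    if dropped:
        kept.append(f"... ({dropped} lines omitted)")
    return kept
```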
5) Testing/Verification
Priority (from cheap to expensive):
1. lint / typecheck
2. build
3. unit test
4. e2e / browser automation
When a check fails, append only what changed ("diff information"); don't repost the full output.
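The cheap-to-expensive ordering and the "only append what changed" rule can be sketched as below. Checks are modeled as plain callables so the example stays self-contained; in practice each one would wrap a real command, and the names `run_cheapest_first` and `new_lines` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]


def run_cheapest_first(checks: List[Check]) -> Optional[str]:
    """Run checks in the given (cheapest-first) order; return the first failing name, or None."""
    for name, check in checks:
        if not check():
            return name  # stop early: more expensive checks are skipped
    return None


def new_lines(previous: str, current: str) -> List[str]:
    """Return lines of `current` that did not appear in `previous`, preserving order."""
    seen = set(previous.splitlines())
    return [line for line in current.splitlines() if line not in seen]
```

Stopping at the first failure means an e2e run is never paid for while lint or build is still red.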
6) Docs/Summary
Rules:
- Default to "short summary + next steps"
- Don't restate user's original words; use structured point references
- When docs are needed, use progressive disclosure: outline/points first, then expand details
Output Template (Default)
Use the following structure, unless user explicitly requests other format:
- Conclusion: One sentence
- Evidence: 2-5 items (path/line number/key command output)
- Changes/Actions: What was done (at most 5 items)
- Next Step: 1 item (most valuable next step)
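A renderer for this template might enforce the item limits directly. The sketch below is illustrative: `render_summary` is a hypothetical helper, and raising `ValueError` on a limit violation is one possible policy, not something the template itself specifies.

```python
from typing import List


def render_summary(conclusion: str, evidence: List[str],
                   actions: List[str], next_step: str) -> str:
    """Render the default output template, enforcing the 2-5 evidence and ≤5 action limits."""
    if not 2 <= len(evidence) <= 5:
        raise ValueError("evidence must contain 2-5 items")
    if len(actions) > 5:
        raise ValueError("at most 5 changes/actions are allowed")
    lines = [f"- Conclusion: {conclusion}", "- Evidence:"]
    lines += [f"  - {item}" for item in evidence]
    lines.append("- Changes/Actions:")
    lines += [f"  - {item}" for item in actions]
    lines.append(f"- Next Step: {next_step}")
    return "\n".join(lines)
```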
Trigger Words (Recommended Auto-Enable)
Force enable this Skill when user mentions any of the following keywords/scenarios:
- "waste token / save token / cost / context too long / log too long / repo too large / multi-step / agent"
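Trigger detection could be as simple as a case-insensitive substring match over the keywords above. This is a naive sketch with hypothetical names; plain substring matching will also fire on unrelated words that contain a keyword (for example "cost" inside "costume"), so a real implementation would likely need word-boundary or intent checks.

```python
# Keywords copied from the trigger list above.
TRIGGERS = (
    "waste token", "save token", "cost", "context too long",
    "log too long", "repo too large", "multi-step", "agent",
)


def should_enable(message: str) -> bool:
    """Return True if the message mentions any trigger keyword (case-insensitive)."""
    text = message.lower()
    return any(trigger in text for trigger in TRIGGERS)
```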