
proxy-token-optimizer

Author: admin | Source: ClawHub
Version: V 1.0.1
Security check: passed
Downloads: 179
Favorites: 1


# Proxy Token Optimizer

Reduces LLM API costs for the openclaw-manager multi-tenant proxy platform through four strategies:

1. **Model-tier routing** — Route prompts to the cheapest capable model
2. **Heartbeat optimization** — Cheapest model + longer intervals for heartbeat calls
3. **Context lazy loading** — Load only the context files each prompt actually needs
4. **Platform usage analytics** — Real data from PostgreSQL, not estimates

## Why these strategies matter

The openclaw-manager platform proxies LLM requests for multiple OpenClaw instances through providers like `zai-proxy`, `zai-coding-proxy`, and `kimi-coding-proxy`. Each provider offers models at different price points (e.g., `glm-4.7` vs `glm-4.7-flashx`). Without optimization, every request — including simple greetings and heartbeat pings — uses the default (expensive) model, and every session loads the full context regardless of need. These four strategies target the highest-impact cost drivers.

## Quick start

All instance-side scripts run locally with no dependencies. Platform-side scripts need DB access.

```bash
# Model routing — which model should handle this prompt?
python3 scripts/model_router.py "thanks!"
# → {"tier": "cheap", "recommended_model": "zai-proxy/glm-4.7-flashx"}

# Context optimization — which files does this prompt need?
python3 scripts/context_optimizer.py recommend "hi"
# → {"context_level": "minimal", "recommended_files": ["SOUL.md", "IDENTITY.md"]}

# Heartbeat config — generate openclaw.json patch
python3 scripts/heartbeat_config.py patch
# → {"agents": {"defaults": {"heartbeat": {"every": "55m", "model": "zai-proxy/glm-4.7-flashx"}}}}

# Unified CLI (all commands in one place)
python3 scripts/cli.py --help
```

## Scripts reference

### Instance-side (pure local, no network, no DB)

#### `scripts/model_router.py`

Routes prompts to the right model tier based on complexity analysis.
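As a rough illustration of the tier logic detailed below, a keyword-based router might look like the following sketch. The patterns, the length threshold, and the provider table here are assumptions made for the example, not the script's actual rules; only the tier names and model names come from this README:

```python
# Hypothetical keyword-based tier routing; the real model_router.py may use
# different patterns and scoring. Model names mirror this README.
CHEAP_WORDS = {"hi", "hello", "thanks", "thx", "ok", "ping"}
PREMIUM_PATTERNS = ("architecture", "design", "strategy", "deep analysis")

# Assumed provider -> tier -> model table, following the README's tier logic.
PROVIDER_MODELS = {
    "zai-proxy": {"cheap": "glm-4.7-flashx", "standard": "glm-4.7",
                  "premium": "glm-4.7"},
    "kimi-coding-proxy": {"cheap": "glm-4.7-flashx", "standard": "glm-4.7",
                          "premium": "k2p5"},
}

def route(prompt: str, provider: str = "zai-proxy") -> dict:
    """Classify a prompt into a cost tier and pick the provider's model."""
    text = prompt.lower()
    words = set(text.replace("!", " ").replace("?", " ").replace(".", " ").split())
    if any(p in text for p in PREMIUM_PATTERNS):
        tier = "premium"                 # deep/strategic work
    elif len(text) < 40 and words & CHEAP_WORDS:
        tier = "cheap"                   # greetings, acks, pings
    else:
        tier = "standard"                # default for unclear prompts
    return {"tier": tier,
            "recommended_model": f"{provider}/{PROVIDER_MODELS[provider][tier]}"}

print(route("thanks!"))
# → {'tier': 'cheap', 'recommended_model': 'zai-proxy/glm-4.7-flashx'}
```

Defaulting unclear prompts to `standard` (rather than `cheap`) is the safer failure mode: a misrouted complex prompt on the cheap model costs quality, while a misrouted greeting on the standard model costs only a few tokens.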
**Tier logic:**

- **cheap** → `glm-4.7-flashx`: Greetings, acknowledgments, heartbeats, cron jobs, log parsing. Cost savings: 5-10x vs standard.
- **standard** → `glm-4.7`: Code writing, debugging, explanations. Default for unclear prompts.
- **premium** → `glm-4.7` (or `k2p5` for kimi): Architecture design, deep analysis, strategy planning.

Supports Chinese and English patterns. Provider-aware — works with `zai-proxy`, `zai-coding-proxy`, and `kimi-coding-proxy`.

```bash
python3 scripts/model_router.py "<prompt>" [provider]
python3 scripts/model_router.py compare  # show all provider models
```

#### `scripts/context_optimizer.py`

Analyzes prompt complexity to recommend which context files to load, reducing unnecessary token consumption.

**Context levels:**

| Level | When | Files loaded | Token savings |
|-------|------|--------------|---------------|
| minimal | "hi", "thanks", short msgs | SOUL.md + IDENTITY.md (2) | ~80% |
| standard | "write a function", normal work | + memory/TODAY.md + conditional | ~50% |
| full | "design architecture", complex tasks | + MEMORY.md + all conditional | ~30% |

Also generates an optimized `AGENTS.md` template with lazy-loading rules baked in:

```bash
python3 scripts/context_optimizer.py recommend "<prompt>"
python3 scripts/context_optimizer.py generate-agents  # creates AGENTS.md.optimized
```

#### `scripts/heartbeat_config.py`

Generates `openclaw.json` configuration patches for heartbeat optimization:

- Forces the heartbeat model to `glm-4.7-flashx` (cheapest available)
- Sets the interval to 55 minutes (keeps the prompt cache warm within the 1-hour TTL, avoiding cache-rebuild cost)

```bash
python3 scripts/heartbeat_config.py recommend [cache_ttl_minutes]
python3 scripts/heartbeat_config.py patch  # output JSON patch for openclaw.json
```

### Platform-side (requires DB connection)

These scripts query the `usage_records` PostgreSQL table for real data. Run from the openclaw-manager project root with the virtualenv activated.
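Before the individual report scripts, here is a sketch of the kind of rollup they run over `usage_records`. Only the table name appears in this README; the column names and sample rows below are invented for illustration, and SQLite stands in for PostgreSQL so the sketch runs anywhere:

```python
# Hypothetical rollup over usage_records: only the table name comes from this
# README; columns (instance, provider, model, total_tokens, created_at) are
# assumed, and sqlite3 stands in for the real PostgreSQL connection.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE usage_records (
    instance TEXT, provider TEXT, model TEXT,
    total_tokens INTEGER, created_at TEXT)""")
conn.executemany(
    "INSERT INTO usage_records VALUES (?, ?, ?, ?, ?)",
    [("alpha", "zai-proxy", "glm-4.7", 1200, "2026-04-16"),
     ("alpha", "zai-proxy", "glm-4.7-flashx", 300, "2026-04-16"),
     ("beta", "kimi-coding-proxy", "k2p5", 900, "2026-04-17")])

# Per-provider breakdown over a date window, as in `overview [days]`.
rows = conn.execute("""
    SELECT provider, SUM(total_tokens)
    FROM usage_records
    WHERE created_at >= ?
    GROUP BY provider
    ORDER BY 2 DESC""", ("2026-04-10",)).fetchall()
print(rows)  # → [('zai-proxy', 1500), ('kimi-coding-proxy', 900)]
```

The per-model and per-instance breakdowns described below are the same aggregation with a different `GROUP BY` column.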
#### `scripts/usage_report.py`

Generates usage reports from actual database records — not estimates.

```bash
python3 scripts/usage_report.py overview [days]        # platform-wide summary
python3 scripts/usage_report.py instance <name> [days] # single instance detail
```

**Overview includes:** total calls/tokens, per-provider breakdown, per-model breakdown, top 10 instances by consumption, 7-day daily trend.

**Instance report includes:** per-model distribution, daily trend, lifetime totals.

#### `scripts/quota_advisor.py`

Compares actual 24-hour usage against quota plan limits to find mismatches:

- **Wasteful:** Usage below 20% of the plan limit → suggest downgrade
- **Throttled:** Usage above 80% of the plan limit → suggest upgrade

```bash
python3 scripts/quota_advisor.py analyze  # check all instances
python3 scripts/quota_advisor.py plans    # show available quota plans
```

### Unified CLI

`scripts/cli.py` wraps all of the above into a single entry point:

```bash
python3 scripts/cli.py route "<prompt>"     # model routing
python3 scripts/cli.py context "<prompt>"   # context recommendation
python3 scripts/cli.py generate-agents      # generate AGENTS.md
python3 scripts/cli.py heartbeat            # heartbeat config
python3 scripts/cli.py overview [days]      # platform usage (needs DB)
python3 scripts/cli.py report <name> [days] # instance report (needs DB)
python3 scripts/cli.py advisor              # quota advice (needs DB)
```

## Project integration points

This skill works with existing openclaw-manager infrastructure:

| Component | File | How this skill uses it |
|-----------|------|------------------------|
| Provider config | `config/model.yaml` | Model names/endpoints for routing |
| Proxy routing | `config_service.py` | Where `_inject_proxy_providers()` registers models |
| Usage recording | `proxy_common/usage_recorder.py` | Source of real usage data |
| Quota plans | `config/llm_proxy.yaml` | Plan definitions for quota advisor |
| Instance model | `app/models.py` | Instance metadata for reports |

## Expected savings

| Strategy | Mechanism | Impact |
|----------|-----------|--------|
| Context lazy loading | Fewer tokens per request | 50-80% context reduction |
| Model routing (flashx) | Lower per-token price | 5-10x on simple tasks |
| Heartbeat → flashx | Lower heartbeat cost | Significant per-instance savings |
| Heartbeat interval 55 min | Fewer API calls | ~45% fewer heartbeat calls |
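The quota-advisor thresholds described earlier reduce to a simple ratio check. A sketch, where the function name and sample numbers are illustrative rather than the script's actual API:

```python
# Quota classification per the thresholds above: usage below 20% of the plan
# limit is "wasteful" (downgrade), above 80% is "throttled" (upgrade).
# Sample token counts are invented, not from the real usage_records table.
def advise(used_tokens: int, plan_limit: int) -> str:
    ratio = used_tokens / plan_limit
    if ratio < 0.20:
        return "wasteful: suggest downgrade"
    if ratio > 0.80:
        return "throttled: suggest upgrade"
    return "ok"

print(advise(1_000, 100_000))   # 1% of limit → wasteful: suggest downgrade
print(advise(90_000, 100_000))  # 90% of limit → throttled: suggest upgrade
```

On the heartbeat row of the savings table: assuming a 30-minute default interval, 24 × 60 / 30 = 48 heartbeat calls per day drop to 24 × 60 / 55 ≈ 26 at the 55-minute setting, which is roughly 45% fewer and consistent with the ~45% figure in the table.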

Tags

skill ai

Install via conversation

This skill can be installed via conversation on the following platforms:

OpenClaw WorkBuddy QClaw Kimi Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the proxy-token-optimizer-1776188883 skill

Option 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the proxy-token-optimizer-1776188883 skill

Install via command line

skillhub install proxy-token-optimizer-1776188883

Download the Zip package

⬇ Download proxy-token-optimizer v1.0.1

File size: 15.54 KB | Published: 2026-4-17 15:50

v1.0.1 (latest), 2026-4-17 15:50
- Added SKILL metadata: now always active for openclaw-manager via `"openclaw": {"always": true}`.
- Included new project files: CHANGELOG.md, README.md, SECURITY.md, .clawhubsafe, and others.
- No functional changes to optimization logic or documentation content.
