Swarm — Cut Your LLM Costs by 200x
Turn your expensive model into an affordable daily driver. Offload the boring stuff to Gemini Flash workers — parallel, batch, research — at a fraction of the cost.
At a Glance
| 30 tasks via | Time | Cost |
|---|---|---|
| Opus (sequential) | ~30s | ~$0.50 |
| Swarm (parallel) | ~1s | ~$0.003 |
When to Use
Swarm is ideal for:
- 3+ independent tasks (research, summaries, comparisons)
- Comparing or researching multiple subjects
- Multiple URLs to fetch/analyze
- Batch processing (documents, entities, facts)
- Complex analysis needing multiple perspectives → use chain
Quick Reference
CODEBLOCK0
Execution Modes
Parallel (v1.0)
N prompts → N workers simultaneously. Best for independent tasks.
CODEBLOCK1
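To make the fan-out idea concrete, here is a minimal sketch of running N prompts with a cap on in-flight calls. It is not Swarm's implementation: `callWorker`, `runParallel`, and the `maxConcurrent` argument are hypothetical names, and the worker is a stub rather than a real Gemini Flash request.

```javascript
// Hypothetical stand-in for one worker call; a real worker would call an LLM API.
async function callWorker(prompt) {
  return `answer to: ${prompt}`;
}

// Run every prompt through `worker`, with at most `maxConcurrent` calls in flight.
// Results come back in the same order as the prompts.
async function runParallel(prompts, worker = callWorker, maxConcurrent = 16) {
  const results = new Array(prompts.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed prompt until none remain.
  async function lane() {
    while (next < prompts.length) {
      const index = next++;
      results[index] = await worker(prompts[index]);
    }
  }

  const laneCount = Math.min(maxConcurrent, prompts.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Because the prompts are independent, total latency is close to that of the slowest single call rather than the sum of all calls, as long as the prompt count stays within the concurrency cap.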
Research (v1.1)
Multi-phase: search → fetch → analyze. Uses Google Search grounding.
CODEBLOCK2
Chain (v1.3) — Refinement Pipelines
Data flows through multiple stages, each with a different perspective/filter. Stages run in sequence; tasks within a stage run in parallel.
Stage modes:
- `parallel`: N inputs → N workers (same perspective)
- `single`: merged input → 1 worker
- `fan-out`: 1 input → N workers with DIFFERENT perspectives
- `reduce`: N inputs → 1 synthesized output
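The four stage modes can be sketched as plain functions over an array of inputs. This is an illustration only: the `worker` stub, the `mode` and `perspectives` field names, and `runChain` are assumptions, not Swarm's actual pipeline format.

```javascript
// Hypothetical synchronous stand-in for an LLM worker call.
const worker = (perspective, input) => `[${perspective}] ${input}`;

const stageModes = {
  // N inputs -> N workers, all using the same perspective
  parallel: (inputs, perspectives) => inputs.map((x) => worker(perspectives[0], x)),
  // all inputs merged into one -> 1 worker
  single: (inputs, perspectives) => [worker(perspectives[0], inputs.join("\n"))],
  // 1 input -> N workers, one per perspective
  "fan-out": (inputs, perspectives) => perspectives.map((p) => worker(p, inputs[0])),
  // N inputs -> 1 synthesized output
  reduce: (inputs, perspectives) => [worker(perspectives[0], inputs.join("\n"))],
};

// Stages run in sequence; each stage's outputs become the next stage's inputs.
function runChain(stages, initialInputs) {
  return stages.reduce(
    (inputs, stage) => stageModes[stage.mode](inputs, stage.perspectives),
    initialInputs
  );
}
```

In this sketch `single` and `reduce` share a body; the difference is one of intent (pass merged data through versus synthesize a conclusion), which in practice would show up in the prompt given to the worker.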
Auto-chain — describe what you want, get an optimal pipeline:
CODEBLOCK3
Manual chain:
CODEBLOCK4
Depth presets: quick (2 stages), standard (4), deep (6), exhaustive (8)
Built-in perspectives: extractor, filter, enricher, analyst, synthesizer, challenger, optimizer, strategist, researcher, critic
Preview without executing:
CODEBLOCK5
Benchmark (v1.3)
Compare single vs parallel vs chain on the same task with LLM-as-judge scoring.
CODEBLOCK6
Scores on 6 FLASK dimensions: accuracy (2x weight), depth (1.5x), completeness, coherence, actionability (1.5x), nuance.
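The text gives the weights but not how they are combined. One plausible reading is a normalized weighted mean, sketched below; the `WEIGHTS` table uses the stated weights (unweighted dimensions taken as 1x), while `weightedScore` itself is a hypothetical helper.

```javascript
// Weights as listed above; dimensions without a stated weight are assumed to be 1x.
const WEIGHTS = {
  accuracy: 2,
  depth: 1.5,
  completeness: 1,
  coherence: 1,
  actionability: 1.5,
  nuance: 1,
};

// Weighted mean of per-dimension scores (assumption: the judge returns one
// number per dimension, all on the same scale).
function weightedScore(scores) {
  let total = 0;
  let weightSum = 0;
  for (const [dimension, weight] of Object.entries(WEIGHTS)) {
    total += weight * scores[dimension];
    weightSum += weight;
  }
  return total / weightSum;
}
```

Under this reading, a one-point gain in accuracy moves the final score twice as much as a one-point gain in coherence.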
Capabilities Discovery (v1.3)
Lets the orchestrator discover what execution modes are available:
CODEBLOCK7
Prompt Cache (v1.3.2)
LRU cache for LLM responses. 212x speedup on cache hits (parallel), 514x on chains.
- Keyed by hash of instruction + input + perspective
- 500 entries max, 1 hour TTL
- Skips web search tasks (need fresh data)
- Persists to disk across daemon restarts
- Per-task bypass: set `task.cache = false`
CODEBLOCK8
Cache stats show in swarm status.
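The cache behaviour described above (hash key, entry cap, TTL, least-recently-used eviction) can be sketched in a few lines. This is an assumption-laden illustration, not the daemon's code: the `PromptCache` class, the choice of SHA-256, and the injectable `now` clock are all hypothetical, and disk persistence is omitted.

```javascript
const { createHash } = require("node:crypto");

class PromptCache {
  constructor({ maxEntries = 500, ttlMs = 60 * 60 * 1000, now = Date.now } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // Map iterates in insertion order, oldest first
  }

  // Key is a hash of instruction + input + perspective.
  static key(instruction, input, perspective = "") {
    return createHash("sha256")
      .update(JSON.stringify([instruction, input, perspective]))
      .digest("hex");
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    this.entries.delete(key);
    if (this.now() - entry.storedAt > this.ttlMs) return undefined; // expired
    this.entries.set(key, entry); // re-insert to mark as most recently used
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, { value, storedAt: this.now() });
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (first key in the Map).
      this.entries.delete(this.entries.keys().next().value);
    }
  }
}
```

A caller would check `get(PromptCache.key(...))` before dispatching a task and call `set` after a successful response, skipping both steps for web search tasks.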
Stage Retry (v1.3.2)
If tasks fail within a chain stage, only the failed tasks get retried (not the whole stage). Default: 1 retry. Configurable per-phase via phase.retries or globally via options.stageRetries.
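A minimal sketch of "retry only what failed", assuming an async `worker` callback that rejects on failure. The function name `runStageWithRetry` and the result shape are hypothetical; only the default of one retry comes from the text above.

```javascript
// Run all tasks in a stage; afterwards, re-run only the tasks that failed,
// up to `retries` additional attempts.
async function runStageWithRetry(tasks, worker, retries = 1) {
  const results = new Array(tasks.length);
  let pending = tasks.map((_, i) => i); // indexes that still need a result

  for (let attempt = 0; attempt <= retries && pending.length > 0; attempt++) {
    const settled = await Promise.allSettled(pending.map((i) => worker(tasks[i])));
    const stillFailing = [];
    settled.forEach((outcome, n) => {
      const i = pending[n];
      if (outcome.status === "fulfilled") {
        results[i] = { ok: true, value: outcome.value };
      } else {
        results[i] = { ok: false, error: outcome.reason };
        stillFailing.push(i);
      }
    });
    pending = stillFailing; // successful tasks are never re-run
  }
  return results;
}
```

Using `Promise.allSettled` rather than `Promise.all` is what keeps one failing task from discarding the results of its siblings.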
Cost Tracking (v1.3.1)
All endpoints return cost data in their complete event:
- `session`: current daemon session totals
- `daily`: persisted across restarts, accumulates all day
CODEBLOCK9
Web Search (v1.1)
Workers search the live web via Google Search grounding (Gemini only, no extra cost).
CODEBLOCK10
JavaScript API
CODEBLOCK11
Daemon Management
CODEBLOCK12
Performance (v1.3.2)
| Mode | Tasks | Time | Notes |
|---|---|---|---|
| Parallel (simple) | 5 | ~700ms | 142ms/task effective |
| Parallel (stress) | 10 | ~1.2s | 123ms/task effective |
| Chain (standard) | 5 | ~14s | 3-stage multi-perspective |
| Chain (quick) | 2 | ~3s | 2-stage extract+synthesize |
| Cache hit | any | ~3-5ms | 200-500x speedup |
| Research (web) | 2 | ~15s | Google grounding latency |
Config
Location: `~/.config/clawdbot/node-scaling.yaml`
CODEBLOCK13
Troubleshooting
| Issue | Fix |
|---|---|
| Daemon not running | `swarm start` |
| No API key | Set `GEMINI_API_KEY` or run `npm run setup` |
| Rate limited | Lower `max_concurrent_api` in config |
| Web search not working | Ensure provider is gemini + web_search.enabled |
| Cache stale results | `curl -X DELETE http://localhost:9999/cache` |
| Chain too slow | Use `depth: "quick"` or check context size |
Structured Output (v1.3.7)
Force JSON output with schema validation — zero parse failures on structured tasks.
CODEBLOCK14
Built-in schemas: entities, summary, comparison, actions, classification, INLINECODE27
Uses Gemini's native response_mime_type: application/json + responseSchema for guaranteed JSON output. Includes schema validation on the response.
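The validation step can be pictured as a recursive check of the parsed response against the schema. The sketch below covers only a small subset of JSON Schema (`type`, `properties`, `required`, `items`) and does not handle the `integer` type; `validate` is a hypothetical helper, not the validator Swarm ships.

```javascript
// Return a list of human-readable errors; an empty list means the value conforms.
function validate(value, schema, path = "$") {
  const errors = [];
  const actual = Array.isArray(value) ? "array" : value === null ? "null" : typeof value;

  if (schema.type && schema.type !== actual) {
    errors.push(`${path}: expected ${schema.type}, got ${actual}`);
    return errors; // no point descending into a value of the wrong type
  }
  if (actual === "object") {
    for (const key of schema.required ?? []) {
      if (!(key in value)) errors.push(`${path}.${key}: missing required property`);
    }
    for (const [key, sub] of Object.entries(schema.properties ?? {})) {
      if (key in value) errors.push(...validate(value[key], sub, `${path}.${key}`));
    }
  }
  if (actual === "array" && schema.items) {
    value.forEach((item, i) => errors.push(...validate(item, schema.items, `${path}[${i}]`)));
  }
  return errors;
}
```

For example, checking `{ category: 3 }` against a schema that declares `category` as a string yields a single error naming `$.category`.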
Majority Voting (v1.3.7)
Same prompt → N parallel executions → pick the best answer. Higher accuracy on factual/analytical tasks.
CODEBLOCK15
Strategies:
- `judge`: LLM scores all candidates on accuracy/completeness/clarity/actionability, picks winner (N+1 calls)
- `similarity`: Jaccard word-set similarity, picks consensus answer (N calls, zero extra cost)
- `longest`: Picks longest response as heuristic for thoroughness (N calls, zero extra cost)
When to use: Factual questions, critical decisions, or any task where accuracy > speed.
| Strategy | Calls | Extra Cost | Quality |
|---|---|---|---|
| similarity | N | $0 | Good (consensus) |
| longest | N | $0 | Decent (heuristic) |
| judge | N+1 | ~$0.0001 | Best (LLM-scored) |
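The two zero-cost strategies need no extra model call, so they are easy to sketch. The version below assumes "consensus" means the candidate with the highest average Jaccard similarity to the others; the tokenization and the function names are hypothetical, and ties go to the earliest candidate.

```javascript
// Lower-cased set of word tokens (letters and digits) in a response.
function wordSet(text) {
  return new Set(text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? []);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  for (const word of a) if (b.has(word)) shared++;
  return shared / (a.size + b.size - shared);
}

// "similarity" strategy: pick the candidate most similar, on average, to the rest.
function pickConsensus(candidates) {
  const sets = candidates.map(wordSet);
  let best = 0;
  let bestScore = -1;
  sets.forEach((s, i) => {
    let total = 0;
    sets.forEach((t, j) => {
      if (i !== j) total += jaccard(s, t);
    });
    const score = sets.length > 1 ? total / (sets.length - 1) : 1;
    if (score > bestScore) {
      bestScore = score;
      best = i;
    }
  });
  return candidates[best];
}

// "longest" strategy: pick the longest response.
function pickLongest(candidates) {
  return candidates.reduce((a, b) => (b.length > a.length ? b : a));
}
```

Word-set similarity ignores word order, so two answers that state the same fact in different phrasing still count as agreeing.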
Self-Reflection (v1.3.5)
Optional critic pass after chain/skeleton output. Scores 5 dimensions, auto-refines if below threshold.
CODEBLOCK16
Proven: improved weak output from 5.0 → 7.6 avg score. Skeleton + reflect scored 9.4/10.
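The critic pass amounts to a score-then-refine loop. The sketch below assumes the critic returns one numeric score per dimension and that the average is compared with the threshold; `critic`, `refine`, the default `threshold` of 7, and `maxRounds` are all hypothetical stand-ins rather than Swarm's actual parameters.

```javascript
// Mean of all dimension scores in an object such as { clarity: 6, depth: 8 }.
function averageScore(scores) {
  const values = Object.values(scores);
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// `critic` and `refine` are hypothetical async callbacks standing in for LLM calls.
async function reflect(draft, { critic, refine, threshold = 7, maxRounds = 2 }) {
  let output = draft;
  let scores = await critic(output);
  let rounds = 0;
  while (averageScore(scores) < threshold && rounds < maxRounds) {
    output = await refine(output, scores); // revise using the critic's scores
    scores = await critic(output); // re-score the revision
    rounds++;
  }
  return { output, score: averageScore(scores), rounds };
}
```

Capping the number of rounds bounds the extra cost: an output that never clears the threshold costs at most `2 * maxRounds + 1` additional calls.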
Skeleton-of-Thought (v1.3.6)
Generate outline → expand each section in parallel → merge into coherent document. Best for long-form content.
CODEBLOCK17
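The outline → expand → merge flow can be sketched with three injected callbacks. This is a structural illustration under stated assumptions: `outline`, `expand`, and `merge` are hypothetical async functions standing in for LLM calls, and `skeletonOfThought` is not the name of any Swarm export.

```javascript
// 1 outline call, then N section expansions in parallel, then 1 merge call.
async function skeletonOfThought(topic, { outline, expand, merge }) {
  const titles = await outline(topic); // e.g. ["Intro", "Setup", ...]
  const bodies = await Promise.all(titles.map((title) => expand(topic, title)));
  const sections = titles.map((title, i) => ({ title, body: bodies[i] }));
  return merge(sections);
}
```

Only the expansion step runs in parallel, so wall-clock time is roughly outline latency plus the slowest section plus merge latency, regardless of how many sections the outline contains.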
Performance: 14,478 chars in 21s (675 chars/sec) — 5.1x more content than chain at 2.9x higher throughput.
| Metric | Chain | Skeleton-of-Thought | Winner |
|---|---|---|---|
| Output size | 2,856 chars | 14,478 chars | SoT (5.1x) |
| Throughput | 234 chars/sec | 675 chars/sec | SoT (2.9x) |
| Duration | 12s | 21s | Chain (faster) |
| Quality (w/ reflect) | ~7-8/10 | 9.4/10 | SoT |
When to use what:
- SoT → long-form content, reports, guides, docs (anything with natural sections)
- Chain → analysis, research, adversarial review (anything needing multiple perspectives)
- Parallel → independent tasks, batch processing
- Structured → entity extraction, classification, any task needing reliable JSON
- Voting → factual accuracy, critical decisions, consensus-building
API Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /status | Detailed status + cost + cache |
| GET | /capabilities | Discover execution modes |
| POST | /parallel | Execute N prompts in parallel |
| POST | /research | Multi-phase web research |
| POST | /skeleton | Skeleton-of-Thought (outline → expand → merge) |
| POST | /chain | Manual chain pipeline |
| POST | /chain/auto | Auto-build + execute chain |
| POST | /chain/preview | Preview chain without executing |
| POST | /chain/template | Execute pre-built template |
| POST | /structured | Forced JSON with schema validation |
| GET | /structured/schemas | List built-in schemas |
| POST | /vote | Majority voting (best-of-N) |
| POST | /benchmark | Quality comparison test |
| GET | /templates | List chain templates |
| GET | /cache | Cache statistics |
| DELETE | /cache | Clear cache |
Cost Comparison
| Model | Cost per 1M tokens | Relative |
|---|---|---|
| Claude Opus 4 | ~$15 input / $75 output | 1x |
| GPT-4o | ~$2.50 input / $10 output | ~7x cheaper |
| Gemini Flash | ~$0.075 input / $0.30 output | 200x cheaper |
Cache hits are essentially free (~3-5ms, no API call).
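The "Relative" column can be reproduced from the per-token prices in the table. The sketch below uses those approximate prices as given; `costUSD` and `timesCheaper` are hypothetical helpers, and the real ratio for a workload depends on its mix of input and output tokens.

```javascript
// USD per 1M tokens, copied from the table above (approximate figures).
const PRICES = {
  "Claude Opus 4": { input: 15, output: 75 },
  "GPT-4o": { input: 2.5, output: 10 },
  "Gemini Flash": { input: 0.075, output: 0.3 },
};

// Cost in USD of one request with the given token counts.
function costUSD(model, inputTokens, outputTokens) {
  const price = PRICES[model];
  return (inputTokens * price.input + outputTokens * price.output) / 1e6;
}

// How many times cheaper `model` is than `baseline` for the same token counts.
function timesCheaper(model, baseline, inputTokens, outputTokens) {
  return costUSD(baseline, inputTokens, outputTokens) / costUSD(model, inputTokens, outputTokens);
}
```

With these prices the Opus-to-Flash ratio is about 200x on input tokens and about 250x on output tokens, so the headline 200x figure is the conservative end of the range.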
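Swarm: Cut Your LLM Costs by 200x
Turn your expensive model into an affordable daily driver. Offload the tedious work to Gemini Flash workers (parallel, batch, research) at a fraction of the cost.
At a Glance
| 30 tasks via | Time | Cost |
|---|---|---|
| Opus (sequential) | ~30s | ~$0.50 |
| Swarm (parallel) | ~1s | ~$0.003 |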
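When to Use
Swarm is ideal for:
- 3+ independent tasks (research, summaries, comparisons)
- Comparing or researching multiple subjects
- Multiple URLs to fetch/analyze
- Batch processing (documents, entities, facts)
- Complex analysis needing multiple perspectives → use chain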
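Quick Reference
```bash
# Check the daemon (do this every session)
swarm status

# Start it if it is not running
swarm start

# Parallel prompts
swarm parallel "What is X?" "What is Y?" "What is Z?"

# Research multiple subjects
swarm research "OpenAI" "Anthropic" "Mistral" --topic "AI safety"

# Discover capabilities
swarm capabilities
```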
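Execution Modes
Parallel (v1.0)
N prompts → N workers simultaneously. Best for independent tasks.
```bash
swarm parallel "prompt1" "prompt2" "prompt3"
```
Research (v1.1)
Multi-phase: search → fetch → analyze. Uses Google Search grounding.
```bash
swarm research "Buildertrend" "Jobber" --topic "pricing 2026"
```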
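Chain (v1.3): Refinement Pipelines
Data flows through multiple stages, each with a different perspective/filter. Stages run in sequence; tasks within a stage run in parallel.
Stage modes:
- `parallel`: N inputs → N workers (same perspective)
- `single`: merged input → 1 worker
- `fan-out`: 1 input → N workers (different perspectives)
- `reduce`: N inputs → 1 synthesized output
Auto-chain: describe what you want, get an optimal pipeline:
```bash
curl -X POST http://localhost:9999/chain/auto \
  -d '{"task":"Find business opportunities","data":"...market data...","depth":"standard"}'
```
Manual chain:
```bash
swarm chain pipeline.json
# or
echo '{"stages":[...]}' | swarm chain --stdin
```
Depth presets: quick (2 stages), standard (4), deep (6), exhaustive (8)
Built-in perspectives: extractor, filter, enricher, analyst, synthesizer, challenger, optimizer, strategist, researcher, critic
Preview without executing:
```bash
curl -X POST http://localhost:9999/chain/preview \
  -d '{"task":"...","depth":"standard"}'
```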
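Benchmark (v1.3)
Compare single vs parallel vs chain on the same task with LLM-as-judge scoring.
```bash
curl -X POST http://localhost:9999/benchmark \
  -d '{"task":"Analyze X","data":"...","depth":"standard"}'
```
Scores on 6 FLASK dimensions: accuracy (2x weight), depth (1.5x), completeness, coherence, actionability (1.5x), nuance.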
Capabilities Discovery (v1.3)
Lets the orchestrator discover which execution modes are available:
```bash
swarm capabilities
# or
curl http://localhost:9999/capabilities
```
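Prompt Cache (v1.3.2)
LRU cache for LLM responses. 212x speedup on cache hits in parallel mode, 514x on chains.
- Keyed by hash of instruction + input + perspective
- 500 entries max, 1 hour TTL
- Skips web search tasks (they need fresh data)
- Persists to disk across daemon restarts
- Per-task bypass: set `task.cache = false`
```bash
# View cache stats
curl http://localhost:9999/cache

# Clear the cache
curl -X DELETE http://localhost:9999/cache
```
Cache stats show in `swarm status`.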
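Stage Retry (v1.3.2)
If tasks fail within a chain stage, only the failed tasks are retried (not the whole stage). Default: 1 retry. Configurable per phase via `phase.retries` or globally via `options.stageRetries`.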
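Cost Tracking (v1.3.1)
All endpoints return cost data in their `complete` event:
- `session`: current daemon session totals
- `daily`: persisted across restarts, accumulates all day
```bash
swarm status   # Shows session + daily cost
swarm savings  # Monthly savings report
```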
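Web Search (v1.1)
Workers search the live web via Google Search grounding (Gemini only, no extra cost).
```bash
# Research uses web search by default
swarm research "Subject" --topic "angle"

# Parallel with web search
curl -X POST http://localhost:9999/parallel \
  -d '{"prompts":["Current price of X?"],"options":{"webSearch":true}}'
```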
JavaScript API
```javascript
const { parallel, research } = require('~/clawd/skills/node-scaling/lib');
const { SwarmClient } = require('~/clawd/skills/node-scaling/lib/client');

// Simple parallel
const result = await parallel(['prompt1', 'prompt2', 'prompt3']);

// Client with streaming
const client = new SwarmClient();
for await (const event of client.parallel(prompts)) { ... }
for await (const event of client.research(subjects, topic)) { ... }

// Chain (named differently from `result` above so both can live in one file)
const chainResult = await client.chainSync({ task, data, depth });
```
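Daemon Management
```bash
swarm start      # Start the daemon (background)
swarm stop       # Stop the daemon
swarm status     # Status, cost, cache stats
swarm restart    # Restart the daemon
swarm savings    # Monthly savings report
swarm logs [N]   # Last N lines of the daemon log
```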
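Performance (v1.3.2)
| Mode | Tasks | Time | Notes |
|---|---|---|---|
| Parallel (simple) | 5 | ~700ms | 142ms/task effective |
| Parallel (stress) | 10 | ~1.2s | 123ms/task effective |
| Chain (standard) | 5 | ~14s | 3-stage multi-perspective |
| Chain (quick) | 2 | ~3s | 2-stage extract+synthesize |
| Cache hit | any | ~3-5ms | 200-500x speedup |
| Research (web) | 2 | ~15s | Google grounding latency |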
Config
Location: `~/.config/clawdbot/node-scaling.yaml`
```yaml
node_scaling:
  enabled: true
  limits:
    max_nodes: 16
    max_concurrent_api: 16
  provider:
    name: gemini
    model: gemini-2.0-flash
  web_search:
    enabled: true
    parallel_default: false
  cost:
    max_daily_spend: 10.00
```
Troubleshooting
| Issue | Fix |
|---|---|
| Daemon not running | `swarm start` |
| No API key | Set `GEMINI_API_KEY` or run `npm run setup` |
| Rate limited | Lower `max_concurrent_api` in config |
| Web search not working | Ensure provider is gemini and `web_search.enabled` is true |
| Cache stale results | `curl -X DELETE http://localhost:9999/cache` |
| Chain too slow | Use `depth: "quick"` or check context size |
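Structured Output (v1.3.7)
Force JSON output with schema validation: zero parse failures on structured tasks.
bash
# Use a built-in schema
curl -X POST http://localhost:9999/structured \
  -d '{"prompt":"Extract entities from: Tim Cook announced iPhone 17","schema":"entities"}'
# Use a custom schema
curl -X POST http://localhost:9999/structured \
  -d '{"prompt":"Classify this text","data":"...","schema":{"type":"object","properties":{"category":{"type":"string"}}}}'
# JSON mode (no schema, just force JSON)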
curl -X POST http://localhost:9999/structured \