Swarm — Cut Your LLM Costs by 200x

Turn your expensive model into an affordable daily driver. Offload the boring stuff to Gemini Flash workers — parallel, batch, research — at a fraction of the cost.

At a Glance

30 tasks via	Time	Cost
Opus (sequential)	~30s	~$0.50
Swarm (parallel)	~1s	~$0.003

When to Use

Swarm is ideal for:

- 3+ independent tasks (research, summaries, comparisons)
- Comparing or researching multiple subjects
- Multiple URLs to fetch/analyze
- Batch processing (documents, entities, facts)
- Complex analysis needing multiple perspectives → use chain mode

Quick Reference

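```bash
# Check daemon (do this every session)
swarm status

# Start if not running
swarm start

# Parallel prompts
swarm parallel "What is X?" "What is Y?" "What is Z?"

# Research multiple subjects
swarm research OpenAI Anthropic Mistral --topic "AI safety"

# Discover capabilities
swarm capabilities
```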

Execution Modes

Parallel (v1.0)

N prompts → N workers simultaneously. Best for independent tasks.

```bash
swarm parallel "prompt 1" "prompt 2" "prompt 3"
```
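Conceptually, parallel mode is a bounded fan-out. Every prompt gets its own worker call, and only a fixed number of calls run at once. The sketch below is not Swarm's actual implementation. It assumes a hypothetical async `callWorker(prompt)` that stands in for the Gemini Flash request. It also assumes a `limit` parameter, shown mirroring the `max_concurrent_api` config value.

```javascript
// Run one worker call per prompt, with at most `limit` calls in flight.
// `callWorker` is a hypothetical async function (prompt) => result.
// A rejection from any call rejects the whole run.
async function runParallel(prompts, callWorker, limit = 16) {
  const results = new Array(prompts.length);
  let next = 0; // index of the next prompt to start (safe: JS is single-threaded)

  // Each lane keeps pulling prompts until none are left.
  async function lane() {
    while (next < prompts.length) {
      const i = next++;
      results[i] = await callWorker(prompts[i]);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, prompts.length) }, lane);
  await Promise.all(lanes);
  return results; // same order as `prompts`, regardless of completion order
}
```

Results are written by index, so the output order matches the input order even though calls finish out of order.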

Research (v1.1)

Multi-phase: search → fetch → analyze. Uses Google Search grounding.

```bash
swarm research Buildertrend Jobber --topic "2026 pricing"
```

Chain (v1.3) — Refinement Pipelines

Data flows through multiple stages, each with a different perspective/filter. Stages run in sequence; tasks within a stage run in parallel.

Stage modes:

- parallel: N inputs → N workers (same perspective)
- single: merged input → 1 worker
- fan-out: 1 input → N workers with DIFFERENT perspectives
- reduce: N inputs → 1 synthesized output
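The four stage modes (parallel, single, fan-out, reduce) differ only in two things: how many worker calls a stage makes, and what each call receives. The sketch below assumes a hypothetical async `worker(perspective, input)`. It also assumes that merged inputs are joined with blank lines. Swarm's real merge format and stage schema are not documented here.

```javascript
// Sketch of the four chain stage modes. `worker(perspective, input)` is a
// hypothetical async function returning a string.
async function runStage(mode, inputs, perspectives, worker) {
  switch (mode) {
    case 'parallel': // N inputs -> N workers, same perspective
      return Promise.all(inputs.map((x) => worker(perspectives[0], x)));
    case 'single': // merged input -> 1 worker
    case 'reduce': // N inputs -> 1 synthesized output (same mechanics here; differs by perspective)
      return [await worker(perspectives[0], inputs.join('\n\n'))];
    case 'fan-out': // 1 input -> N workers, each with a different perspective
      return Promise.all(perspectives.map((p) => worker(p, inputs[0])));
    default:
      throw new Error(`unknown stage mode: ${mode}`);
  }
}

// Stages run in sequence; each stage's outputs become the next stage's inputs.
async function runChain(stages, data, worker) {
  let current = [data];
  for (const { mode, perspectives } of stages) {
    current = await runStage(mode, current, perspectives, worker);
  }
  return current;
}
```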

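Auto-chain: describe what you want and Swarm builds the pipeline for you:
```bash
curl -X POST http://localhost:9999/chain/auto \
  -d '{"task":"Find business opportunities","data":"...market data...","depth":"standard"}'
```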

Manual chain:
```bash
swarm chain pipeline.json

# or
echo '{"stages":[...]}' | swarm chain --stdin
```

Depth presets: quick (2 stages), standard (4), deep (6), exhaustive (8)

Built-in perspectives: extractor, filter, enricher, analyst, synthesizer, challenger, optimizer, strategist, researcher, critic

Preview without executing:
```bash
curl -X POST http://localhost:9999/chain/preview \
  -d '{"task":"...","depth":"standard"}'
```

Benchmark (v1.3)

Compare single vs parallel vs chain on the same task with LLM-as-judge scoring.

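```bash
curl -X POST http://localhost:9999/benchmark \
  -d '{"task":"Analyze X","data":"...","depth":"standard"}'
```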

Scores on 6 FLASK dimensions: accuracy (2x weight), depth (1.5x), completeness, coherence, actionability (1.5x), nuance.
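These weights imply a weighted average. Accuracy counts double, depth and actionability count 1.5x, and the other three dimensions count once, for a total weight of 8. The sketch below assumes every dimension is scored on the same 0-10 scale. The exact aggregation Swarm's judge uses is also an assumption here.

```javascript
// FLASK dimension weights as listed above; unlisted weights assumed to be 1.
const FLASK_WEIGHTS = {
  accuracy: 2,
  depth: 1.5,
  completeness: 1,
  coherence: 1,
  actionability: 1.5,
  nuance: 1,
};

// Weighted mean of per-dimension scores (each assumed to be 0-10).
function flaskScore(scores) {
  let total = 0;
  let weightSum = 0;
  for (const [dim, weight] of Object.entries(FLASK_WEIGHTS)) {
    if (!(dim in scores)) throw new Error(`missing score for ${dim}`);
    total += weight * scores[dim];
    weightSum += weight;
  }
  return total / weightSum;
}
```

For example, an answer with accuracy 10 and every other dimension 6 scores (20 + 9 + 6 + 6 + 9 + 6) / 8 = 7.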

Capabilities Discovery (v1.3)

Lets the orchestrator discover which execution modes are available:
```bash
swarm capabilities

# or
curl http://localhost:9999/capabilities
```

Prompt Cache (v1.3.2)

LRU cache for LLM responses. 212x speedup on cache hits (parallel), 514x on chains.

- Keyed by hash of instruction + input + perspective
- 500 entries max, 1 hour TTL
- Skips web search tasks (they need fresh data)
- Persists to disk across daemon restarts
- Per-task bypass: set task.cache = false

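```bash
# View cache stats
curl http://localhost:9999/cache

# Clear cache
curl -X DELETE http://localhost:9999/cache
```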

Cache stats are also shown by swarm status.
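The cache behavior described above can be sketched as a size-bounded LRU map with per-entry expiry. This minimal version is not Swarm's code. It is built on the following assumptions:

- SHA-256 over a JSON-encoded tuple serves as the hash; JSON encoding avoids ambiguous string concatenation.
- An injectable clock makes expiry testable.
- Disk persistence is omitted.

```javascript
const crypto = require('crypto');

// Minimal LRU cache with TTL, keyed by a hash of instruction + input + perspective.
// Defaults mirror the documented limits (500 entries, 1 hour).
class PromptCache {
  constructor({ maxEntries = 500, ttlMs = 60 * 60 * 1000, now = Date.now } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock
    this.map = new Map(); // insertion order: first key = least recently used
  }

  static key(instruction, input, perspective = '') {
    return crypto
      .createHash('sha256')
      .update(JSON.stringify([instruction, input, perspective]))
      .digest('hex');
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt > this.ttlMs) {
      this.map.delete(key); // expired
      return undefined;
    }
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, { value, storedAt: this.now() });
    if (this.map.size > this.maxEntries) {
      this.map.delete(this.map.keys().next().value); // evict least recently used
    }
  }
}
```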

Stage Retry (v1.3.2)

If tasks fail within a chain stage, only the failed tasks get retried (not the whole stage). Default: 1 retry. Configurable per-phase via phase.retries or globally via options.stageRetries.
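Retrying only the failed tasks means remembering which task indices failed and re-running just those. The sketch below assumes a hypothetical async `runTask(task)` and treats a rejected promise as a failure. Its `retries` argument plays the role of phase.retries / options.stageRetries.

```javascript
// Run a stage's tasks, then retry only the failed ones up to `retries` times.
// `runTask(task)` is a hypothetical async function; a rejection counts as failure.
async function runStageWithRetry(tasks, runTask, retries = 1) {
  const results = new Array(tasks.length);
  let pending = tasks.map((_, i) => i); // indices still to run

  for (let attempt = 0; attempt <= retries && pending.length > 0; attempt++) {
    const settled = await Promise.allSettled(pending.map((i) => runTask(tasks[i])));
    const failed = [];
    settled.forEach((s, j) => {
      const i = pending[j];
      if (s.status === 'fulfilled') {
        results[i] = { ok: true, value: s.value };
      } else {
        results[i] = { ok: false, error: s.reason };
        failed.push(i);
      }
    });
    pending = failed; // successful tasks are never re-run
  }
  return results;
}
```

Promise.allSettled lets the stage collect every outcome instead of aborting on the first failure.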

Cost Tracking (v1.3.1)

All endpoints return cost data in their complete event:

- session: current daemon session totals
- daily: persisted across restarts, accumulates all day

```bash
swarm status    # Shows session + daily cost
swarm savings   # Monthly savings report
```
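Cost tracking keeps two totals. The session total lives in memory and resets when the daemon restarts. The per-day total is reloaded from storage, so it survives restarts. This sketch is built on the following assumptions:

- A hypothetical `store` object with `load()`/`save(data)` methods stands in for the on-disk file.
- Days are keyed by UTC date.

```javascript
// Two-level cost tracking: in-memory session totals plus persisted per-day totals.
// `store` is a hypothetical object with load() and save(data) methods.
class CostTracker {
  constructor(store, today = () => new Date().toISOString().slice(0, 10)) {
    this.store = store;
    this.today = today; // returns a UTC date string such as "2025-01-01"
    this.session = { calls: 0, usd: 0 }; // reset on every daemon start
  }

  record(usd) {
    this.session.calls += 1;
    this.session.usd += usd;

    const data = this.store.load() || {};
    const day = this.today();
    const entry = data[day] || { calls: 0, usd: 0 };
    data[day] = { calls: entry.calls + 1, usd: entry.usd + usd };
    this.store.save(data); // persisted, so it survives restarts
  }

  daily() {
    const data = this.store.load() || {};
    return data[this.today()] || { calls: 0, usd: 0 };
  }
}
```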

Web Search (v1.1)

Workers search the live web via Google Search grounding (Gemini only, no extra cost).

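```bash
# Research uses web search by default
swarm research "Subject" --topic "angle"

# Parallel with web search
curl -X POST http://localhost:9999/parallel \
  -d '{"prompts":["Current price of X?"],"options":{"webSearch":true}}'
```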

JavaScript API

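```javascript
const os = require('os');
const path = require('path');

// Node's require() does not expand "~", so build the path from the home directory
const lib = path.join(os.homedir(), 'clawd/skills/node-scaling/lib');
const { parallel, research } = require(lib);
const { SwarmClient } = require(path.join(lib, 'client'));

async function main() {
  // Simple parallel
  const result = await parallel(['prompt 1', 'prompt 2', 'prompt 3']);

  // Client with streaming (prompts, subjects, topic: your own inputs)
  const client = new SwarmClient();
  for await (const event of client.parallel(prompts)) { /* handle event */ }
  for await (const event of client.research(subjects, topic)) { /* handle event */ }

  // Chain (task, data, depth: your own inputs)
  const chainResult = await client.chainSync({ task, data, depth });
}

main().catch(console.error);
```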

Daemon Management

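```bash
swarm start      # Start daemon (background)
swarm stop       # Stop daemon
swarm status     # Status, cost, cache stats
swarm restart    # Restart daemon
swarm savings    # Monthly savings report
swarm logs [N]   # Last N lines of daemon log
```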

Performance (v1.3.2)

Mode	Tasks	Time	Notes
Parallel (simple)	5	~700ms	142ms/task effective
Parallel (stress)	10	~1.2s	123ms/task effective
Chain (standard)	5	~14s	3-stage multi-perspective
Chain (quick)	2	~3s	2-stage extract+synthesize
Cache hit	any	~3-5ms	200-500x speedup
Research (web)	2	~15s	Google grounding latency

Config

Location: ~/.config/clawdbot/node-scaling.yaml

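```yaml
node_scaling:
  enabled: true
  limits:
    max_nodes: 16
    max_concurrent_api: 16
  provider:
    name: gemini
    model: gemini-2.0-flash
  web_search:
    enabled: true
    parallel_default: false
  cost:
    max_daily_spend: 10.00
```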

Troubleshooting

Issue	Fix
Daemon not running	swarm start
No API key

Structured Output (v1.3.7)

Forces JSON output with schema validation, for zero parse failures on structured tasks.

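```bash
# Use a built-in schema
curl -X POST http://localhost:9999/structured \
  -d '{"prompt":"Extract entities from: Tim Cook announced iPhone 17","schema":"entities"}'

# Use a custom schema
curl -X POST http://localhost:9999/structured \
  -d '{"prompt":"Classify this text","data":"...","schema":{"type":"object","properties":{"category":{"type":"string"}}}}'

# JSON mode (no schema, just force JSON)
curl -X POST http://localhost:9999/structured \
```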

Built-in schemas: entities, summary, comparison, actions, classification, INLINECODE27

Uses Gemini's native JSON mode (response_mime_type: application/json plus responseSchema) to guarantee JSON output, then validates the parsed response against the schema.
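JSON mode constrains the output format. A separate validation step then checks the parsed data against the schema. The sketch below shows that second step for a small JSON Schema subset (type, properties, required, items). It is not Swarm's validator. A full implementation would use a complete JSON Schema library.

```javascript
// Validate `value` against a small JSON Schema subset.
// Returns a list of error strings; an empty list means the value conforms.
function validate(value, schema, path = '$') {
  const errors = [];
  const actual = Array.isArray(value) ? 'array' : value === null ? 'null' : typeof value;
  const intOk = schema.type === 'integer' && Number.isInteger(value);
  if (schema.type && schema.type !== actual && !intOk) {
    errors.push(`${path}: expected ${schema.type}, got ${actual}`);
    return errors;
  }
  if (actual === 'object') {
    for (const key of schema.required || []) {
      if (!(key in value)) errors.push(`${path}.${key}: missing required property`);
    }
    for (const [key, sub] of Object.entries(schema.properties || {})) {
      if (key in value) errors.push(...validate(value[key], sub, `${path}.${key}`));
    }
  }
  if (actual === 'array' && schema.items) {
    value.forEach((item, i) => errors.push(...validate(item, schema.items, `${path}[${i}]`)));
  }
  return errors;
}

// Parse a model response and reject it if it does not match the schema.
function parseStructured(text, schema) {
  const data = JSON.parse(text); // throws SyntaxError on invalid JSON
  const errors = validate(data, schema);
  if (errors.length) throw new Error(errors.join('; '));
  return data;
}
```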

Majority Voting (v1.3.7)

Same prompt → N parallel executions → pick the best answer. Higher accuracy on factual/analytical tasks.

CODEBLOCK15

Strategies:

- judge: LLM scores all candidates on accuracy/completeness/clarity/actionability and picks the winner (N+1 calls)
- similarity: Jaccard word-set similarity, picks the consensus answer (N calls, zero extra cost)
- longest: picks the longest response as a heuristic for thoroughness (N calls, zero extra cost)
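The similarity and longest strategies need no extra model call. Both can be computed locally from the N candidate answers. The sketch below is built on the following assumptions:

- Tokenization lowercases the text and splits on non-alphanumeric ASCII characters.
- "Consensus" means the candidate with the highest mean Jaccard similarity to the other candidates.
- Ties keep the earlier candidate.

```javascript
// Split text into a set of lowercase word tokens (ASCII letters and digits only).
function wordSet(text) {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// Jaccard similarity |A ∩ B| / |A ∪ B|, defined as 1 for two empty sets.
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 1;
  let inter = 0;
  for (const w of a) if (b.has(w)) inter++;
  return inter / (a.size + b.size - inter);
}

// "similarity" strategy: pick the candidate that agrees most with the rest.
function pickBySimilarity(candidates) {
  const sets = candidates.map(wordSet);
  let best = 0;
  let bestScore = -Infinity;
  sets.forEach((s, i) => {
    let sum = 0;
    sets.forEach((t, j) => { if (i !== j) sum += jaccard(s, t); });
    const score = sets.length > 1 ? sum / (sets.length - 1) : 1;
    if (score > bestScore) { bestScore = score; best = i; } // ties keep the earlier one
  });
  return candidates[best];
}

// "longest" strategy: pick the longest response as a thoroughness heuristic.
function pickLongest(candidates) {
  return candidates.reduce((a, b) => (b.length > a.length ? b : a));
}
```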

When to use: Factual questions, critical decisions, or any task where accuracy > speed.

Strategy	Calls	Extra Cost	Quality
similarity	N	$0	Good (consensus)
longest	N	$0	Decent (heuristic)
judge	N+1	~$0.0001	Best (LLM-scored)

Self-Reflection (v1.3.5)

Optional critic pass after chain/skeleton output. Scores 5 dimensions, auto-refines if below threshold.

CODEBLOCK16

Reported results: reflection raised a weak output's average score from 5.0 to 7.6, and Skeleton-of-Thought plus reflection scored 9.4/10.
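The critic pass is a score-then-refine loop. This sketch is built on the following assumptions:

- A hypothetical async `critique(text)` returns the 5 dimension scores.
- A hypothetical async `refine(text, scores)` returns a revised text.
- The threshold (7) and the number of refinement rounds (1) are illustrative defaults, not Swarm's documented values.

```javascript
// Critic-and-refine loop: score the output, and refine while the mean score
// is below `threshold`, for at most `maxRounds` refinements.
// `critique` and `refine` are hypothetical async functions.
async function reflect(text, critique, refine, { threshold = 7, maxRounds = 1 } = {}) {
  let current = text;
  let scores = await critique(current);
  let avg = mean(Object.values(scores));
  for (let round = 0; round < maxRounds && avg < threshold; round++) {
    current = await refine(current, scores);
    scores = await critique(current);
    avg = mean(Object.values(scores));
  }
  return { text: current, scores, avg };
}

function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}
```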

Skeleton-of-Thought (v1.3.6)

Generate outline → expand each section in parallel → merge into coherent document. Best for long-form content.

CODEBLOCK17
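The three Skeleton-of-Thought steps map onto one outline call, a concurrent batch of expansion calls, and an ordered merge. This sketch is built on the following assumptions:

- A hypothetical async `llm(prompt)` handles all model calls.
- The outline is a numbered list with one title per line.
- The merge simply concatenates sections in outline order.

The prompts are illustrative only.

```javascript
// Skeleton-of-Thought: outline -> expand sections in parallel -> merge.
// `llm(prompt)` is a hypothetical async function returning text.
async function skeletonOfThought(topic, llm) {
  // 1. Ask for a numbered outline and strip the "1." / "2)" prefixes.
  const outline = await llm(`Write a numbered outline for: ${topic}`);
  const sections = outline
    .split('\n')
    .map((line) => line.replace(/^\s*\d+[.)]\s*/, '').trim())
    .filter(Boolean);

  // 2. Expand every section concurrently.
  const bodies = await Promise.all(
    sections.map((title) => llm(`Write the section "${title}" of a document about ${topic}`))
  );

  // 3. Merge in outline order (a real pipeline might add a smoothing pass here).
  return sections.map((title, i) => `## ${title}\n\n${bodies[i]}`).join('\n\n');
}
```

The expansion step is where the throughput gain comes from: total latency is roughly one outline call plus the slowest section, not the sum of all sections.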

Performance: 14,478 chars in 21s (675 chars/sec) — 5.1x more content than chain at 2.9x higher throughput.

Metric	Chain	Skeleton-of-Thought	Winner
Output size	2,856 chars	14,478 chars	SoT (5.1x)
Throughput	234 chars/sec	675 chars/sec	SoT (2.9x)
Duration	12s	21s	Chain (faster)
Quality (w/ reflect)	~7-8/10	9.4/10	SoT

When to use what:

- SoT → long-form content, reports, guides, docs (anything with natural sections)
- Chain → analysis, research, adversarial review (anything needing multiple perspectives)
- Parallel → independent tasks, batch processing
- Structured → entity extraction, classification, any task needing reliable JSON
- Voting → factual accuracy, critical decisions, consensus-building

API Endpoints

Method	Path	Description
GET	/health	Health check
GET

Cost Comparison

Model	Cost per 1M tokens	Relative
Claude Opus 4	~$15 input / $75 output	1x
GPT-4o

Cache hits are essentially free (~3-5ms, no API call).
