IdleClaw
A distributed inference network for Ollama. Contributors share idle GPU/CPU capacity, consumers use community compute when their API credits run out.
Modes
Contribute — Share your idle inference
Start your machine as an inference node. Your local Ollama models become available to the community.
CODEBLOCK0
This connects to the IdleClaw routing server, registers your available models, and begins accepting inference requests. Press Ctrl+C to stop.
Requirements: Ollama must be running with at least one model pulled.
Consume — Use community inference
Send a chat request to the community network instead of running locally.
CODEBLOCK1
Streams the response to stdout as tokens arrive.
Status — Check network health
See how many nodes are online and what models are available.
CODEBLOCK2
Configuration
| Variable | Default | Description |
|---|
| INLINECODE0 | INLINECODE1 | Routing server URL |
| INLINECODE2 |
http://localhost:11434 | Local Ollama endpoint |
Security
External Endpoints
This skill contacts the following external endpoints:
- 1. IdleClaw Routing Server (
IDLECLAW_SERVER, default https://api.idleclaw.com)
-
Contribute mode: Opens a WebSocket connection to register as an inference node. Sends: node ID, available model names, and inference responses. Receives: inference requests (model name, chat messages, and optional tool schemas).
-
Consume mode: Sends HTTP POST to
/api/chat with model name and chat messages. Receives: streaming token response via SSE.
-
Status mode: Sends HTTP GET to
/health and
/api/models. Receives: server health info and available model list.
- 2. Local Ollama (
OLLAMA_HOST, default http://localhost:11434)
-
Contribute mode only: Calls Ollama's API to list models and run inference. All communication stays on localhost.
Data Handling
- - No user data is persisted locally or on the server beyond the active session.
- No credentials or API keys are required or stored.
- All communication is text — every message between the server, the node, and Ollama is JSON text over WebSocket or HTTP. No binary data, file uploads, images, or executable payloads are transmitted.
- No local code execution — the contributor node is a relay. It forwards JSON inference parameters to Ollama and streams JSON responses back to the server. The node does not execute tools, run shell commands, or access the filesystem. Any tool execution is handled server-side after response validation.
- Chat messages (text strings) are transmitted from consumer to server to contributor node for inference, then discarded.
- No telemetry or analytics are collected.
- In contribute mode, the routing server sends JSON inference requests to the node, which forwards them to your local Ollama instance. Ollama returns a JSON text response which the node relays back. Contributors can point
IDLECLAW_SERVER to a self-hosted instance. - In consume mode, text prompts are sent to the routing server which routes them to an available contributor node.
Sanitization
Client-side:
- - Inference parameters are validated before passing to Ollama: only whitelisted keys are forwarded (
model, messages, stream, think, keep_alive, options, tools, format). Unknown keys are stripped. - Requested model must match a model the node registered — requests for unregistered models are rejected.
- Message limits enforced: max 50 messages per request, max 10,000 characters per message content.
- Only known response fields are forwarded back to the server (
role, content, thinking, tool_calls). - In consume mode, model names are validated against a strict pattern (alphanumeric, colons, periods, hyphens only). In contribute mode, requested models must match a model the node registered from Ollama.
- Server URLs are validated as HTTP/HTTPS URLs before use.
- No shell commands are constructed from user input — all execution is Python-only.
- No local files are read or accessed — the skill only communicates with Ollama and the routing server.
Server-side (routing server):
- - IP-based rate limiting on all endpoints: chat (20 RPM), node registration (5 RPM), general (60 RPM).
- Input validation: max 50 messages per request, 10,000 chars per message, 64-char model names, roles restricted to
user and assistant. - Output sanitization: response content is stripped of markup tags before delivery to consumers.
- Node registration limits: max 3 nodes per IP, max concurrent requests clamped to 1-10.
- Tool execution safeguards: schema validation, argument type checking, 15-second timeout, per-node rate limiting (20 calls/min).
- Server binds to localhost only, accessed through Caddy reverse proxy with auto-TLS.
- Red team tested with documented findings and mitigations (security assessment on GitHub).
Installation
Run the installer to set up Python dependencies:
CODEBLOCK3
IdleClaw
一个面向Ollama的分布式推理网络。贡献者分享闲置的GPU/CPU算力,消费者在API额度用尽时使用社区计算资源。
模式
贡献模式 — 分享闲置推理能力
将您的机器作为推理节点启动。您本地的Ollama模型将对社区开放。
bash
cd $SKILL_DIR && python scripts/contribute.py
此操作将连接到IdleClaw路由服务器,注册您可用的模型,并开始接受推理请求。按Ctrl+C停止。
要求: Ollama必须正在运行,且至少已拉取一个模型。
消费模式 — 使用社区推理
向社区网络发送聊天请求,而非在本地运行。
bash
cd $SKILL_DIR && python scripts/consume.py --model <模型名称> --prompt <您的消息>
响应将以流式方式逐token输出到标准输出。
状态模式 — 检查网络健康状态
查看在线节点数量及可用模型。
bash
cd $SKILL_DIR && python scripts/status.py
配置
| 变量 | 默认值 | 描述 |
|---|
| IDLECLAWSERVER | https://api.idleclaw.com | 路由服务器URL |
| OLLAMAHOST |
http://localhost:11434 | 本地Ollama端点 |
安全性
外部端点
本技能会连接以下外部端点:
- 1. IdleClaw路由服务器(IDLECLAW_SERVER,默认https://api.idleclaw.com)
-
贡献模式:建立WebSocket连接以注册为推理节点。发送:节点ID、可用模型名称和推理响应。接收:推理请求(模型名称、聊天消息和可选工具架构)。
-
消费模式:向/api/chat发送HTTP POST请求,包含模型名称和聊天消息。接收:通过SSE流式传输的token响应。
-
状态模式:向/health和/api/models发送HTTP GET请求。接收:服务器健康信息和可用模型列表。
- 2. 本地Ollama(OLLAMA_HOST,默认http://localhost:11434)
-
仅贡献模式:调用Ollama的API列出模型并运行推理。所有通信保持在本地主机。
数据处理
- - 无用户数据持久化——除活动会话外,本地或服务器均不保存任何数据。
- 无需凭据或API密钥——不要求也不存储任何凭据。
- 所有通信均为文本——服务器、节点和Ollama之间的每条消息均为通过WebSocket或HTTP传输的JSON文本。不传输二进制数据、文件上传、图像或可执行负载。
- 无本地代码执行——贡献者节点作为中继。它将JSON推理参数转发给Ollama,并将JSON响应流式传回服务器。节点不执行工具、运行shell命令或访问文件系统。任何工具执行均在响应验证后由服务器端处理。
- 聊天消息(文本字符串)从消费者传输到服务器,再到贡献者节点进行推理,然后被丢弃。
- 不收集遥测或分析数据。
- 在贡献模式下,路由服务器向节点发送JSON推理请求,节点将其转发到本地Ollama实例。Ollama返回JSON文本响应,节点将其中继回去。贡献者可以将IDLECLAW_SERVER指向自托管实例。
- 在消费模式下,文本提示被发送到路由服务器,由路由服务器将其路由到可用的贡献者节点。
净化处理
客户端:
- - 推理参数在传递给Ollama前经过验证:仅转发白名单中的键(model、messages、stream、think、keepalive、options、tools、format)。未知键被剔除。
- 请求的模型必须与节点注册的模型匹配——未注册模型的请求将被拒绝。
- 消息限制:每个请求最多50条消息,每条消息内容最多10,000个字符。
- 仅将已知的响应字段转发回服务器(role、content、thinking、toolcalls)。
- 在消费模式下,模型名称需通过严格模式验证(仅限字母数字、冒号、句点、连字符)。在贡献模式下,请求的模型必须与节点从Ollama注册的模型匹配。
- 服务器URL在使用前需验证为HTTP/HTTPS URL。
- 不根据用户输入构造shell命令——所有执行均仅使用Python。
- 不读取或访问本地文件——本技能仅与Ollama和路由服务器通信。
服务器端(路由服务器):
- - 所有端点均实施基于IP的速率限制:聊天(20 RPM)、节点注册(5 RPM)、通用(60 RPM)。
- 输入验证:每个请求最多50条消息,每条消息10,000个字符,模型名称64个字符,角色限制为user和assistant。
- 输出净化:响应内容在传递给消费者前会去除标记标签。
- 节点注册限制:每个IP最多3个节点,并发请求限制在1-10之间。
- 工具执行保障:架构验证、参数类型检查、15秒超时、每节点速率限制(20次调用/分钟)。
- 服务器仅绑定到本地主机,通过带自动TLS的Caddy反向代理访问。
- 经过红队测试,并有记录在案的发现和缓解措施(GitHub上的安全评估)。
安装
运行安装程序以设置Python依赖:
bash
cd $SKILL_DIR && bash install.sh