When to Use
User wants to run local models with LM Studio, connect an app to its local server, or debug weak local inference behavior.
Use this for server readiness, model loading, OpenAI-compatible API integration, embeddings, MCP setup, and local-first operating decisions.
Architecture
Memory lives in ~/lm-studio/. If ~/lm-studio/ does not exist, run setup.md. See memory-template.md for structure.
CODEBLOCK0
Quick Reference
| Topic | File |
|---|
| Setup behavior and activation boundaries | INLINECODE4 |
| Memory schema and status states |
memory-template.md |
| Server startup and smoke tests |
server-workflows.md |
| Download, load, unload, and swap models |
model-lifecycle.md |
| OpenAI-compatible request patterns |
api-recipes.md |
| MCP connection patterns and guardrails |
mcp-playbooks.md |
| Symptom-based debugging |
troubleshooting.md |
Requirements
- - LM Studio or
llmster is already installed on the machine. - INLINECODE12 and
jq are available for smoke tests and response inspection. - INLINECODE14 is optional but preferred for repeatable server and model operations.
- Keep requests local by default; only add remote MCP servers or network exposure when the user explicitly asks.
Core Rules
1. Prove the server is reachable before changing client code
- - Use
server-workflows.md to confirm the actual port, endpoint reachability, and model visibility. - "LM Studio is open" is not enough. Require one real request to succeed before touching integration code.
2. Separate downloaded, listed, loaded, and active models
- - Use
model-lifecycle.md for discovery, loading, unloading, and verification. - Never assume a downloaded filename, API model id, CLI identifier, and active runtime instance are the same thing.
3. Prefer OpenAI-compatible endpoints for app integration
- - Start from
api-recipes.md and change only the base URL and model identifier before rewriting an existing client. - Verify each workload separately:
responses, chat/completions, embeddings, or completions.
4. Match model size and context to machine limits
- - Treat slow first token, OOM, and context overflow as runtime-fit problems first, not prompt problems first.
- Reduce model size, quantization burden, or context length before escalating complexity.
5. Validate after every runtime change
- - After loading a new model, changing context length, or altering server settings, run one end-to-end smoke test.
- Record the known-good combination in memory so the agent can reuse it instead of rediscovering it.
6. Treat MCP as a separate risk layer
- - Use
mcp-playbooks.md to connect servers, but debug model serving and MCP behavior independently. - Never install untrusted MCP servers or silently route local data to remote endpoints.
7. Escalate beyond local when the task exceeds the local setup
- - LM Studio is strong for privacy-sensitive work, offline execution, extraction, and controlled agent loops.
- For unsupported capabilities or repeated quality failures, say so explicitly and recommend a stronger remote path.
Common Traps
- - Assuming port
1234 without checking reachability -> integrations fail even though the app looks healthy. - Treating
GET /v1/models as proof a model is ready -> Just-In-Time listings can appear before a usable runtime is confirmed. - Reusing cloud model names in local requests -> the client is fine, but the local model identifier is wrong.
- Forcing JSON, tools, or vision on an unverified local model -> failures get blamed on prompts instead of capability mismatch.
- Leaving large models loaded while debugging another issue -> RAM or VRAM pressure hides the real cause.
- Installing random MCP servers -> privacy and system access boundaries disappear quickly.
Security & Privacy
Data that leaves your machine:
- - None by default for local
localhost server calls. - Optional model downloads or MCP servers follow the user's explicit configuration, not this skill's default path.
Data that stays local:
- - Prompt content sent to the LM Studio server running on the same machine.
- Notes stored in
~/lm-studio/ if the user wants persistent context.
This skill does NOT:
- - Assume remote access is safe by default.
- Store secrets in skill memory files.
- Install MCP servers or open network access without explicit user intent.
Related Skills
Install with
clawhub install <slug> if user confirms:
- -
models — Choose models by workload, context budget, and quality tradeoffs. - INLINECODE29 — Shape request payloads, retries, parsing, and integration debugging.
- INLINECODE30 — Operate local infrastructure with practical reliability and security habits.
- INLINECODE31 — Escalate from local-first execution to routed cloud models when capability gaps matter.
- INLINECODE32 — Package helper services or MCP servers consistently on the local machine.
Feedback
- - If useful: INLINECODE33
- Stay updated: INLINECODE34
何时使用
用户希望使用 LM Studio 运行本地模型、将应用连接到其本地服务器,或调试弱本地推理行为。
用于服务器就绪性检查、模型加载、OpenAI 兼容 API 集成、嵌入向量、MCP 设置以及本地优先的操作决策。
架构
内存文件位于 ~/lm-studio/。如果 ~/lm-studio/ 不存在,请运行 setup.md。结构请参见 memory-template.md。
text
~/lm-studio/
├── memory.md # 激活、首选端口、已知良好默认值
├── server-notes.md # 可达性检查和服务器模式说明
├── model-profiles.md # 按工作负载验证的模型
└── incidents.md # 重复失败和已确认修复
快速参考
| 主题 | 文件 |
|---|
| 设置行为和激活边界 | setup.md |
| 内存模式和状态状态 |
memory-template.md |
| 服务器启动和冒烟测试 | server-workflows.md |
| 下载、加载、卸载和切换模型 | model-lifecycle.md |
| OpenAI 兼容请求模式 | api-recipes.md |
| MCP 连接模式和防护措施 | mcp-playbooks.md |
| 基于症状的调试 | troubleshooting.md |
要求
- - 机器上已安装 LM Studio 或 llmster。
- curl 和 jq 可用于冒烟测试和响应检查。
- lms 可选但推荐用于可重复的服务器和模型操作。
- 默认保持请求本地化;仅在用户明确要求时添加远程 MCP 服务器或网络暴露。
核心规则
1. 在更改客户端代码前,先证明服务器可达
- - 使用 server-workflows.md 确认实际端口、端点可达性和模型可见性。
- LM Studio 已打开 不够。在触及集成代码前,需要至少一次真实请求成功。
2. 区分已下载、已列出、已加载和活跃的模型
- - 使用 model-lifecycle.md 进行发现、加载、卸载和验证。
- 切勿假设下载的文件名、API 模型 ID、CLI 标识符和活跃运行时实例是同一回事。
3. 应用集成优先使用 OpenAI 兼容端点
- - 从 api-recipes.md 开始,在重写现有客户端前仅更改基础 URL 和模型标识符。
- 分别验证每个工作负载:responses、chat/completions、embeddings 或 completions。
4. 将模型大小和上下文与机器限制匹配
- - 将首个 token 慢、OOM 和上下文溢出视为运行时适配问题,而非提示词问题。
- 在升级复杂度之前,先减小模型大小、量化负担或上下文长度。
5. 每次运行时更改后都要验证
- - 加载新模型、更改上下文长度或修改服务器设置后,运行一次端到端冒烟测试。
- 将已知良好的组合记录在内存中,以便代理可以重复使用而非重新发现。
6. 将 MCP 视为独立的风险层
- - 使用 mcp-playbooks.md 连接服务器,但独立调试模型服务和 MCP 行为。
- 切勿安装不受信任的 MCP 服务器或静默将本地数据路由到远程端点。
7. 当任务超出本地设置时升级到远程
- - LM Studio 适用于隐私敏感工作、离线执行、数据提取和受控代理循环。
- 对于不支持的能力或重复的质量失败,明确说明并推荐更强的远程路径。
常见陷阱
- - 假设端口为 1234 而不检查可达性 -> 即使应用看起来正常,集成也会失败。
- 将 GET /v1/models 视为模型已就绪的证据 -> 即时列表可能在确认可用运行时之前出现。
- 在本地请求中复用云端模型名称 -> 客户端没问题,但本地模型标识符错误。
- 在未经验证的本地模型上强制使用 JSON、工具或视觉功能 -> 失败被归咎于提示词而非能力不匹配。
- 在调试其他问题时保留大模型加载 -> RAM 或 VRAM 压力掩盖了真正原因。
- 安装随机的 MCP 服务器 -> 隐私和系统访问边界迅速消失。
安全与隐私
离开你机器的数据:
- - 默认情况下,本地 localhost 服务器调用无数据离开。
- 可选的模型下载或 MCP 服务器遵循用户的显式配置,而非此技能的默认路径。
留在本地的数据:
- - 发送到同一机器上运行的 LM Studio 服务器的提示词内容。
- 如果用户希望持久化上下文,则存储在 ~/lm-studio/ 中的笔记。
此技能不会:
- - 默认认为远程访问是安全的。
- 在技能内存文件中存储密钥。
- 在没有用户明确意图的情况下安装 MCP 服务器或开放网络访问。
相关技能
如果用户确认,使用 clawhub install
安装:
- - models — 根据工作负载、上下文预算和质量权衡选择模型。
- api — 塑造请求负载、重试、解析和集成调试。
- self-host — 以实用的可靠性和安全习惯操作本地基础设施。
- open-router — 当能力差距重要时,从本地优先执行升级到路由云端模型。
- docker — 在本地机器上一致地打包辅助服务或 MCP 服务器。
反馈
- - 如果有用:clawhub star lm-studio
- 保持更新:clawhub sync