When to Use
User needs Qwen to work reliably for chat, coding, reasoning, structured outputs, or vision. Agent handles surface selection, live model verification, hosted-versus-local tradeoffs, and failure recovery before the workflow reaches production.
Architecture
Memory lives in ~/qwen/. If ~/qwen/ does not exist, run setup.md. See memory-template.md for structure.
CODEBLOCK0
Quick Reference
Use the smallest file that resolves the blocker.
| Topic | File |
|---|
| Setup process | INLINECODE4 |
| Memory template |
memory-template.md |
| Hosted and local request patterns |
api-patterns.md |
| Workload routing matrix |
routing-matrix.md |
| Hosted versus self-hosted decisions |
deployment-paths.md |
| Tool-calling and structured output guardrails |
tool-calling.md |
| Debugging and recovery |
troubleshooting.md |
Requirements
- -
curl and jq for minimal endpoint checks - Hosted Qwen usually needs a INLINECODE13
- Self-hosted Qwen may use Ollama, vLLM, SGLang, or another OpenAI-compatible server
- Keep secrets in environment variables only
Core Rules
1. Lock the Surface Before Tuning the Model
- - Identify the real execution surface first: Alibaba Model Studio hosted API, another OpenAI-compatible provider, or a self-hosted server.
- Most "Qwen issues" are actually endpoint, region, server, or chat-template issues rather than model quality issues.
2. Verify Live Availability Before Naming Any Model
- - Start with a
/models or equivalent health check and copy the live model ID from the response. - Never trust stale screenshots, old blog posts, or remembered IDs for production routing.
3. Route by Workload, Not by Brand Loyalty
- - Split the request into one of these paths: fast chat, deep reasoning, coding agent, deterministic JSON, or vision.
- Pick the smallest Qwen family and server path that can reliably do that job.
4. Treat Structured Output as a Separate Reliability Problem
- - If Qwen is feeding tools, JSON, or downstream writes, use strict schemas, low temperature, and parser validation before acting.
- If the first pass is creative or reasoning-heavy, add a second deterministic normalization pass instead of forcing one prompt to do both.
5. Separate Model Problems From Server Problems
- - When behavior changes after migration, isolate the variable: model family, quantization, chat template, reasoning mode, parser, or backend.
- Reproduce with one minimal payload before changing prompts, infrastructure, and business logic at the same time.
6. Compare Hosted and Self-Hosted Explicitly
- - Hosted Qwen usually wins on speed to first success and managed multimodal access.
- Self-hosted Qwen only wins when privacy, local cost control, or offline use clearly outweigh operational overhead.
7. Ask Before Creating Persistent State
- - Work statelessly by default.
- Only create
~/qwen/ notes, saved routes, or repro logs after the user wants continuity across Qwen tasks.
Common Traps
- - Treating "Qwen" as one interchangeable thing -> hosted APIs, Ollama, vLLM, and agent frameworks behave differently.
- Hardcoding dated model IDs -> region and release cadence make old IDs fail fast.
- Mixing free-form reasoning with strict JSON output -> parsing breaks when one prompt is asked to do both.
- Blaming the model for local slowness -> Apple Silicon and Ollama often fail because of model size, quantization, or oversized context.
- Migrating from another OpenAI-compatible backend without rechecking tool-calling -> parser and chat-template differences can break automation.
External Endpoints
Use only the smallest hosted endpoint that answers the current question.
| Endpoint | Data Sent | Purpose |
|---|
| https://dashscope.aliyuncs.com/compatible-mode/v1/models | Auth header only | Mainland China model discovery |
| https://dashscope-intl.aliyuncs.com/compatible-mode/v1/models |
Auth header only | International model discovery |
| https://dashscope-us.aliyuncs.com/compatible-mode/v1/models | Auth header only | United States model discovery |
| https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions | Prompt messages and options | Hosted Qwen chat completions in Beijing region |
| https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions | Prompt messages and options | Hosted Qwen chat completions in Singapore region |
| https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions | Prompt messages and options | Hosted Qwen chat completions in Virginia region |
No other data is sent externally.
Security & Privacy
Data that leaves your machine:
- - Prompt content sent to Alibaba Cloud Model Studio when using hosted Qwen
- Optional images or multimodal payloads sent to hosted Qwen vision endpoints when requested
Data that stays local:
- - Deployment preferences and routing notes in
~/qwen/ after user approval - Local server URLs, workload notes, and sanitized repro payloads kept for debugging
This skill does NOT:
- - Store API keys in markdown files
- Send data to undeclared third-party endpoints
- Assume local servers are safe to expose publicly
- Modify its own skill files
Scope
This skill ONLY:
- - routes Qwen work across hosted and self-hosted execution surfaces
- chooses model families for chat, coding, reasoning, vision, and automation
- debugs migration, parser, latency, and endpoint problems
- stores lightweight local notes only after user approval
This skill NEVER:
- - invent live model availability without checking
- persist secrets in INLINECODE17
- execute destructive downstream automation without validated output
- pretend one backend's tool-calling behavior applies everywhere
Trust
Using hosted Qwen sends prompt data to Alibaba Cloud Model Studio.
Only install if you trust that service with your data, or keep Qwen fully self-hosted.
Related Skills
Install with
clawhub install <slug> if user confirms:
- -
models — choose model families and cost tiers before locking Qwen into production - INLINECODE20 — debug auth, payloads, retries, and OpenAI-compatible request shapes
- INLINECODE21 — tighten agent coding workflows after the Qwen route itself is stable
- INLINECODE22 — improve conversation shaping once the Qwen route itself is stable
- INLINECODE23 — store durable routing choices and repeated migration lessons
Feedback
- - If useful: INLINECODE24
- Stay updated: INLINECODE25
何时使用
用户需要Qwen在聊天、编程、推理、结构化输出或视觉任务中可靠工作。在工作流进入生产环境前,Agent负责处理表面选择、实时模型验证、托管与本地权衡以及故障恢复。
架构
内存存储在~/qwen/目录中。如果~/qwen/不存在,请运行setup.md。结构参见memory-template.md。
text
~/qwen/
├── memory.md # 状态、激活规则和部署默认设置
├── routes.md # 每种工作负载的首选路由
├── servers.md # 已知的本地或托管端点
├── experiments.md # 提示词、解析器和延迟记录
└── logs/ # 可选的可复现问题清理数据
快速参考
使用能解决阻塞问题的最小文件。
memory-template.md |
| 托管和本地请求模式 | api-patterns.md |
| 工作负载路由矩阵 | routing-matrix.md |
| 托管与自托管决策 | deployment-paths.md |
| 工具调用和结构化输出防护 | tool-calling.md |
| 调试和恢复 | troubleshooting.md |
要求
- - curl和jq用于最小端点检查
- 托管Qwen通常需要DASHSCOPEAPIKEY
- 自托管Qwen可使用Ollama、vLLM、SGLang或其他兼容OpenAI的服务器
- 密钥仅保存在环境变量中
核心规则
1. 先锁定执行表面,再调整模型
- - 首先识别真实的执行表面:阿里模型工作室托管API、其他兼容OpenAI的提供商或自托管服务器。
- 大多数Qwen问题实际上是端点、区域、服务器或聊天模板问题,而非模型质量问题。
2. 在指定任何模型前先验证实时可用性
- - 从/models或等效健康检查开始,从响应中复制实时模型ID。
- 切勿依赖过时的截图、旧博客文章或记忆中的ID进行生产路由。
3. 按工作负载路由,而非品牌偏好
- - 将请求拆分为以下路径之一:快速聊天、深度推理、编程代理、确定性JSON或视觉任务。
- 选择能可靠完成该任务的最小Qwen系列和服务器路径。
4. 将结构化输出视为独立的可靠性问题
- - 如果Qwen用于工具调用、JSON生成或下游写入,在操作前使用严格模式、低温度和解析器验证。
- 如果第一轮是创意性或推理密集型任务,添加第二轮确定性标准化处理,而非强制一个提示词同时完成两项任务。
5. 区分模型问题与服务器问题
- - 迁移后行为发生变化时,隔离变量:模型系列、量化、聊天模板、推理模式、解析器或后端。
- 在同时更改提示词、基础设施和业务逻辑之前,先用一个最小负载进行复现。
6. 明确比较托管与自托管
- - 托管Qwen通常在首次成功速度和托管多模态访问方面占优。
- 自托管Qwen仅在隐私、本地成本控制或离线使用的优势明显超过运维开销时胜出。
7. 创建持久状态前先询问
- - 默认以无状态方式工作。
- 仅在用户希望跨Qwen任务保持连续性后,才创建~/qwen/笔记、保存的路由或可复现日志。
常见陷阱
- - 将Qwen视为可互换的单一实体 -> 托管API、Ollama、vLLM和代理框架的行为各不相同。
- 硬编码过时的模型ID -> 区域和发布节奏会使旧ID快速失效。
- 将自由形式推理与严格JSON输出混合 -> 当一个提示词被要求同时完成两项任务时,解析会失败。
- 将本地速度慢归咎于模型 -> Apple Silicon和Ollama常因模型大小、量化或上下文过大而失败。
- 从不重新检查工具调用的兼容OpenAI后端迁移 -> 解析器和聊天模板差异可能破坏自动化。
外部端点
仅使用能回答当前问题的最小托管端点。
| 端点 | 发送数据 | 用途 |
|---|
| https://dashscope.aliyuncs.com/compatible-mode/v1/models | 仅认证头 | 中国大陆模型发现 |
| https://dashscope-intl.aliyuncs.com/compatible-mode/v1/models |
仅认证头 | 国际模型发现 |
| https://dashscope-us.aliyuncs.com/compatible-mode/v1/models | 仅认证头 | 美国模型发现 |
| https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions | 提示消息和选项 | 北京区域托管Qwen聊天补全 |
| https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions | 提示消息和选项 | 新加坡区域托管Qwen聊天补全 |
| https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions | 提示消息和选项 | 弗吉尼亚区域托管Qwen聊天补全 |
不向外部发送其他数据。
安全与隐私
离开您机器的数据:
- - 使用托管Qwen时发送给阿里云模型工作室的提示内容
- 请求时发送给托管Qwen视觉端点的可选图像或多模态负载
保留在本地数据:
- - 用户批准后存储在~/qwen/中的部署偏好和路由笔记
- 用于调试的本地服务器URL、工作负载笔记和清理后的可复现负载
此技能不会:
- - 在markdown文件中存储API密钥
- 向未声明的第三方端点发送数据
- 假设本地服务器可安全公开暴露
- 修改自身的技能文件
范围
此技能仅:
- - 在托管和自托管执行表面之间路由Qwen工作
- 为聊天、编程、推理、视觉和自动化选择模型系列
- 调试迁移、解析器、延迟和端点问题
- 仅在用户批准后存储轻量级本地笔记
此技能绝不:
- - 未经检查就虚构模型实时可用性
- 在~/qwen/中持久化密钥
- 未经验证输出就执行破坏性下游自动化
- 假设一个后端的工具调用行为适用于所有场景
信任
使用托管Qwen会将提示数据发送给阿里云模型工作室。
仅当您信任该服务处理您的数据时才安装,或保持Qwen完全自托管。
相关技能
如果用户确认,使用clawhub install
安装:
- - models — 在将Qwen锁定到生产环境前选择模型系列和成本层级
- api — 调试认证、负载、重试和兼容OpenAI的请求格式
- coding — 在Qwen路由本身稳定后收紧代理编程工作流
- chat — 在Qwen路由本身稳定后改进对话塑造
- memory — 存储持久的路由选择和重复迁移经验
反馈
- - 如果有用:clawhub star qwen
- 保持更新:clawhub sync