Qwen

When to Use

Use this skill when the user needs Qwen to work reliably for chat, coding, reasoning, structured outputs, or vision. The agent handles surface selection, live model verification, hosted-versus-local tradeoffs, and failure recovery before the workflow reaches production.

Architecture

Memory lives in ~/qwen/. If ~/qwen/ does not exist, run setup.md. See memory-template.md for structure.

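~/qwen/
├── memory.md        # status, activation rules, and deployment defaults
├── routes.md        # preferred route for each workload
├── servers.md       # known local or hosted endpoints
├── experiments.md   # prompt, parser, and latency notes
└── logs/            # optional sanitized repro data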

Quick Reference

Use the smallest file that resolves the blocker.

| Topic | File |
|-------|------|
| Setup process | setup.md |
| Memory template | memory-template.md |

Requirements

- curl and jq for minimal endpoint checks
- Hosted Qwen usually needs a DASHSCOPE_API_KEY
- Self-hosted Qwen may use Ollama, vLLM, SGLang, or another OpenAI-compatible server
- Keep secrets in environment variables only
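
A minimal sketch of the environment-only rule, assuming the hosted endpoint accepts a standard Bearer token. The helper name auth_headers is hypothetical and not part of this skill.

```python
import os


def auth_headers(env=os.environ):
    """Build request headers from the environment; keys are never read from files."""
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("DASHSCOPE_API_KEY is not set in the environment")
    # Assumption: the OpenAI-compatible surface expects "Authorization: Bearer <key>".
    return {"Authorization": f"Bearer {key}", "Content-Type": "application/json"}
```

Failing early on a missing key keeps an empty token from reaching the endpoint and showing up later as a confusing auth error.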

Core Rules

1. Lock the Surface Before Tuning the Model

- Identify the real execution surface first: Alibaba Model Studio hosted API, another OpenAI-compatible provider, or a self-hosted server.
- Most "Qwen issues" are actually endpoint, region, server, or chat-template issues rather than model quality issues.

2. Verify Live Availability Before Naming Any Model

- Start with a /models or equivalent health check and copy the live model ID from the response.
- Never trust stale screenshots, old blog posts, or remembered IDs for production routing.
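
The check above could look roughly like this in Python. It assumes the server returns the usual OpenAI-style body with a data list of objects carrying an id field. Both function names are hypothetical.

```python
import json
import urllib.request


def extract_model_ids(payload):
    """Return the model IDs found in an OpenAI-style /models response body."""
    return sorted(item["id"] for item in payload.get("data", []) if "id" in item)


def fetch_model_ids(base_url, headers, timeout=10):
    """GET {base_url}/models and return the IDs the server reports right now."""
    request = urllib.request.Request(f"{base_url.rstrip('/')}/models", headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_model_ids(json.load(response))
```

Copy IDs from the returned list rather than typing them from memory. A server that exposes a different response shape needs its own extractor.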

3. Route by Workload, Not by Brand Loyalty

- Split the request into one of these paths: fast chat, deep reasoning, coding agent, deterministic JSON, or vision.
- Pick the smallest Qwen family and server path that can reliably do that job.
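
One way to sketch this routing rule, assuming preferences are listed smallest-first per workload and that a set of live IDs has already been collected. The workload keys and function name are illustrative, and no real model IDs are implied.

```python
WORKLOADS = ("fast_chat", "deep_reasoning", "coding_agent", "deterministic_json", "vision")


def pick_model(workload, preferences, live_ids):
    """Return the first preferred model ID for a workload that is actually live."""
    if workload not in WORKLOADS:
        raise ValueError(f"unknown workload: {workload}")
    for model_id in preferences.get(workload, []):
        if model_id in live_ids:
            return model_id
    return None  # nothing live: re-check the surface instead of guessing an ID
```

Returning None instead of a fallback ID forces the caller back to the availability check rather than routing to a model that may not exist.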

4. Treat Structured Output as a Separate Reliability Problem

- If Qwen is feeding tools, JSON, or downstream writes, use strict schemas, low temperature, and parser validation before acting.
- If the first pass is creative or reasoning-heavy, add a second deterministic normalization pass instead of forcing one prompt to do both.
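
A minimal sketch of parser validation before acting, using only the standard library. The helper name and the key-to-type schema format are assumptions for illustration. A production setup might use a full JSON Schema validator instead.

```python
import json


def parse_strict(raw, required):
    """Parse model output as a JSON object and check required keys and types.

    `required` maps key -> expected Python type. Raises ValueError on any mismatch.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected in required.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], expected):
            raise ValueError(f"wrong type for {key}")
    extra = set(data) - set(required)
    if extra:
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    return data
```

Only call downstream tools with the returned dict. On ValueError, retry with a deterministic normalization pass rather than patching the string by hand.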

5. Separate Model Problems From Server Problems

- When behavior changes after migration, isolate the variable: model family, quantization, chat template, reasoning mode, parser, or backend.
- Reproduce with one minimal payload before changing prompts, infrastructure, and business logic at the same time.
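
To make the isolation step concrete, the sketch below compares two run configurations and reports which tracked variables differ. The config keys and function names are hypothetical and simply mirror the variables listed above.

```python
VARIABLES = (
    "model_family", "quantization", "chat_template",
    "reasoning_mode", "parser", "backend",
)


def changed_variables(before, after):
    """List which tracked variables differ between two run configurations."""
    return [name for name in VARIABLES if before.get(name) != after.get(name)]


def is_isolated(before, after):
    """True when exactly one tracked variable changed, so a repro is attributable."""
    return len(changed_variables(before, after)) == 1
```

If is_isolated returns False, revert changes until a single variable differs, then rerun the same minimal payload.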

6. Compare Hosted and Self-Hosted Explicitly

- Hosted Qwen usually wins on speed to first success and managed multimodal access.
- Self-hosted Qwen only wins when privacy, local cost control, or offline use clearly outweigh operational overhead.

7. Ask Before Creating Persistent State

- Work statelessly by default.
- Only create ~/qwen/ notes, saved routes, or repro logs after the user wants continuity across Qwen tasks.
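
A small sketch of the approval gate, assuming approval arrives as a boolean from the conversation. The function name and the root override, which exists only to make testing easy, are hypothetical.

```python
from pathlib import Path


def ensure_memory_dir(approved, root=None):
    """Create the memory directory only after explicit user approval."""
    target = Path(root) if root is not None else Path.home() / "qwen"
    if not approved:
        return None  # stay stateless: nothing is written
    target.mkdir(parents=True, exist_ok=True)
    return target
```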

Common Traps

- Treating "Qwen" as one interchangeable thing -> hosted APIs, Ollama, vLLM, and agent frameworks behave differently.
- Hardcoding dated model IDs -> regional availability and release cadence make old IDs fail quickly.
- Mixing free-form reasoning with strict JSON output -> parsing breaks when one prompt is asked to do both.
- Blaming the model for local slowness -> Apple Silicon and Ollama setups are often slow or fail because of model size, quantization, or an oversized context.
- Migrating from another OpenAI-compatible backend without rechecking tool-calling -> parser and chat-template differences can break automation.

External Endpoints

Use only the smallest hosted endpoint that answers the current question.

| Endpoint | Data Sent | Purpose |
|----------|-----------|---------|
| https://dashscope.aliyuncs.com/compatible-mode/v1/models | Auth header only | Mainland China model discovery |
| https://dashscope-intl.aliyuncs.com/compatible-mode/v1/models | Auth header only | International model discovery |
| https://dashscope-us.aliyuncs.com/compatible-mode/v1/models | Auth header only | United States model discovery |
| https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions | Prompt messages and options | Hosted Qwen chat completions in Beijing region |
| https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions | Prompt messages and options | Hosted Qwen chat completions in Singapore region |
| https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions | Prompt messages and options | Hosted Qwen chat completions in Virginia region |
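
The endpoint list can be kept as a small lookup so a request cannot drift to an undeclared URL. The region keys cn, intl, and us are labels invented for this sketch. The URLs are the ones listed in the table.

```python
BASE_URLS = {
    "cn": "https://dashscope.aliyuncs.com/compatible-mode/v1",         # Beijing
    "intl": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # Singapore
    "us": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",      # Virginia
}
ALLOWED_PATHS = ("models", "chat/completions")


def endpoint(region, path):
    """Return a declared hosted URL, refusing unknown regions or paths."""
    if region not in BASE_URLS:
        raise ValueError(f"unknown region: {region}")
    if path not in ALLOWED_PATHS:
        raise ValueError(f"undeclared path: {path}")
    return f"{BASE_URLS[region]}/{path}"
```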

No other data is sent externally.

Security & Privacy

Data that leaves your machine:

- Prompt content sent to Alibaba Cloud Model Studio when using hosted Qwen
- Optional images or multimodal payloads sent to hosted Qwen vision endpoints when requested

Data that stays local:

- Deployment preferences and routing notes in ~/qwen/ after user approval
- Local server URLs, workload notes, and sanitized repro payloads kept for debugging

This skill does NOT:

- Store API keys in markdown files
- Send data to undeclared third-party endpoints
- Assume local servers are safe to expose publicly
- Modify its own skill files

Scope

This skill ONLY:

- routes Qwen work across hosted and self-hosted execution surfaces
- chooses model families for chat, coding, reasoning, vision, and automation
- debugs migration, parser, latency, and endpoint problems
- stores lightweight local notes only after user approval

This skill NEVER:

- invents live model availability without checking
- persists secrets in ~/qwen/
- executes destructive downstream automation without validated output
- pretends one backend's tool-calling behavior applies everywhere

Trust

Using hosted Qwen sends prompt data to Alibaba Cloud Model Studio.
Only install if you trust that service with your data, or keep Qwen fully self-hosted.

Related Skills

Install with clawhub install <slug> if user confirms:

- models: choose model families and cost tiers before locking Qwen into production
- INLINECODE20: debug auth, payloads, retries, and OpenAI-compatible request shapes
- INLINECODE21: tighten agent coding workflows after the Qwen route itself is stable
- INLINECODE22: improve conversation shaping once the Qwen route itself is stable
- INLINECODE23: store durable routing choices and repeated migration lessons

Feedback

- If useful: clawhub star qwen
- Stay updated: clawhub sync

