ModelSense Skill

Purpose

ModelSense helps users pick the optimal model and effort level for their task.
It does NOT route automatically on every request (use a provider plugin for that).
It's an on-demand advisor: ask it a question, get a clear recommendation with reasoning.

When to trigger

- User asks: "which model for X?", "should I use Opus or Sonnet?", "what effort level?"
- User wants to understand what a benchmark means
- User wants ModelSense to auto-switch the session model

Inputs to collect (infer from context, ask only if truly unclear)

1. Task description — what is the user trying to do?
2. Effort preference (optional): quick / balanced / deep / research
   - If not specified, infer from task urgency/complexity

3. Auto-switch? — does the user want ModelSense to apply the recommendation automatically?

Recommendation Process

Step 1 — Task Analysis

Classify the task across these dimensions:

- Domain: code, math, reasoning, writing, dialogue, document analysis, multimodal, research
- Complexity: simple / moderate / complex / research-grade
- Output type: text, code, JSON, long-form, structured data
- Context length needed: short (<8K), medium (8–32K), long (32–100K), very long (100K+)
- Special requirements: function calling, thinking/CoT, multimodal, speed-sensitive
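The classification dimensions above could be captured in a small record type. This is a minimal sketch, not part of the skill itself: the `TaskProfile` name, its fields, the reading of "K" as 1,000 tokens, and the exact bucket boundaries (32K counted as medium, 100K as very long) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Allowed values copied from the dimension list in this step.
DOMAINS = {"code", "math", "reasoning", "writing", "dialogue",
           "document analysis", "multimodal", "research"}
COMPLEXITIES = ("simple", "moderate", "complex", "research-grade")


@dataclass
class TaskProfile:
    """Hypothetical container for one classified task."""
    domains: set[str]
    complexity: str
    output_type: str = "text"
    context_tokens: int = 0          # rough estimate of context needed
    special: set[str] = field(default_factory=set)

    def __post_init__(self):
        # Reject values that are not in the lists above.
        unknown = self.domains - DOMAINS
        if unknown:
            raise ValueError(f"unknown domain(s): {sorted(unknown)}")
        if self.complexity not in COMPLEXITIES:
            raise ValueError(f"unknown complexity: {self.complexity}")

    @property
    def context_bucket(self) -> str:
        """Map the token estimate onto the context-length buckets."""
        if self.context_tokens < 8_000:
            return "short"
        if self.context_tokens <= 32_000:   # assumption: 32K is still medium
            return "medium"
        if self.context_tokens < 100_000:
            return "long"
        return "very long"
```

For example, `TaskProfile(domains={"math"}, complexity="research-grade", context_tokens=40_000).context_bucket` evaluates to `"long"`.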

Step 2 — Benchmark Matching

Cross-reference task domain with relevant benchmarks from data/benchmarks.yaml.

Benchmark	Best for
HumanEval / SWE-bench	Code generation, debugging, engineering
GPQA
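As a rough illustration of the cross-referencing step, the lookup below maps task domains to benchmark names. It is only a sketch: the schema of data/benchmarks.yaml is not shown in this document, so the mapping is hard-coded and limited to pairings that appear here (the code row of the table and the math example later on). The names `BENCHMARKS_BY_DOMAIN` and `relevant_benchmarks` are hypothetical.

```python
# Only pairings that appear in this document. The real data lives in
# data/benchmarks.yaml, whose layout is not shown here (assumption).
BENCHMARKS_BY_DOMAIN = {
    "code": ["HumanEval", "SWE-bench"],
    "math": ["MATH", "AIME", "GPQA"],
}


def relevant_benchmarks(domains):
    """Return benchmark names for the given domains, de-duplicated, order kept."""
    seen = []
    for domain in domains:
        for name in BENCHMARKS_BY_DOMAIN.get(domain, []):
            if name not in seen:
                seen.append(name)
    return seen
```

A domain with no known mapping simply contributes nothing, so the caller can fall back to general reasoning when the list comes back empty.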

Step 3 — Effort × Model Matrix

Effort	Target quality	Typical model tier
quick	Good enough, fast	Haiku / Flash / GLM
balanced

Step 4 — Provider Filter

Check the user's available providers:

- Run: openclaw models list via exec tool (or read from context)
- Only recommend models the user can actually use
- Flag when a top pick requires a provider they haven't configured
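The filtering rule above might look like the following once the list of available models is in hand. This sketch deliberately does not parse the output of `openclaw models list`, since its format is not shown here; it assumes the caller already has model ids in the `provider/model` form. The function name and the example provider prefixes are hypothetical.

```python
def filter_by_availability(ranked, available):
    """Keep only usable models and flag an unavailable top pick.

    ranked:    model ids, best first, e.g. "anthropic/claude-opus-4-6"
               (the provider prefix is an assumption for illustration)
    available: set of model ids the user can actually call
    Returns (usable_models, warning_or_None).
    """
    usable = [m for m in ranked if m in available]
    warning = None
    if ranked and ranked[0] not in available:
        provider = ranked[0].split("/", 1)[0]
        warning = (f"Top pick {ranked[0]} needs provider '{provider}', "
                   "which is not configured.")
    return usable, warning
```

Returning the warning separately keeps the recommendation honest: the best available model is still suggested, and the user learns what they would gain by configuring another provider.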

Step 5 — Output the Recommendation

Format:
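
🎯 Recommendation: <model>
⚡ Effort level: <level>
📊 Why: <1-2 sentences of benchmark-based reasoning>
🔧 Special settings: <thinking on? function calling? etc.>
💰 Cost estimate: <rough $/M or relative>

Alternatives:
- <model B>: if you want faster/cheaper
- <model C>: if you want higher quality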
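One way to produce the recommendation text consistently is a small formatter. The sketch below assumes the output has five labeled fields (recommendation, effort level, reason, special settings, cost estimate) followed by optional alternatives; the function name, parameter names, and exact English labels are illustrative choices rather than a fixed interface.

```python
def render_recommendation(model, effort, reason, settings, cost, alternatives=()):
    """Build the recommendation text.

    alternatives: iterable of (model_name, when_to_prefer_it) pairs.
    """
    lines = [
        f"🎯 Recommendation: {model}",
        f"⚡ Effort level: {effort}",
        f"📊 Why: {reason}",
        f"🔧 Special settings: {settings}",
        f"💰 Cost estimate: {cost}",
    ]
    if alternatives:
        lines.append("")
        lines.append("Alternatives:")
        lines.extend(f"- {name}: {when}" for name, when in alternatives)
    return "\n".join(lines)
```

The alternatives section is omitted entirely when no alternatives are passed, which avoids printing an empty heading.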

Auto-Switch Behaviors

Option A: Advisory only (default)

Just output the recommendation. Tell user: "Run /model <name> to switch."

Option B: Switch current session

If user confirms or says "yes switch" / "apply it":

session_status(model="<provider/model>")

Notify user: "✅ Switched to X for this session. Run /model default to reset."

Option C: Delegate task to best model

If user says "just do it with the best model":

sessions_spawn(task="<original task>", model="<recommended model>", thinking="<level>")
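The three options can be read as a small decision function. The sketch below only decides which option applies and which tool call would follow; it does not invoke any tool. The tool names `session_status` and `sessions_spawn` come from the skill text, while the substring matching, the `choose_action` name, and the tuple return shape are assumptions. Real confirmation detection would need to be looser than fixed phrases.

```python
# Trigger phrases quoted in the option descriptions above.
APPLY_PHRASES = ("yes switch", "apply it")
DELEGATE_PHRASES = ("just do it with the best model",)


def choose_action(user_reply, model, task, thinking):
    """Return (option, tool_name, kwargs) without calling any tool.

    Option A (advisory only) is the default when nothing matches.
    """
    reply = user_reply.lower()
    if any(phrase in reply for phrase in DELEGATE_PHRASES):
        return "C", "sessions_spawn", {"task": task, "model": model,
                                       "thinking": thinking}
    if any(phrase in reply for phrase in APPLY_PHRASES):
        return "B", "session_status", {"model": model}
    return "A", None, {}
```

Keeping the decision separate from the tool invocation makes the default explicit: anything that is not a clear confirmation falls through to advisory output.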

Data Files

- data/benchmarks.yaml: benchmark definitions, score leaders, task mappings
- data/models.yaml: model catalog (updated weekly via GitHub Actions)

Examples

User: "I need to write a Solidity audit report"
→ Domain: code + security + long-form
→ Benchmarks: SWE-bench, HumanEval
→ Recommendation: claude-opus-4-6 with thinking=high, effort=deep

User: "Quick summary of this Slack thread"
→ Domain: dialogue, short
→ Recommendation: claude-haiku-4-5 or gemini-flash, effort=quick

User: "Prove this mathematical conjecture"
→ Domain: math, research-grade
→ Benchmarks: MATH, AIME, GPQA
→ Recommendation: o3 or claude-opus-4-6 with thinking=high, effort=research
