kernelgen-flagos — Unified GPU Operator Generation Skill

This is a unified entry point that bundles four sub-skills into one:

Sub-skill file	Purpose
kernelgen-general.md	Generate GPU kernels for any Python/Triton repository
kernelgen-for-flaggems.md

All sub-skill files are located in the same directory as this SKILL.md file.

Routing Protocol — Follow This BEFORE Doing Anything Else

Phase 1: Detect Repository Type

Use the Glob tool to check for project identity files in the current working directory:

Glob: pyproject.toml
Glob: setup.py
Glob: setup.cfg

Then use the Read tool to read whichever file exists. Determine the project name from
the file contents (e.g., name = "flag_gems" in pyproject.toml, or name='vllm' in setup.py).

Also use the Glob tool to check for characteristic directory structures:

FlagGems indicators (match ANY):

- src/flag_gems/ directory exists
- Project name is flag_gems, flag-gems, or FlagGems
- import flag_gems appears in test files

vLLM indicators (match ANY):

- vllm/ directory exists at the repo root (with vllm/__init__.py)
- Project name is vllm
- csrc/ directory exists alongside vllm/
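
The detection checks above can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: the function names, the `test_*.py` naming convention for test files, the regex-based name lookup, and the choice to check FlagGems before vLLM when both match are all assumptions made for the example.

```python
import re
from pathlib import Path
from typing import Optional

IDENTITY_FILES = ("pyproject.toml", "setup.py", "setup.cfg")
# Heuristic: first line that looks like name = "x", name='x', or name = x.
NAME_PATTERN = re.compile(r"""^\s*name\s*=\s*["']?([\w.\-]+)""", re.MULTILINE)


def read_project_name(root: Path) -> Optional[str]:
    """Return the first project name found in the identity files, if any."""
    for filename in IDENTITY_FILES:
        path = root / filename
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            match = NAME_PATTERN.search(text)
            if match:
                return match.group(1)
    return None


def imports_flag_gems(root: Path) -> bool:
    """Check test files (assumed to be named test_*.py) for the import."""
    for path in root.rglob("test_*.py"):
        if "import flag_gems" in path.read_text(encoding="utf-8", errors="ignore"):
            return True
    return False


def detect_repo_type(root: Path) -> str:
    """Return 'flaggems', 'vllm', or 'general' (hypothetical labels)."""
    name = read_project_name(root)
    # FlagGems indicators: any single match is enough.
    if (
        (root / "src" / "flag_gems").is_dir()
        or name in {"flag_gems", "flag-gems", "FlagGems"}
        or imports_flag_gems(root)
    ):
        return "flaggems"
    # vLLM indicators: any single match is enough.
    if (
        (root / "vllm" / "__init__.py").is_file()
        or name == "vllm"
        or ((root / "csrc").is_dir() and (root / "vllm").is_dir())
    ):
        return "vllm"
    return "general"
```

Calling `detect_repo_type(Path.cwd())` from the repository root would return one of the three labels. A real agent performs these checks with the Glob and Read tools rather than with direct file-system calls.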

Phase 2: Dispatch to Sub-skill

Based on the detection result, use the Read tool to read the appropriate sub-skill file
from this skill's directory, then follow the instructions in that file exactly.

To locate the sub-skill files: They are in the same directory as this SKILL.md. Use the
Glob tool to find the path:

Glob: /skills/kernelgen-flagos/kernelgen-general.md

Then use the Read tool to read the matched path.

Decision Table

Detection Result	Action
FlagGems repository detected	Read `kernelgen-for-flaggems.md` and follow it
vLLM repository detected	Read `kernelgen-for-vllm.md` and follow it
Neither detected (or unknown)	Read `kernelgen-general.md` and follow it
User reports a bug or requests feedback submission	Read `kernelgen-submit-feedback.md` and follow it
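
The decision table can be read as a simple lookup. The sketch below is illustrative only: the labels `"flaggems"`, `"vllm"`, `"general"`, and `"feedback"`, the function name, and its parameters are hypothetical, and giving a feedback request precedence over an explicit sub-skill request is an assumption of the example.

```python
from typing import Optional

# File names are taken from the decision table; the keys are hypothetical labels.
SUB_SKILLS = {
    "flaggems": "kernelgen-for-flaggems.md",
    "vllm": "kernelgen-for-vllm.md",
    "general": "kernelgen-general.md",
    "feedback": "kernelgen-submit-feedback.md",
}


def choose_sub_skill(
    detected: str,
    wants_feedback: bool = False,
    explicit: Optional[str] = None,
) -> str:
    """Return the sub-skill file to read for a given detection result."""
    if wants_feedback:
        # A bug report or feedback request routes to the feedback sub-skill.
        return SUB_SKILLS["feedback"]
    if explicit in SUB_SKILLS:
        # An explicit user request overrides auto-detection.
        return SUB_SKILLS[explicit]
    # Anything that is neither FlagGems nor vLLM falls back to the general file.
    return SUB_SKILLS.get(detected, SUB_SKILLS["general"])
```

For example, `choose_sub_skill("vllm", explicit="flaggems")` selects the FlagGems file even though a vLLM repository was detected.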

Important rules:

1. Always detect first, dispatch second. Never skip detection.
2. Read the entire sub-skill file before starting execution; do not partially read it.
3. Follow the sub-skill instructions exactly as if they were the main SKILL.md. All steps, rules, and protocols in the sub-skill apply fully.
4. Do not mix sub-skills. Once you dispatch to a sub-skill, follow it to completion.
5. If the user explicitly requests a specific sub-skill (e.g., "use the FlagGems version"), honor that request regardless of auto-detection results.
6. CRITICAL: MCP is mandatory. ALL operator code generation MUST go through the mcp__kernelgen-mcp__generate_operator MCP tool. NEVER generate Triton kernels, PyTorch wrappers, or operator implementations yourself. If MCP is not configured, not reachable, or fails after all retries, STOP and report the issue. Do NOT fall back to writing code manually.
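
The "retry, then stop" behavior of the MCP rule can be pictured as follows. This is a hedged sketch: `call_tool` is a hypothetical stand-in for invoking the mcp__kernelgen-mcp__generate_operator tool, the exception class is invented for the example, and the default of three attempts is an assumption, since the retry count is not stated here.

```python
from typing import Callable, Optional


class McpUnavailableError(RuntimeError):
    """Signals that the workflow must stop and report the MCP problem."""


def generate_via_mcp(call_tool: Callable[[], str], max_attempts: int = 3) -> str:
    """Call the MCP tool with retries; never fall back to hand-written code."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return call_tool()
        except Exception as exc:  # any failure counts as an unsuccessful attempt
            last_error = exc
    # All attempts failed: stop here instead of generating code manually.
    raise McpUnavailableError(
        f"MCP generation failed after {max_attempts} attempts: {last_error}"
    ) from last_error
```

The point of the design is that the only two outcomes are code returned by the MCP tool or a raised error that halts the workflow; there is no branch that writes a kernel by hand.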

Phase 3: Feedback Handling

At any point during the workflow, if the user reports a bug, says something is broken,
or asks to submit feedback about the skill:

1. Use the Read tool to read kernelgen-submit-feedback.md from this skill's directory.
2. Follow the feedback submission workflow described in that file.
3. After feedback is submitted, ask the user if they want to continue with the operator generation workflow or stop.

Quick Reference for Users

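Generate a kernel operator (the repository type is auto-detected):

/kernelgen-flagos relu

Generate with an explicit function type:

/kernelgen-flagos rms_norm --func-type normalization

The skill will automatically:

- Detect whether it is in a FlagGems repository → use the FlagGems-specific workflow
- Detect whether it is in a vLLM repository → use the vLLM-specific workflow
- Otherwise → use the general workflow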

If you encounter any issues during generation, just say "submit feedback" or "report a bug"
and the skill will guide you through the feedback submission process.

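kernelgen-flagos: Unified GPU Operator Generation Skill

This is a unified entry point that bundles four sub-skills into one:

Sub-skill file	Purpose
kernelgen-general.md	Generate GPU kernels for any Python/Triton repository
kernelgen-for-flaggems.md

All sub-skill files are located in the same directory as this SKILL.md file.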

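Routing Protocol: Follow This Before Doing Anything Else

Phase 1: Detect Repository Type

Use the Glob tool to check for project identity files in the current working directory:

Glob: pyproject.toml
Glob: setup.py
Glob: setup.cfg

Then use the Read tool to read whichever file exists. Determine the project name from the file contents (e.g., name = "flag_gems" in pyproject.toml, or name='vllm' in setup.py).

Also use the Glob tool to check for characteristic directory structures:

FlagGems indicators (match ANY):

- src/flag_gems/ directory exists
- Project name is flag_gems, flag-gems, or FlagGems
- import flag_gems appears in test files

vLLM indicators (match ANY):

- vllm/ directory exists at the repo root (with vllm/__init__.py)
- Project name is vllm
- csrc/ directory exists alongside vllm/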

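Phase 2: Dispatch to Sub-skill

Based on the detection result, use the Read tool to read the appropriate sub-skill file from this skill's directory, then follow the instructions in that file exactly.

To locate the sub-skill files: they are in the same directory as this SKILL.md. Use the Glob tool to find the path:

Glob: /skills/kernelgen-flagos/kernelgen-general.md

Then use the Read tool to read the matched path.

Decision Table

Detection Result	Action
FlagGems repository detected	Read kernelgen-for-flaggems.md and follow it
vLLM repository detected	Read kernelgen-for-vllm.md and follow it
Neither detected (or unknown)	Read kernelgen-general.md and follow it
User reports a bug or requests feedback submission	Read kernelgen-submit-feedback.md and follow it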

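Important rules:

1. Always detect first, dispatch second. Never skip detection.
2. Read the entire sub-skill file before starting execution; do not partially read it.
3. Follow the sub-skill instructions exactly as if they were the main SKILL.md. All steps, rules, and protocols in the sub-skill apply fully.
4. Do not mix sub-skills. Once you dispatch to a sub-skill, follow it to completion.
5. If the user explicitly requests a specific sub-skill (e.g., "use the FlagGems version"), honor that request regardless of auto-detection results.
6. CRITICAL: MCP is mandatory. ALL operator code generation MUST go through the mcp__kernelgen-mcp__generate_operator MCP tool. NEVER generate Triton kernels, PyTorch wrappers, or operator implementations yourself. If MCP is not configured, not reachable, or fails after all retries, STOP and report the issue. Do NOT fall back to writing code manually.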

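Phase 3: Feedback Handling

At any point during the workflow, if the user reports a bug, says something is broken, or asks to submit feedback about the skill:

1. Use the Read tool to read kernelgen-submit-feedback.md from this skill's directory.
2. Follow the feedback submission workflow described in that file.
3. After feedback is submitted, ask the user if they want to continue with the operator generation workflow or stop.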

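Quick Reference for Users

Generate a kernel operator (the repository type is auto-detected):

/kernelgen-flagos relu

Generate with an explicit function type:

/kernelgen-flagos rms_norm --func-type normalization

The skill will automatically:

- Detect whether it is in a FlagGems repository → use the FlagGems-specific workflow
- Detect whether it is in a vLLM repository → use the vLLM-specific workflow
- Otherwise → use the general workflow

If you encounter any issues during generation, just say "submit feedback" or "report a bug" and the skill will guide you through the feedback submission process.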
