MUSA Torch Coding
Guide for generating PyTorch code that runs on Moore Threads (摩尔线程) MUSA GPUs using torch_musa.
Overview
MUSA (Metaverse Unified System Architecture) is Moore Threads' GPU computing platform. This skill helps generate code that:
- Runs on Moore Threads GPUs via `torch_musa`
- Converts CUDA code to MUSA-compatible code
- Sets up proper environments (conda v1.2/v1.3)
- Follows MUSA best practices
Key Differences: CUDA vs MUSA
| CUDA | MUSA |
|---|---|
| `torch.cuda` | `torch.musa` |
| `torch.device("cuda")` | `torch.device("musa")` |
| `torch.cuda.is_available()` | `torch.musa.is_available()` |
| `backend='nccl'` | `backend='mccl'` |
| `torch.cuda.device_count()` | `torch.musa.device_count()` |
| `torch.cuda.get_device_name()` | `torch.musa.get_device_name()` |
Environment Setup
⚠️ Important: MUSA Uses Pre-configured Conda Environments
DO NOT install PyTorch, vLLM, or related packages manually. MUSA environments are custom-built and include:
- MUSA-specific PyTorch builds (not compatible with standard PyTorch)
- MUSA-customized vLLM versions
- MUSA drivers and SDK integration
Installing standard packages from PyPI will break the environment.
Conda Environment (v1.2/v1.3)
MUSA provides pre-configured conda environments. Common environment names:
- `v1.2`: MUSA SDK v1.2 environment
- `v1.3`: MUSA SDK v1.3 environment (newer)
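```bash
# List available MUSA environments
conda env list | grep -E "(v1\.2|v1\.3|musa)"

# Activate the appropriate environment
conda activate v1.2  # or v1.3

# Verify MUSA availability
python -c "import torch_musa; import torch; print(torch.musa.is_available())"
```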
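When the environment check has to run from a script rather than a shell, the same name filter can be applied to captured `conda env list` output. The sketch below is only an illustration: `find_musa_envs` is a hypothetical helper, the sample text is made up, and the pattern assumes your site uses the `v1.2` / `v1.3` / `musa` naming described above.

```python
import re

# Name patterns from the environment list above; adjust if your site uses other names.
MUSA_ENV_PATTERN = re.compile(r"(v1\.2|v1\.3|musa)")


def find_musa_envs(conda_env_list_output: str) -> list:
    """Return environment names that look like MUSA environments."""
    names = []
    for line in conda_env_list_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments conda prints
        name = line.split()[0]  # first column is the environment name
        if MUSA_ENV_PATTERN.search(name):
            names.append(name)
    return names


# Illustrative output only; real output comes from `conda env list`.
sample = """# conda environments:
#
base                  *  /opt/conda
v1.2                     /opt/conda/envs/v1.2
v1.3                     /opt/conda/envs/v1.3
"""
print(find_musa_envs(sample))
```

Unlike a plain `grep` over whole lines, this matches only the name column, so an unrelated environment stored under a path containing `musa` is not reported.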
Environment Detection & Setup
If no MUSA conda environment is detected:
1. Check if MUSA is installed:

   ```bash
   which musaInfo       # should print the path to musaInfo
   ls /usr/local/musa/  # MUSA SDK location
   ```

2. If MUSA is not set up:
   - Use the musa-env-setup skill for complete environment installation
   - The skill covers SDK installation, conda setup, and vLLM-MUSA configuration
3. Common conda environment locations:
   - `/opt/conda/envs/`
   - `~/conda/envs/`
   - `/usr/local/conda/envs/`
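The detection steps can also be gathered into a single report from Python. This is a minimal sketch, assuming the SDK path and conda locations this guide lists; `detect_musa_install` is a hypothetical name, and a missing path only means the installation is not in a conventional place.

```python
import os
import shutil

# Paths taken from this guide; they are conventions, not guarantees.
MUSA_SDK_DIR = "/usr/local/musa"
CONDA_ENV_DIRS = ["/opt/conda/envs", "~/conda/envs", "/usr/local/conda/envs"]


def detect_musa_install() -> dict:
    """Collect the facts the manual checks would show."""
    env_dirs = [os.path.expanduser(p) for p in CONDA_ENV_DIRS]
    return {
        "musaInfo_path": shutil.which("musaInfo"),  # None if not on PATH
        "sdk_dir_exists": os.path.isdir(MUSA_SDK_DIR),
        "conda_env_dirs": [d for d in env_dirs if os.path.isdir(d)],
    }


report = detect_musa_install()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `musaInfo_path` is `None` and `sdk_dir_exists` is `False`, the machine most likely needs the full environment setup before any MUSA code can run.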
Key Environment Variables
| Variable | Purpose |
|---|---|
| `MUSA_VISIBLE_DEVICES=0,1,2,3` | Control visible GPU IDs |
| `MUSA_LAUNCH_BLOCKING=1` | Synchronous kernel launch |
| `MUDNN_LOG_LEVEL=INFO` | Enable MUDNN logging |
| `TORCH_SHOW_CPP_STACKTRACES=1` | Show C++ stack traces |
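These variables can be set from Python as well as from the shell. The sketch below builds a debug-oriented set of values with the variable names given in this guide; `musa_debug_env` is a hypothetical helper, and setting the values before `torch` and `torch_musa` are imported is an assumption about when the runtime reads them.

```python
import os


def musa_debug_env(visible_devices: str = "0") -> dict:
    """Build debug-friendly settings from the variables in the table."""
    return {
        "MUSA_VISIBLE_DEVICES": visible_devices,  # GPU IDs the process may see
        "MUSA_LAUNCH_BLOCKING": "1",              # synchronous kernel launches
        "MUDNN_LOG_LEVEL": "INFO",                # MUDNN logging
        "TORCH_SHOW_CPP_STACKTRACES": "1",        # C++ stack traces on errors
    }


# Assumption: apply these before importing torch / torch_musa.
os.environ.update(musa_debug_env(visible_devices="0,1"))
print(os.environ["MUSA_VISIBLE_DEVICES"])
```

Synchronous launches and verbose logging slow execution, so settings like these are better kept to debugging sessions.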
Code Generation Rules
When generating PyTorch code for MUSA:
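1. Always import `torch_musa`

   ```python
   import torch
   import torch_musa  # must be imported before using torch.musa
   ```

2. Use `torch.device("musa")`

   ```python
   device = torch.device("musa") if torch.musa.is_available() else torch.device("cpu")
   tensor = torch.tensor([1.0, 2.0], device=device)
   ```

3. Use `'mccl'` for distributed training

   ```python
   dist.init_process_group(backend="mccl", ...)
   ```

4. Mixed precision (AMP) is supported

   ```python
   from torch.cuda.amp import autocast, GradScaler  # same API
   ```

5. TensorCore optimization is available
   - Set `torch.backends.musa.matmul.allow_tf32 = True` for TensorFloat32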
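Generated code often has to run on machines without a MUSA device, for example in CI. The following sketch combines the device and backend rules into one guarded helper. `pick_device_and_backend` is a hypothetical name, and using `gloo` as the CPU fallback backend is an assumption of this example rather than something this guide prescribes.

```python
import importlib.util


def pick_device_and_backend() -> tuple:
    """Return (device string, distributed backend) following the rules above.

    Falls back to ("cpu", "gloo") when torch or torch_musa is missing or
    no MUSA device is available. The gloo fallback is an assumption.
    """
    if importlib.util.find_spec("torch") is None:
        return "cpu", "gloo"
    if importlib.util.find_spec("torch_musa") is None:
        return "cpu", "gloo"
    import torch
    import torch_musa  # noqa: F401  (importing makes torch.musa usable)

    if torch.musa.is_available():
        return "musa", "mccl"
    return "cpu", "gloo"


device_name, backend = pick_device_and_backend()
print(device_name, backend)
```

The returned strings can be passed to `torch.device(...)` and `dist.init_process_group(backend=...)` respectively.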
Model Templates
For common model types, see templates in references/:
- `reference.md`: Complete MUSA API reference
Common Tasks
Check GPU Availability
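```python
import torch
import torch_musa

print(f"MUSA available: {torch.musa.is_available()}")
print(f"Device count: {torch.musa.device_count()}")
print(f"Device name: {torch.musa.get_device_name(0)}")
```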
Training Loop Pattern
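```python
import torch
import torch_musa

# Device setup
device = torch.device("musa") if torch.musa.is_available() else torch.device("cpu")

# Move the model and data to the device
model = model.to(device)
inputs = inputs.to(device)
targets = targets.to(device)  # targets must be on the same device as outputs

# Training step (same as CUDA)
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, targets)
loss.backward()
optimizer.step()
```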
Distributed Training (DDP)
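```python
import torch
import torch.distributed as dist
import torch_musa

# Initialize the process group with the mccl backend
dist.init_process_group(backend="mccl", rank=rank, world_size=world_size)

# Bind this process to its MUSA device
torch.cuda.set_device(local_rank)  # torch_musa extends the torch.cuda API
```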
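A DDP setup needs a rank, a world size, and a local rank for each process. One common way to obtain them, assuming the job is started with `torchrun`, is to read the `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` environment variables it exports. The helper name `read_dist_env` and the single-process defaults are choices made for this sketch.

```python
import os


def read_dist_env(env=None) -> dict:
    """Read rank information from the variables torchrun exports."""
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),               # global process index
        "world_size": int(env.get("WORLD_SIZE", 1)),   # total number of processes
        "local_rank": int(env.get("LOCAL_RANK", 0)),   # process index on this node
    }


# Illustrative values; under torchrun, call read_dist_env() with no argument.
cfg = read_dist_env({"RANK": "3", "WORLD_SIZE": "8", "LOCAL_RANK": "1"})
print(cfg)
```

The defaults describe a single-process run, so the same script can still be started with plain `python` for a quick smoke test.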
Code Conversion
When converting existing CUDA code to MUSA:
1. Add `import torch_musa` at the top
2. Replace `cuda` with `musa` in device strings
3. Replace `nccl` with `mccl` for the distributed backend
4. Keep all other PyTorch API calls unchanged
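The conversion steps above are mechanical enough to sketch as a text transformation. The function below is a rough illustration, not a complete converter: `cuda_to_musa` is a hypothetical helper that only rewrites quoted device strings and the quoted backend name, then adds the import once. It leaves attribute calls such as `torch.cuda.is_available()` untouched, and it does not handle files that must begin with a shebang, docstring, or `from __future__` import.

```python
import re


def cuda_to_musa(source: str) -> str:
    """Apply the listed conversion steps to a string of Python source."""
    # Device strings: "cuda" or "cuda:0" become "musa" or "musa:0".
    source = re.sub(r"""(["'])cuda(:\d+)?\1""", r"\g<1>musa\g<2>\g<1>", source)
    # Distributed backend name: "nccl" becomes "mccl".
    source = re.sub(r"""(["'])nccl\1""", r"\g<1>mccl\g<1>", source)
    # Add the import once, at the top of the file.
    if not re.search(r"^\s*import torch_musa\b", source, flags=re.MULTILINE):
        source = "import torch_musa\n" + source
    return source


example = (
    "import torch\n"
    'device = torch.device("cuda:0")\n'
    "x = x.to('cuda')\n"
    'dist.init_process_group(backend="nccl")\n'
)
print(cuda_to_musa(example))
```

Running the function twice gives the same result as running it once, so it is safe to apply to files that were already partly converted.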
Troubleshooting
- Device not found: Ensure the user is in the `render` group: `sudo usermod -aG render $(whoami)`
- Library not found: Check that `LD_LIBRARY_PATH` includes `/usr/local/musa/lib/`
- Build issues: Clean and rebuild: `python setup.py clean && bash build.sh`
- Docker issues: Use `--env MTHREADS_VISIBLE_DEVICES=all`
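The first two checks can be automated with the standard library. The sketch below assumes a Linux host and the library path this guide gives; `ld_path_includes` and `user_in_group` are hypothetical helper names.

```python
import os

MUSA_LIB_DIR = "/usr/local/musa/lib"  # path from this guide


def ld_path_includes(lib_dir: str, ld_library_path: str) -> bool:
    """True if lib_dir is an entry of a colon-separated LD_LIBRARY_PATH value."""
    wanted = os.path.normpath(lib_dir)
    entries = [e for e in ld_library_path.split(":") if e]
    return any(os.path.normpath(e) == wanted for e in entries)


def user_in_group(group_name: str) -> bool:
    """True if the current process belongs to group_name (Unix only)."""
    import grp  # Unix-only module, imported here so the rest works elsewhere

    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # the group does not exist on this machine
    return gid in os.getgroups() or gid == os.getgid()


print(ld_path_includes(MUSA_LIB_DIR, os.environ.get("LD_LIBRARY_PATH", "")))
print(user_in_group("render"))
```

A group added with `usermod` normally takes effect only in a new login session, so `user_in_group` may keep returning `False` until the user logs in again.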
Reference
For detailed API reference and examples, see references/reference.md.