FunASR Transcribe

Local speech-to-text for audio files using FunASR. It is best suited to Chinese and mixed Chinese-English audio, runs on the local machine, and does not require a paid transcription API.

When to Use

- The user wants to transcribe `.wav`, `.ogg`, `.mp3`, `.flac`, or `.m4a` files into text.
- The user prefers local ASR over cloud speech APIs for privacy, cost, or offline-friendly workflows.
- The audio is primarily Chinese, dialect-heavy Chinese, or mixed Chinese-English.
- The user is okay with installing Python dependencies and downloading models on first use.

Do not use this skill when the user explicitly forbids local dependency installation or any network access for dependency/model download.
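Before handing a file to the skill, an agent can screen it against the formats listed above. The sketch below is a hypothetical helper (`is_supported_audio` is not part of the skill's scripts). It matches extensions case-insensitively, which is an assumption, and it only checks the file name, not the audio contents.

```bash
# Hypothetical pre-flight check (not part of the skill's scripts).
# Returns 0 if the file name ends in one of the listed audio extensions.
is_supported_audio() {
  local name="${1##*/}"   # strip any directory part
  local ext
  # A name without a dot has no extension to check.
  case "$name" in
    *.*) ;;
    *) return 1 ;;
  esac
  # Lowercase the extension so Clip.WAV matches too (assumed behavior).
  ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    wav|ogg|mp3|flac|m4a) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: `is_supported_audio "$f" && bash ~/.openclaw/workspace/skills/funasr-transcribe/scripts/transcribe.sh "$f"`.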

Quick Start

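```bash
# Install dependencies and create the virtual environment
bash ~/.openclaw/workspace/skills/funasr-transcribe/scripts/install.sh

# Transcribe an audio file
bash ~/.openclaw/workspace/skills/funasr-transcribe/scripts/transcribe.sh /path/to/audio.ogg
```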

What It Does

- Creates a Python virtual environment at `~/.openclaw/workspace/funasr_env` by default.
- Installs `funasr`, `torch`, `torchaudio`, `modelscope`, and related dependencies.
- Loads the FunASR models locally and writes the transcript to a `.txt` file in the same directory as the source audio.
- Prints the transcript to stdout for direct CLI use.
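The exact name of the transcript file is not spelled out here. A reasonable assumption is that the audio extension is replaced with `.txt` in the same directory. The hypothetical helper below computes that path, stripping the extension only from the file name so that dots in directory names are left alone.

```bash
# Hypothetical helper: where the sibling transcript is assumed to be written.
# Replaces the audio extension with .txt, keeping the same directory.
transcript_path_for() {
  local audio="$1"
  local dir base
  dir=$(dirname "$audio")
  base=$(basename "$audio")
  # Strip only the last extension of the file name, not of the directory.
  printf '%s/%s.txt\n' "$dir" "${base%.*}"
}
```

If the naming assumption does not hold, the stdout copy of the transcript can be captured instead, provided the script writes nothing else to stdout.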

Models

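- ASR: `damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch`
- VAD: `damo/speech_fsmn_vad_zh-cn-16k-common-pytorch`
- Punctuation: `damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch`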

External Endpoints

| Endpoint | Purpose | Data sent |
| --- | --- | --- |
| `https://pypi.tuna.tsinghua.edu.cn/simple` | Install Python packages during setup | Package names and installer metadata requested by `pip` |
| ModelScope and/or Hugging Face endpoints used by FunASR dependencies | Download model files on first run | Model identifiers and standard HTTP request metadata |

Security & Privacy

- Audio files are read from the local machine and processed locally by FunASR.
- The transcription flow does not intentionally upload audio content to a cloud ASR API.
- Network access is still required during setup and for the first-run model download.
- The generated transcript is written to a local `.txt` file next to the source audio unless the write step fails.
- This skill does not require API keys or other secrets by default.

Model Invocation Note

Autonomous invocation is normal for this skill. If a user asks to transcribe local audio, an agent may install dependencies and run the helper scripts unless the user explicitly opts out of dependency installation or network access.
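One way an agent could follow this policy is to run the installer only when the virtual environment is missing, and to stop when installation is not permitted. This is a minimal sketch under assumptions: the venv uses the standard `bin/python` layout, and the `FUNASR_ENV` and `SKILL_DIR` variables and both function names are hypothetical. They are read only by this snippet; the skill's own `install.sh` still uses its default location.

```bash
# Hypothetical wrapper around the skill's scripts (names are illustrative).
# FUNASR_ENV and SKILL_DIR are read only by this sketch; install.sh keeps its own default.
FUNASR_ENV="${FUNASR_ENV:-$HOME/.openclaw/workspace/funasr_env}"
SKILL_DIR="${SKILL_DIR:-$HOME/.openclaw/workspace/skills/funasr-transcribe}"

# A usable venv is assumed to have an executable interpreter at bin/python.
funasr_env_ready() {
  [ -x "$1/bin/python" ]
}

# Usage: transcribe_with_setup AUDIO [yes|no]
# The second argument records whether the user allows installation (default: yes).
transcribe_with_setup() {
  local audio="$1" allow_install="${2:-yes}"
  if ! funasr_env_ready "$FUNASR_ENV"; then
    if [ "$allow_install" != "yes" ]; then
      echo "FunASR environment not found and installation is not permitted" >&2
      return 2
    fi
    bash "$SKILL_DIR/scripts/install.sh" || return 1
  fi
  bash "$SKILL_DIR/scripts/transcribe.sh" "$audio"
}
```

Returning a distinct status (2) for "installation not permitted" lets the caller tell an opt-out apart from a failed install.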

Trust Statement

When you use this skill, packages and models may be downloaded from third-party upstream sources, such as the configured PyPI mirror and model hosting providers. Only install and use this skill if you trust those sources.

Troubleshooting

- `python3` not found: install Python 3.7+ and rerun `scripts/install.sh`.
- Installation fails in an existing environment: rerun `scripts/install.sh --force` to recreate the virtual environment.
- First transcription is slow: the initial model downloads can take several minutes.
- GPU acceleration: install a CUDA-enabled PyTorch build, then edit `scripts/transcribe.py` and change `device="cpu"` to a CUDA device (for example `device="cuda:0"`).
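The GPU edit can be scripted. The hypothetical `set_funasr_device` helper below assumes `scripts/transcribe.py` contains the literal text `device="cpu"`, as described above, and keeps a `.bak` backup so the change is easy to revert. It does not install CUDA or a CUDA-enabled PyTorch build; that must be done first.

```bash
# Hypothetical helper: switch scripts/transcribe.py from CPU to a CUDA device.
# Assumes the script contains the literal text device="cpu", as described in Troubleshooting.
set_funasr_device() {
  local script="$1" device="${2:-cuda:0}"
  if ! grep -q 'device="cpu"' "$script"; then
    echo "device=\"cpu\" not found in $script" >&2
    return 1
  fi
  # -i.bak edits in place and keeps a backup; this form works with GNU and BSD sed.
  # The device string must not contain "/" or "&", which sed would interpret.
  sed -i.bak "s/device=\"cpu\"/device=\"$device\"/g" "$script"
}
```

For example, `set_funasr_device ~/.openclaw/workspace/skills/funasr-transcribe/scripts/transcribe.py cuda:0`; to revert, move `transcribe.py.bak` back over `transcribe.py`.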
