sensevoice-transcribe

Transcribe audio files (WAV/MP3/M4A/FLAC) to timestamped text using SenseVoice-Small + FSMN-VAD. Supports single-file and batch mode with VAD-anchored per-segment timestamps (~15s granularity). Use when the user wants to transcribe speech/audio, run batch transcription on daylog recordings, or re-transcribe specific dates. Replaces the old whisper-transcribe skill.

Author: admin | Source: ClawHub
Version: 1.0.0
Security check: Passed
Downloads: 271
Price: Free
Favorites: 0


# SenseVoice Transcribe

Transcribe audio to timestamped text using FunASR's `iic/SenseVoiceSmall` model with `fsmn-vad` for timestamp anchoring.

## Pipeline

1. **FSMN-VAD** segments audio into speech regions (~258 segments for a 30-minute file)
2. **SenseVoice-Small** transcribes the full audio with `merge_vad=True`
3. Raw text is split on `<|zh|>` tags, then cleaned via `rich_transcription_postprocess()`
4. Text segments are **proportionally mapped** to VAD timestamps: text segment `i` takes the start time of the VAD segment at the same relative position
5. Output: `[HH:MM:SS] text` per line, ~15s granularity

## Environment

```
Venv: ~/.openclaw/venvs/sensevoice/
Python: 3.12
Key packages: funasr==1.3.1, modelscope, onnxruntime
Model cache: ~/.cache/modelscope/hub/models/iic/SenseVoiceSmall
VAD cache: ~/.cache/modelscope/hub/models/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch
```

### First-time Setup

```bash
python3 -m venv ~/.openclaw/venvs/sensevoice
source ~/.openclaw/venvs/sensevoice/bin/activate
pip install funasr modelscope onnxruntime
# Models auto-download on first run (~234MB SenseVoice + ~4MB VAD)
```

## Usage

### Single File

```bash
source ~/.openclaw/venvs/sensevoice/bin/activate
python3 -c "
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess
from datetime import datetime, timedelta
import re

wav = '<WAV_PATH>'
# Parse start time from filename: TX01_MIC015_20260308_124130_orig.wav
m = re.search(r'(\d{8})_(\d{6})', wav)
start_dt = datetime.strptime(m.group(1)+m.group(2), '%Y%m%d%H%M%S') if m else None

# Standalone VAD model, used only to get segment boundaries in milliseconds
vad_model = AutoModel(model='fsmn-vad', disable_update=True)
model = AutoModel(model='iic/SenseVoiceSmall', vad_model='fsmn-vad', vad_kwargs={'max_single_segment_time': 30000}, device='cpu')
vad_segs = vad_model.generate(input=wav)[0].get('value', [])
res = model.generate(input=wav, cache={}, language='zh', use_itn=True, batch_size_s=60, merge_vad=True)

# Split on language tags, clean each piece, drop fragments of one character or less
texts = [rich_transcription_postprocess(s).strip() for s in re.split(r'<\|zh\|>', res[0]['text']) if s.strip()]
texts = [s for s in texts if len(s) > 1]

# Map text segment i to the VAD segment at the same relative position
ratio = len(vad_segs) / len(texts) if texts else 1
for i, t in enumerate(texts):
    vi = min(int(i * ratio), len(vad_segs)-1)
    ts = (start_dt + timedelta(milliseconds=vad_segs[vi][0])).strftime('%H:%M:%S') if start_dt else f'{vad_segs[vi][0]//1000:.0f}s'
    print(f'[{ts}] {t}')
"
```

### Batch Mode (daylog)

The bundled `scripts/batch_transcribe.py` handles the full daylog pipeline:

```bash
source ~/.openclaw/venvs/sensevoice/bin/activate
cd ~/Documents/dec/daylog

# Dry run: see what would be transcribed
python3 scripts/batch_transcribe.py --dry-run

# Transcribe all new files
python3 scripts/batch_transcribe.py

# Re-transcribe specific dates (deletes existing, then re-runs)
python3 scripts/batch_transcribe.py --force-dates 2026-03-07,2026-03-08

# With progress file + Discord webhook
python3 scripts/batch_transcribe.py \
  --progress-file /tmp/daylog-progress.json \
  --discord-webhook https://discord.com/api/webhooks/...
```

**Flags:**

| Flag | Description |
|------|-------------|
| `--dry-run` | Preview without writing |
| `--engine sensevoice\|whisper` | Engine (default: sensevoice) |
| `--force-dates YYYY-MM-DD,...` | Delete & re-transcribe these dates |
| `--progress-file PATH` | Write JSON progress for monitoring |
| `--discord-webhook URL` | Post start/milestone/finish to Discord |

**Directory layout:**

```
daylog/
├── raw/                # WAV input (DJI MIC 3, 48kHz/32bit, ~247MB/30min)
│   ├── TX01_MIC009_20260308_094129_orig.wav
│   └── ...
├── transcripts/        # Output, grouped by date
│   └── 2026-03-08/
│       ├── 000_TX01_MIC009_20260308_094129_orig.txt
│       └── ...
└── notes/              # Compiled daily notes (separate step)
    └── 2026-03-08.md
```

**Behavior:**

- Groups WAV files by date extracted from filename (`YYYYMMDD`)
- Sorts by timestamp within each date for correct chronological order
- Skips already-transcribed files unless `--force-dates` is given
- Indexed output filenames (`000_`, `001_`, ...) for sort order
- Discord milestones every 25% progress

## Output Format

```
[录音开始: 09:41:29]
[09:41:35] 到了,我们下车吧。
[09:41:48] 武康大楼,人好多啊。
[09:42:04] 你帮我在这里拍一张。
...
```

## Performance (Apple M4, 10-core CPU)

| Metric | Value |
|--------|-------|
| RTF | ~0.04 (25x realtime) |
| CPU | ~1.2 cores (12%) |
| RAM | ~1.5GB |
| 30min WAV | ~73s transcription + ~4s VAD |
| Accuracy | 92% keyword accuracy (vs Whisper-medium 23%, turbo 38%) |
| Hallucinations | 0 (vs Whisper hundreds per session) |
| Model size | 234MB (vs Whisper-large-v3-turbo 1.5GB) |

## vs Old Whisper Skill

| | Whisper (old) | SenseVoice (new) |
|---|---|---|
| Model | mlx-whisper-medium | SenseVoice-Small (234MB) |
| Accuracy | 23-38% | 92% |
| Hallucinations | Hundreds per session | 0 |
| Timestamp | Per-word (~2-4s) | VAD-anchored (~15s) |
| Duplicate lines | ~23% | <0.2% |
| Chinese support | Weak | Native (Mandarin-optimized) |

## Emoji Note

SenseVoice appends emotion tags (😊😔😡😮) to segments. These are **model artifacts** reflecting detected speech emotion, not literal emoji in the audio. Downstream consumers (note compilation) should ignore or strip them.
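
A downstream consumer could strip the emotion tags with a small helper along these lines. This is a minimal sketch, not part of the skill: the function name `strip_emotion_tags` is hypothetical, and the emoji set is assumed to be only the four characters listed in the Emoji Note.

```python
import re

# Assumption: only the four emoji named in the Emoji Note need removing.
# Extend this string if the model emits other emotion markers.
EMOTION_EMOJI = "😊😔😡😮"
_EMOJI_RE = re.compile(f"[{EMOTION_EMOJI}]")


def strip_emotion_tags(line: str) -> str:
    """Remove emotion emoji from a transcript line and tidy leftover spaces."""
    cleaned = _EMOJI_RE.sub("", line)
    # Collapse runs of whitespace left behind where an emoji was removed.
    return re.sub(r"\s{2,}", " ", cleaned).strip()


if __name__ == "__main__":
    print(strip_emotion_tags("[09:41:35] 到了,我们下车吧。😊"))
```

Applying the helper line by line before note compilation leaves timestamps and text untouched, since the pattern matches only the listed emoji.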

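
The batch-mode grouping rules (group by date from the filename, sort by timestamp within a date, prefix outputs with a zero-padded index) can be illustrated as follows. This is a sketch under stated assumptions and not the code of the bundled `scripts/batch_transcribe.py`: the name `plan_outputs` is hypothetical, the index is assumed to restart at `000` for each date, and files without a `YYYYMMDD_HHMMSS` stamp are simply skipped.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches the YYYYMMDD_HHMMSS stamp in names such as
# TX01_MIC009_20260308_094129_orig.wav
STAMP_RE = re.compile(r"(\d{4})(\d{2})(\d{2})_(\d{6})")


def plan_outputs(wav_names, transcripts_dir="transcripts"):
    """Map each WAV filename to a dated, indexed transcript path."""
    by_date = defaultdict(list)
    for name in wav_names:
        m = STAMP_RE.search(name)
        if m is None:
            continue  # assumption: names without a stamp are ignored
        year, month, day, hhmmss = m.groups()
        by_date[f"{year}-{month}-{day}"].append((hhmmss, name))

    plan = {}
    for date, items in sorted(by_date.items()):
        # Sorting (hhmmss, name) tuples gives chronological order per date.
        for index, (_, name) in enumerate(sorted(items)):
            stem = Path(name).stem
            plan[name] = Path(transcripts_dir) / date / f"{index:03d}_{stem}.txt"
    return plan


if __name__ == "__main__":
    names = [
        "TX01_MIC010_20260308_124130_orig.wav",
        "TX01_MIC009_20260308_094129_orig.wav",
    ]
    for src, dst in plan_outputs(names).items():
        print(src, "->", dst.as_posix())
```

Because the index follows the sorted timestamps, a plain alphabetical listing of each date directory reproduces recording order.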
Tags

skill ai

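Install via chat

This skill can be installed via chat on the following platforms:

OpenClaw, WorkBuddy, QClaw, Kimi, Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the sensevoice-transcribe-1776118420 skill

Option 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the sensevoice-transcribe-1776118420 skill

Install via command line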

skillhub install sensevoice-transcribe-1776118420

Download

sensevoice-transcribe v1.0.0 (free)

File size: 9.16 KB | Published: 2026-4-15 14:19

v1.0.0 (latest) 2026-4-15 14:19
Initial release: SenseVoice-Small + FSMN-VAD batch transcription with VAD-anchored timestamps
