返回顶部
c

cosyvoice3

|

作者: admin | 来源: ClawHub
源自
ClawHub
版本
V 1.0.0
安全检测
已通过
592
下载量
1
收藏
概述
安装方式
版本历史

cosyvoice3

# CosyVoice3 TTS Local text-to-speech using Alibaba's CosyVoice3 on macOS Apple Silicon. ## Overview CosyVoice3 is an advanced TTS system based on large language models, supporting: - **9 languages**: Chinese, English, Japanese, Korean, German, Spanish, French, Italian, Russian - **18+ Chinese dialects**: Cantonese, Sichuan, Dongbei, Shanghai, etc. - **Zero-shot voice cloning**: Clone any voice from 3-10 seconds of audio - **Cross-lingual synthesis**: Speak Chinese with English voice or vice versa - **Fine-grained control**: Emotions, speed, volume via text tags ## Prerequisites - macOS with Apple Silicon (M1/M2/M3) - Python 3.10 - Conda installed - ~5GB disk space for models ## Installation Run the installation script: ```bash cd /Users/lhz/.openclaw/workspace/skills/cosyvoice3/scripts bash install.sh ``` This will: 1. Create conda environment `cosyvoice` 2. Install PyTorch (CPU version for Apple Silicon) 3. Install CosyVoice dependencies 4. Download Fun-CosyVoice3-0.5B model (~2GB) ## Usage ### Quick Start - Basic TTS **重要**:CosyVoice3 需要在参考文本中添加 `<|endofprompt|>` 标记! ```bash cd /Users/lhz/.openclaw/workspace/cosyvoice3-repo export PATH="$HOME/miniconda3/bin:$PATH" conda activate cosyvoice python -c " import sys sys.path.append('third_party/Matcha-TTS') from cosyvoice.cli.cosyvoice import AutoModel import torchaudio cosyvoice = AutoModel(model_dir='pretrained_models/Fun-CosyVoice3-0.5B') for i, j in enumerate(cosyvoice.inference_zero_shot( '你好,这是CosyVoice3语音合成测试。', '希望你以后能够做的比我还好呦。<|endofprompt|>', # 注意这个标记! 'asset/zero_shot_prompt.wav' )): torchaudio.save('output.wav', j['tts_speech'], cosyvoice.sample_rate) print('Generated: output.wav') " ``` ### Using the TTS Script Generate speech from text: ```bash cd /Users/lhz/.openclaw/workspace/skills/cosyvoice3/scripts conda activate cosyvoice # Basic TTS with default voice python tts.py "你好,这是一个测试。" # With custom reference audio for voice cloning python tts.py "你好,这是克隆的声音。" --reference /path/to/reference.wav # Cross-lingual (English text with Chinese voice) python tts.py "Hello, this is cross-lingual synthesis." --reference asset/zero_shot_prompt.wav --lang en # With speed control python tts.py "这是一段快速的语音。" --speed 1.5 # Save to specific path python tts.py "你好。" --output ~/Desktop/greeting.wav ``` ### Available Assets Reference audio files in `cosyvoice3-repo/asset/`: - `zero_shot_prompt.wav` - Default Chinese female voice - `cross_lingual_prompt.wav` - English prompt for cross-lingual ## Advanced Features ### Voice Cloning Clone a voice from 3-10 seconds of reference audio: ```python from cosyvoice.cli.cosyvoice import AutoModel import torchaudio cosyvoice = AutoModel(model_dir='pretrained_models/Fun-CosyVoice3-0.5B') # Clone voice and generate for i, j in enumerate(cosyvoice.inference_zero_shot( '这是克隆后的声音在说话。', 'Reference text transcription', '/path/to/reference.wav' )): torchaudio.save('cloned.wav', j['tts_speech'], cosyvoice.sample_rate) ``` ### Fine-Grained Control Control prosody with special tags: ```python # Add laughter "他突然[laughter]笑了起来[laughter]。" # Add breathing "他说完这句话[breath],深吸一口气。" # Strong emphasis "这是<strong>非常重要</strong>的。" # Combined "在面对挑战时,他展现了非凡的<strong>勇气</strong>与<strong>智慧</strong>[breath]。" ``` ### Dialect Support Use instruct mode for dialects: ```python cosyvoice = AutoModel(model_dir='pretrained_models/CosyVoice-300M-Instruct') for i, j in enumerate(cosyvoice.inference_instruct( '你好,这是测试语音。', '中文男', '用四川话说这句话<|endofprompt|>' )): torchaudio.save('sichuan.wav', j['tts_speech'], cosyvoice.sample_rate) ``` ## Troubleshooting ### Model not found If you get "model not found" errors, download models manually: ```bash cd /Users/lhz/.openclaw/workspace/cosyvoice3-repo export PATH="$HOME/miniconda3/bin:$PATH" conda activate cosyvoice python -c " from modelscope import snapshot_download snapshot_download('FunAudioLLM/Fun-CosyVoice3-0.5B-2512', local_dir='pretrained_models/Fun-CosyVoice3-0.5B') " ``` ### Memory issues For long text, split into sentences: ```python text = "很长的文本..." sentences = text.split('。') for sent in sentences: if sent.strip(): # Process each sentence ``` ### Audio format Reference audio requirements: - Format: WAV, MP3 - Sample rate: 16kHz+ (automatically resampled) - Duration: 3-10 seconds optimal - Content: Clear speech, minimal background noise ## Resources ### Scripts - `install.sh` - Installation script for macOS - `tts.py` - Main TTS script with CLI interface - `download_models.py` - Download pretrained models ### References - [CosyVoice GitHub](https://github.com/FunAudioLLM/CosyVoice) - [Fun-CosyVoice3 Demo](https://funaudiollm.github.io/cosyvoice3/) ### Model Files Located in `cosyvoice3-repo/pretrained_models/`: - `Fun-CosyVoice3-0.5B/` - Main model (recommended) - `CosyVoice2-0.5B/` - Previous version - `CosyVoice-300M/` - Lighter model - `CosyVoice-300M-SFT/` - SFT version - `CosyVoice-300M-Instruct/` - Instruct version ## Notes - First inference takes ~30 seconds (model warmup) - Subsequent inferences are faster - Apple Silicon uses CPU mode (no CUDA) - RTF (real-time factor) ~0.3-0.5 on M-series chips - Model files are cached locally after first download

标签

skill ai

通过对话安装

该技能支持在以下平台通过对话安装:

OpenClaw WorkBuddy QClaw Kimi Claude

方式一:安装 SkillHub 和技能

帮我安装 SkillHub 和 cosyvoice3-macos-1776419987 技能

方式二:设置 SkillHub 为优先技能安装源

设置 SkillHub 为我的优先技能安装源,然后帮我安装 cosyvoice3-macos-1776419987 技能

通过命令行安装

skillhub install cosyvoice3-macos-1776419987

下载 Zip 包

⬇ 下载 cosyvoice3 v1.0.0

文件大小: 8.1 KB | 发布时间: 2026-4-17 20:08

v1.0.0 最新 2026-4-17 20:08
Initial release: Alibaba CosyVoice3 TTS for macOS Apple Silicon. Supports Chinese, English, 18+ dialects, zero-shot voice cloning.

Archiver·手机版·闲社网·闲社论坛·羊毛社区· 多链控股集团有限公司 · 苏ICP备2025199260号-1

Powered by Discuz! X5.0   © 2024-2025 闲社网·线报更新论坛·羊毛分享社区·http://xianshe.com

p2p_official_large
返回顶部