Willow Inference Server Skill

Local ASR (speech-to-text) and TTS (text-to-speech) inference server.

Setup

1. Start Willow Inference Server

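```bash
git clone https://github.com/toverainc/willow-inference-server.git
cd willow-inference-server
./utils.sh install
./utils.sh gen-cert your-hostname
./utils.sh run
```

The server runs at `https://your-hostname:19000`.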

2. Configure Environment

Set the server URL:

```bash
export WILLOWBASEURL=https://your-hostname:19000
```

Or configure per request (see below).
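Both configuration styles can be covered by one small helper that prefers a URL passed per call, then `WILLOWBASEURL`, then the default `https://localhost:19000` from the environment variable table. This is only a sketch: `willow_base_url` and `willow_endpoint` are hypothetical names, not something the server ships.

```shell
# Hypothetical helpers (not part of Willow Inference Server).

# Print the base URL: explicit argument, else $WILLOWBASEURL, else the default.
willow_base_url() {
  url="${1:-${WILLOWBASEURL:-https://localhost:19000}}"
  # Drop one trailing slash so API paths can be appended safely.
  printf '%s\n' "${url%/}"
}

# Print the full URL for an API path such as /api/asr.
# Usage: willow_endpoint PATH [BASE_URL]
willow_endpoint() {
  printf '%s%s\n' "$(willow_base_url "${2:-}")" "$1"
}
```

For example, `curl "$(willow_endpoint /api/docs)"` would use the environment setting, while `willow_endpoint /api/docs https://other-host:19000` overrides it for a single request.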

ASR (Speech-to-Text)

Transcribe Audio File

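```bash
curl -X POST "${WILLOWBASEURL}/api/asr" \
  -F "audio_file=@/path/to/audio.m4a" \
  -F "language=auto"
```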

Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| audio_file | Audio file to transcribe | required |
| language | Language code (en, zh, etc.) or "auto" | auto |

Supported Formats

- MP3, WAV, M4A, OGG, FLAC, WebM
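A script can reject unsupported files before uploading them. The sketch below checks only the file extension against the formats listed above, not the actual file contents; `is_supported_audio` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the file extension is one of the listed
# formats (MP3, WAV, M4A, OGG, FLAC, WebM), compared case-insensitively.
is_supported_audio() {
  # A name without a dot has no extension to check.
  case "$1" in
    *.*) ;;
    *) return 1 ;;
  esac
  # Lower-case the part after the last dot.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|m4a|ogg|flac|webm) return 0 ;;
    *) return 1 ;;
  esac
}
```

Typical use would be `is_supported_audio "$f" || echo "skipping $f" >&2` ahead of the upload.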

Example: Transcribe with curl

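```bash
# Basic transcription
curl -X POST "${WILLOWBASEURL}/api/asr" \
  -F "audio_file=@recording.m4a" \
  -F "language=zh"

# Specify the model
curl -X POST "${WILLOWBASEURL}/api/asr" \
  -F "audio_file=@meeting.mp3" \
  -F "language=en" \
  -F "model=base"
```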

TTS (Text-to-Speech)

Convert Text to Speech

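```bash
curl -X POST "${WILLOWBASEURL}/api/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "af_sarah"}'
```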

Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| text | Text to convert to speech | required |
| voice | Voice ID (see below) | default voice |
| speed | Speech rate (0.5-2.0) | 1.0 |
| volume | Volume (0.0-1.0) | 1.0 |

Available Voices

Common voices (format: gender_voicename):

- `af_sarah` - Sarah (Female)
- `af_bella` - Bella (Female)
- `am_michael` - Michael (Male)
- `am_alex` - Alex (Male)

Check the server docs for the full list: `${WILLOWBASEURL}/api/docs`
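If a script needs to sanity-check a voice ID before sending a request, a lookup over the four voices named in this section is enough as a sketch. `voice_label` is a hypothetical name, and the server may accept more voices than this list covers.

```shell
# Hypothetical helper: describe the four voices listed in this document.
# Returns 1 for any other ID; check the server docs before treating
# an unknown ID as invalid.
voice_label() {
  case "$1" in
    af_sarah)   echo "Sarah (Female)" ;;
    af_bella)   echo "Bella (Female)" ;;
    am_michael) echo "Michael (Male)" ;;
    am_alex)    echo "Alex (Male)" ;;
    *) return 1 ;;
  esac
}
```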

Example: TTS with curl

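```bash
# Basic TTS
curl -X POST "${WILLOWBASEURL}/api/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, this is a test"}' \
  -o output.wav

# Custom voice
curl -X POST "${WILLOWBASEURL}/api/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello!", "voice": "am_michael", "speed": 1.2}' \
  -o hello.mp3
```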

Environment Variables

Variable	Description	Default
WILLOWBASEURL	Server URL	https://localhost:19000

Workflow Examples

1. Record and Transcribe

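```bash
# Record audio (macOS; rec is part of SoX)
rec test.wav

# Transcribe
curl -X POST "${WILLOWBASEURL}/api/asr" \
  -F "audio_file=@test.wav" \
  -F "language=auto"
```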

2. Text to Speech

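```bash
# Convert text to speech
curl -X POST "${WILLOWBASEURL}/api/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "The task for today is to learn a new skill"}' \
  -o speech.wav
```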

3. Batch Transcription

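```bash
for f in *.m4a; do
  curl -X POST "${WILLOWBASEURL}/api/asr" \
    -F "audio_file=@${f}" \
    -F "language=auto" \
    -o "${f%.m4a}.txt"
done
```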

API Documentation

Full API docs are available at: `${WILLOWBASEURL}/api/docs`

Notes

- All endpoints require HTTPS (or HTTP, if the server is configured for it)
- Audio files are processed locally on the server
- ASR latency depends on model size and hardware
- TTS voices can be customized with custom voice recordings

Willow Inference Server Skill

Local ASR (speech-to-text) and TTS (text-to-speech) inference server.

Setup

1. Start Willow Inference Server

```bash
git clone https://github.com/toverainc/willow-inference-server.git
cd willow-inference-server
./utils.sh install
./utils.sh gen-cert your-hostname
./utils.sh run
```

The server runs at `https://your-hostname:19000`.

2. Configure Environment

Set the server URL:

```bash
export WILLOWBASEURL=https://your-hostname:19000
```

Or configure per request (see below).

ASR (Speech-to-Text)

Transcribe Audio File

```bash
curl -X POST "${WILLOWBASEURL}/api/asr" \
  -F "audio_file=@/path/to/audio.m4a" \
  -F "language=auto"
```

Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| audio_file | Audio file to transcribe | required |
| language | Language code (en, zh, etc.) or "auto" | auto |

Supported Formats

- MP3, WAV, M4A, OGG, FLAC, WebM

Example: Transcribe with curl

```bash
# Basic transcription
curl -X POST "${WILLOWBASEURL}/api/asr" \
  -F "audio_file=@recording.m4a" \
  -F "language=zh"

# Specify the model
curl -X POST "${WILLOWBASEURL}/api/asr" \
  -F "audio_file=@meeting.mp3" \
  -F "language=en" \
  -F "model=base"
```
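The two curl calls differ only in their form fields, so they can be folded into one wrapper. The sketch below assumes the `/api/asr` path and the `audio_file`, `language`, and `model` fields shown in the examples; the function name `transcribe` and the `WILLOW_DRY_RUN` variable are hypothetical and exist only to make the command inspectable without a running server.

```shell
# Hypothetical wrapper around the ASR curl call (names are illustrative).
# Usage: transcribe FILE [LANGUAGE] [MODEL]
# With WILLOW_DRY_RUN=1 (an assumed variable) it prints the curl command
# instead of sending the request.
transcribe() {
  file=$1
  lang=${2:-auto}
  model=${3:-}
  base=${WILLOWBASEURL:-https://localhost:19000}

  # Collect the curl arguments in the positional parameters.
  set -- -X POST "${base}/api/asr" -F "audio_file=@${file}" -F "language=${lang}"
  # The model field is only sent when a model is given.
  if [ -n "$model" ]; then
    set -- "$@" -F "model=${model}"
  fi

  if [ "${WILLOW_DRY_RUN:-0}" = "1" ]; then
    echo "curl $*"
    return 0
  fi
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  # --fail turns HTTP error responses into a non-zero exit status.
  curl --fail --silent --show-error "$@"
}
```

With this in place, `transcribe meeting.mp3 en base` corresponds to the second example above, and `transcribe recording.m4a` falls back to `language=auto`.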

TTS (Text-to-Speech)

Convert Text to Speech

```bash
curl -X POST "${WILLOWBASEURL}/api/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "af_sarah"}'
```

Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| text | Text to convert to speech | required |
| voice | Voice ID (see below) | default voice |
| speed | Speech rate (0.5-2.0) | 1.0 |
| volume | Volume (0.0-1.0) | 1.0 |
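Hand-written JSON breaks as soon as the text contains a double quote, and out-of-range values are easy to send by accident. The following sketch builds the request body from the four parameters in the table and enforces the stated ranges for `speed` and `volume`. All function names are hypothetical, and the escaping covers only backslashes and double quotes, so it is not a general JSON encoder.

```shell
# Hypothetical helpers for building a TTS request body.

# Succeed if VALUE is a plain decimal number within [MIN, MAX].
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN {
    if (v !~ /^[0-9]+(\.[0-9]+)?$/) exit 1
    exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0)
  }'
}

# Escape backslashes and double quotes for use inside a JSON string.
# Control characters such as newlines are not handled here.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Usage: tts_payload TEXT [VOICE] [SPEED] [VOLUME]
# Prints a JSON body; optional fields are left out when empty.
tts_payload() {
  [ -n "$1" ] || { echo "text is required" >&2; return 1; }
  body="\"text\": \"$(json_escape "$1")\""
  if [ -n "${2:-}" ]; then
    body="$body, \"voice\": \"$(json_escape "$2")\""
  fi
  if [ -n "${3:-}" ]; then
    in_range "$3" 0.5 2.0 || { echo "speed must be 0.5-2.0" >&2; return 1; }
    body="$body, \"speed\": $3"
  fi
  if [ -n "${4:-}" ]; then
    in_range "$4" 0.0 1.0 || { echo "volume must be 0.0-1.0" >&2; return 1; }
    body="$body, \"volume\": $4"
  fi
  printf '{%s}\n' "$body"
}
```

The output can be passed straight to curl, for example `-d "$(tts_payload 'Hello!' am_michael 1.2)"`. Leaving optional fields out, rather than sending defaults, lets the server apply its own defaults.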

Available Voices

Common voices (format: gender_voicename):

- `af_sarah` - Sarah (Female)
- `af_bella` - Bella (Female)
- `am_michael` - Michael (Male)
- `am_alex` - Alex (Male)

Check the server docs for the full list: `${WILLOWBASEURL}/api/docs`

Example: TTS with curl

```bash
# Basic TTS
curl -X POST "${WILLOWBASEURL}/api/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, this is a test"}' \
  -o output.wav

# Custom voice
curl -X POST "${WILLOWBASEURL}/api/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello!", "voice": "am_michael", "speed": 1.2}' \
  -o hello.mp3
```

Environment Variables

Variable	Description	Default
WILLOWBASEURL	Server URL	https://localhost:19000

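Workflow Examples

1. Record and Transcribe

```bash
# Record audio (macOS; rec is part of SoX)
rec test.wav

# Transcribe
curl -X POST "${WILLOWBASEURL}/api/asr" \
  -F "audio_file=@test.wav" \
  -F "language=auto"
```

2. Text to Speech

```bash
# Convert text to speech
curl -X POST "${WILLOWBASEURL}/api/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "The task for today is to learn a new skill"}' \
  -o speech.wav
```

3. Batch Transcription

```bash
for f in *.m4a; do
  curl -X POST "${WILLOWBASEURL}/api/asr" \
    -F "audio_file=@${f}" \
    -F "language=auto" \
    -o "${f%.m4a}.txt"
done
```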
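For larger batches it helps to skip files that already have a transcript and to keep going when one request fails. The sketch below extends the loop in that direction. It assumes the `/api/asr` path and the `audio_file` and `language` fields used elsewhere in this document; `transcript_name` and `batch_transcribe` are hypothetical names.

```shell
# Hypothetical helpers for batch transcription.

# Print the transcript file name for an audio file: same path, .txt extension.
transcript_name() {
  printf '%s.txt\n' "${1%.*}"
}

# Transcribe every matching file in a directory, skipping files that
# already have a transcript. Returns 1 if any request failed.
# Usage: batch_transcribe [DIR] [EXTENSION]
batch_transcribe() {
  dir=${1:-.}
  ext=${2:-m4a}
  base=${WILLOWBASEURL:-https://localhost:19000}
  rc=0
  for f in "$dir"/*."$ext"; do
    [ -e "$f" ] || continue            # the glob matched nothing
    out=$(transcript_name "$f")
    [ -e "$out" ] && continue          # already transcribed
    # --fail makes curl return non-zero on HTTP error responses.
    if ! curl --fail --silent --show-error -X POST "${base}/api/asr" \
        -F "audio_file=@${f}" -F "language=auto" -o "$out"; then
      echo "failed: $f" >&2
      rm -f "$out"                     # do not leave a partial transcript
      rc=1
    fi
  done
  return "$rc"
}
```

Because finished files are skipped, rerunning `batch_transcribe ./recordings mp3` after a failure retries only the files that are still missing a transcript.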

API Documentation

Full API docs are available at: `${WILLOWBASEURL}/api/docs`

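Notes

- All endpoints require HTTPS (or HTTP, if the server is configured for it)
- Audio files are processed locally on the server
- ASR latency depends on model size and hardware
- TTS voices can be customized with custom voice recordings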
