Pronunciation Coach

Analyze spoken English pronunciation using Azure Speech Services and provide actionable coaching feedback.

Privacy Note: This skill reads local voice messages from ~/.openclaw/media/inbound/ and transmits them to Microsoft Azure Speech Services for processing.

Prerequisites

- Azure Speech API Key: Set AZURE_SPEECH_KEY env var
- Azure Speech Region: Set AZURE_SPEECH_REGION env var (e.g., southeastasia)
- ffmpeg: Required for audio format conversion (must be on PATH)
- Node.js: Required for report generation

Workflow

1. Receive Audio

Voice messages from Telegram are stored in ~/.openclaw/media/inbound/. Find the latest .ogg file matching the message timestamp.

    ls -lt ~/.openclaw/media/inbound/*.ogg | head -5
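
When only the newest recording is needed, the lookup can be wrapped in a small function. This is a sketch under assumptions: latest_ogg is a hypothetical name, and it relies on file modification time rather than the Telegram message timestamp, so the result should still be checked against the message being coached.

```bash
# Print the most recently modified .ogg file in a directory.
# Defaults to the inbound folder named in this document.
latest_ogg() {
  _dir="${1:-$HOME/.openclaw/media/inbound}"
  # ls -t sorts newest first; head -n 1 keeps only the top entry.
  # Prints nothing if the directory holds no .ogg files.
  ls -t "$_dir"/*.ogg 2>/dev/null | head -n 1
}
```

A caller might use `audio_file=$(latest_ogg)` and bail out if the result is empty. Parsing ls output breaks on file names containing newlines, which is assumed not to occur for these generated file names.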

2. Run Assessment

    scripts/pronunciation-assess.sh <audio_file> <reference_text>

- audio_file: Path to the voice message (ogg/wav/mp3/m4a)
- reference_text: What the speaker intended to say (from the transcript)
- The script auto-converts any input format to 16 kHz mono WAV
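
The internals of the assessment script are not shown here, but the conversion step it describes could look roughly like the sketch below. The function name to_wav16k is hypothetical, and 16-bit PCM is an assumption; the document only specifies WAV, 16 kHz, and mono.

```bash
# Convert any ffmpeg-readable input to 16 kHz, mono WAV.
# Assumption: 16-bit signed PCM (pcm_s16le) as the sample format.
to_wav16k() {
  _src="$1"
  _dst="$2"
  # -y    overwrite the output file if it exists
  # -ar   output sample rate in Hz
  # -ac   number of output channels (1 = mono)
  # -c:a  audio codec for the output
  ffmpeg -hide_banner -loglevel error -y -i "$_src" -ar 16000 -ac 1 -c:a pcm_s16le "$_dst"
}
```

For example, `to_wav16k voice.ogg /tmp/voice.wav` would produce a file in the format the assessment step expects.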

3. Generate Report

Pipe the JSON output into the report generator:

    scripts/pronunciation-assess.sh audio.ogg "reference text" | node scripts/pronunciation-report.js

The report includes:

- Overall scores (Pronunciation, Accuracy, Fluency, Prosody, Completeness)
- Word-by-word breakdown with per-phoneme scores
- Problem sounds highlighted
- Verdict with actionable next steps

4. Provide Coaching

After generating the report:

1. Send the text report to the user (scores + word breakdown)
2. Identify the top 3 problem sounds from the phoneme scores
3. Explain each problem: what the correct sound is and how to produce it
   - See references/phoneme-guide.md for phoneme descriptions and fixes
4. Send a voice message (via TTS) demonstrating the correct pronunciation of problem words
5. Assign practice: give the user specific sentences to re-record, focusing on weak sounds
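
Picking the top 3 problem sounds can be done mechanically once the phoneme scores are available as plain text. The sketch below assumes a hypothetical intermediate format of one "phoneme score" pair per line; the actual JSON layout produced by the assessment script is not shown in this document, so extracting those pairs is left out. The function name worst_phonemes is also hypothetical.

```bash
# Read "phoneme score" pairs on stdin (one pair per line, whitespace separated)
# and print the three phonemes with the lowest mean score, worst first.
worst_phonemes() {
  awk '{ sum[$1] += $2; n[$1]++ }
       END { for (p in sum) printf "%s %.1f\n", p, sum[p] / n[p] }' |
    sort -k2,2n -k1,1 |
    head -n 3
}
```

Averaging per phoneme keeps one badly pronounced word from dominating the list when the same sound appears several times in the recording.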

Coaching Tips

- Scores ≥ 90: Excellent, minor polish
- Scores 70-89: Good, targeted practice needed
- Scores < 70: Needs focused drill on that specific sound
- "Omission" errors mean the word wasn't detected; the speaker may have been too quiet or mumbled
- Prosody score < 85 suggests monotone delivery; coach on intonation rises and falls
- Compare scores across multiple recordings to track improvement
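
The score thresholds above can be expressed as two small helpers. This is a sketch with hypothetical function names and band labels; it also assumes that fractional scores between 89 and 90 belong to the "good" band, which the tips do not state explicitly.

```bash
# Map a 0-100 score to a coaching band.
# awk handles the comparison so fractional scores such as 87.5 work.
score_band() {
  awk -v s="$1" 'BEGIN {
    if (s + 0 >= 90)
      print "excellent"
    else if (s + 0 >= 70)
      print "good"
    else
      print "drill"
  }'
}

# Flag likely monotone delivery when the prosody score is below 85.
prosody_flag() {
  awk -v s="$1" 'BEGIN { if (s + 0 < 85) print "monotone"; else print "ok" }'
}
```

For instance, `score_band 72.5` prints `good`, and `prosody_flag 80` prints `monotone`.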
