Pronunciation Coach
Analyze spoken English pronunciation using Azure Speech Services and provide actionable coaching feedback.
Privacy Note: This skill reads local voice messages from ~/.openclaw/media/inbound/ and transmits them to Microsoft Azure Speech Services for processing.
Prerequisites
- Azure Speech API key: set the AZURE_SPEECH_KEY environment variable
- Azure Speech region: set the AZURE_SPEECH_REGION environment variable (e.g., southeastasia)
- ffmpeg: required for audio format conversion (must be on PATH)
- Node.js: required for report generation
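A quick preflight check can catch missing configuration before any audio leaves the machine. The sketch below is illustrative only: `missingPrerequisites` is a hypothetical helper, not part of the skill's scripts, and it checks just the two environment variables named above.

```javascript
// Hypothetical preflight helper; not part of the skill's scripts.
const REQUIRED_ENV_VARS = ["AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION"];

// Returns the names of required variables that are unset or blank.
// The environment is a parameter so the check can be exercised in tests.
function missingPrerequisites(env = process.env) {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return typeof value !== "string" || value.trim() === "";
  });
}

const missing = missingPrerequisites();
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(", ")}`);
}
```

Checking for ffmpeg and Node.js on PATH is left out here, since how that is done depends on the host shell.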
Workflow
1. Receive Audio
Voice messages from Telegram are stored in ~/.openclaw/media/inbound/. Find the latest .ogg file matching the message timestamp.
```bash
ls -lt ~/.openclaw/media/inbound/*.ogg | head -5
```
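When several voice messages arrive close together, the newest file is not always the right one. As a rough sketch, the matching step could compare each file's modification time with the message timestamp. `pickClosestOgg` and the `{ name, mtimeMs }` record shape are assumptions made for illustration; the records could be built from `fs.statSync`, which exposes `mtimeMs`.

```javascript
// Hypothetical helper: choose the .ogg file whose modification time is
// closest to the message timestamp (both in milliseconds since the epoch).
// `files` is a list of { name, mtimeMs } records.
function pickClosestOgg(files, messageTimeMs) {
  const candidates = files.filter((f) => f.name.toLowerCase().endsWith(".ogg"));
  if (candidates.length === 0) return null;
  return candidates.reduce((best, f) =>
    Math.abs(f.mtimeMs - messageTimeMs) < Math.abs(best.mtimeMs - messageTimeMs) ? f : best
  );
}

const example = pickClosestOgg(
  [
    { name: "a.ogg", mtimeMs: 1000 },
    { name: "b.ogg", mtimeMs: 5000 },
    { name: "notes.txt", mtimeMs: 5100 },
  ],
  5200
);
console.log(example.name); // b.ogg
```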
2. Run Assessment
```bash
scripts/pronunciation-assess.sh <audio_file> <reference_text>
```
- audio_file: path to the voice message (ogg/wav/mp3/m4a)
- reference_text: what the speaker intended to say (from the transcript)
- The script auto-converts any format to 16 kHz mono WAV
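For readers curious what the conversion step might look like, here is a minimal sketch that builds ffmpeg arguments for 16 kHz mono WAV output. The flags are standard ffmpeg options, but `ffmpegWavArgs` is a hypothetical name and the skill's script may invoke ffmpeg differently.

```javascript
// Builds ffmpeg arguments for a 16 kHz, mono, 16-bit PCM WAV conversion.
// Assumption: the skill's own script may use different flags.
function ffmpegWavArgs(inputPath, outputPath) {
  return [
    "-y",                 // overwrite the output file if it exists
    "-i", inputPath,      // input in any format ffmpeg can decode
    "-ar", "16000",       // resample to 16 kHz
    "-ac", "1",           // downmix to a single channel
    "-c:a", "pcm_s16le",  // 16-bit little-endian PCM
    outputPath,
  ];
}

console.log(["ffmpeg", ...ffmpegWavArgs("voice.ogg", "voice.wav")].join(" "));
// ffmpeg -y -i voice.ogg -ar 16000 -ac 1 -c:a pcm_s16le voice.wav
```

Returning an array rather than a single string lets the caller hand the arguments to a process launcher without shell quoting, which matters when file names contain spaces.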
3. Generate Report
Pipe the JSON output into the report generator:
```bash
scripts/pronunciation-assess.sh audio.ogg "reference text" | node scripts/pronunciation-report.js
```
The report includes:
- Overall scores (Pronunciation, Accuracy, Fluency, Prosody, Completeness)
- Word-by-word breakdown with per-phoneme scores
- Problem sounds highlighted
- Verdict with actionable next steps
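To make the report contents concrete, the sketch below turns an assessment result into a short text summary. It assumes the JSON follows the shape of Azure's detailed recognition result (`NBest[0].PronunciationAssessment` and `NBest[0].Words`); the assessment script's actual output may be shaped differently, and `summarize` is a hypothetical function, not the real report generator.

```javascript
// Assumption: `result` resembles Azure's detailed pronunciation assessment
// output. The skill's JSON may use a different layout.
function summarize(result) {
  const best = result.NBest[0];
  const pa = best.PronunciationAssessment;
  const lines = [
    `Pronunciation ${pa.PronScore} | Accuracy ${pa.AccuracyScore} | ` +
      `Fluency ${pa.FluencyScore} | Prosody ${pa.ProsodyScore} | ` +
      `Completeness ${pa.CompletenessScore}`,
  ];
  for (const w of best.Words) {
    const wa = w.PronunciationAssessment;
    // Only annotate words that carry an error such as Mispronunciation or Omission.
    const flag = wa.ErrorType && wa.ErrorType !== "None" ? ` [${wa.ErrorType}]` : "";
    lines.push(`${w.Word}: ${wa.AccuracyScore}${flag}`);
  }
  return lines.join("\n");
}

// Made-up sample data for illustration only.
const sample = {
  NBest: [{
    PronunciationAssessment: {
      PronScore: 82, AccuracyScore: 80, FluencyScore: 90,
      ProsodyScore: 78, CompletenessScore: 100,
    },
    Words: [
      { Word: "think", PronunciationAssessment: { AccuracyScore: 62, ErrorType: "Mispronunciation" } },
      { Word: "so", PronunciationAssessment: { AccuracyScore: 97, ErrorType: "None" } },
    ],
  }],
};
console.log(summarize(sample));
```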
4. Provide Coaching
After generating the report:
1. Send the text report to the user (scores + word breakdown)
2. Identify the top 3 problem sounds from the phoneme scores
3. Explain each problem: what the correct sound is and how to produce it
   - See references/phoneme-guide.md for phoneme descriptions and fixes
4. Send a voice message (via TTS) demonstrating the correct pronunciation of problem words
5. Assign practice: give the user specific sentences to re-record, focusing on weak sounds
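One possible way to pick the top 3 problem sounds is to average each phoneme's score across all words and take the lowest averages. The sketch below assumes each word carries a `Phonemes` array shaped like Azure's detailed output; `topProblemSounds` and the sample data are hypothetical.

```javascript
// Assumption: each word has a Phonemes array with entries like
// { Phoneme: "th", PronunciationAssessment: { AccuracyScore: 40 } }.
function topProblemSounds(words, limit = 3) {
  const totals = new Map(); // phoneme -> { sum, count }
  for (const word of words) {
    for (const p of word.Phonemes ?? []) {
      const entry = totals.get(p.Phoneme) ?? { sum: 0, count: 0 };
      entry.sum += p.PronunciationAssessment.AccuracyScore;
      entry.count += 1;
      totals.set(p.Phoneme, entry);
    }
  }
  // Lowest average first, so the weakest sounds lead the list.
  return [...totals.entries()]
    .map(([phoneme, { sum, count }]) => ({ phoneme, average: sum / count }))
    .sort((a, b) => a.average - b.average)
    .slice(0, limit);
}

// Made-up scores for illustration only.
const ph = (Phoneme, AccuracyScore) => ({ Phoneme, PronunciationAssessment: { AccuracyScore } });
const sampleWords = [
  { Word: "think", Phonemes: [ph("th", 40), ph("ih", 90), ph("ng", 70), ph("k", 95)] },
  { Word: "three", Phonemes: [ph("th", 50), ph("r", 60), ph("iy", 98)] },
];
console.log(topProblemSounds(sampleWords));
// [ { phoneme: 'th', average: 45 }, { phoneme: 'r', average: 60 }, { phoneme: 'ng', average: 70 } ]
```

Averaging across words keeps a single bad take from dominating. A coach who prefers to react to the worst single instance could sort by minimum score instead.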
Coaching Tips
- Scores ≥ 90: Excellent, minor polish
- Scores 70-89: Good, targeted practice needed
- Scores < 70: Needs focused drill on that specific sound
- "Omission" errors mean the word wasn't detected — speaker may have been too quiet or mumbled
- Prosody score < 85 suggests monotone delivery — coach on intonation rises/falls
- Compare scores across multiple recordings to track improvement
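The thresholds above translate directly into a few small functions. This is a sketch with hypothetical names (`coachingBand`, `needsIntonationCoaching`, `improvement`); fractional scores between 89 and 90 are treated as part of the middle band, which is an assumption the tips do not spell out.

```javascript
// Maps a 0-100 score to the coaching bands listed above.
function coachingBand(score) {
  if (score >= 90) return "Excellent, minor polish";
  if (score >= 70) return "Good, targeted practice needed";
  return "Needs focused drill";
}

// A prosody score below 85 suggests monotone delivery.
function needsIntonationCoaching(prosodyScore) {
  return prosodyScore < 85;
}

// Change in a score between the first and the latest recording.
function improvement(scores) {
  if (scores.length < 2) return 0;
  return scores[scores.length - 1] - scores[0];
}

console.log(coachingBand(72));            // Good, targeted practice needed
console.log(needsIntonationCoaching(78)); // true
console.log(improvement([62, 70, 81]));   // 19
```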