Douyin Transcribe - Video Transcription Suite
A complete solution for transcribing Douyin (抖音/TikTok China) videos. Extracts audio, transcribes speech to text, and generates structured summaries.
Version History
| Version | Changes |
|---|---|
| 2.0.0 | Modular architecture, improved workflow, browser DOM extraction |
| 1.0.0 | Initial release, basic transcription |
Architecture
```
User Input (Douyin Link/File)
              │
              ▼
┌─────────────────────────────────────────┐
│          Workflow Orchestrator          │
├─────────────────────────────────────────┤
│ Step 1: Fetcher     → Get video file    │
│ Step 2: Transcriber → Extract & convert │
│ Step 3: Analyzer    → Structure output  │
│ Step 4: Output      → Save results      │
└─────────────────────────────────────────┘
```
Core Features
- Video Fetching: Browser-based DOM extraction for CDN URLs
- Audio Extraction: ffmpeg-powered audio conversion
- Speech-to-Text: Whisper ASR with multiple model options
- Content Analysis: Auto-structured transcripts with key points
- Multi-format Support: Video links, local files, image notes
Prerequisites
| Tool | Purpose | Install |
|---|---|---|
| curl | Download files | Built-in (Windows: `curl.exe`) |
| ffmpeg | Audio extraction/merge | `winget install Gyan.FFmpeg` |
| Whisper | Transcription | `pip install openai-whisper` or Docker |
| Browser | Video extraction | OpenClaw profile required |
Docker Whisper (Recommended):
```bash
docker run -d -p 9000:9000 --name whisper-asr onerahmet/openai-whisper-asr-webservice:latest
```
Workflow
Step 0: Input Classification
| Input Type | Detection | Action |
|---|---|---|
| Video link (`/video/`) | URL pattern | Full workflow |
| Image note (`/note/`) | URL pattern | Snapshot only |
| Local video file | File path | Start from Step 2 |
| Text input | Plain text | Start from Step 3 |
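
The classification table can be sketched as a small routing function. This is a minimal illustration, not the skill's actual implementation: the return labels, the `classify_input` name, and the set of video file extensions are all assumptions introduced here.

```python
import re
from pathlib import Path

# Hypothetical labels for the four rows of the table.
FULL_WORKFLOW = "full_workflow"
SNAPSHOT_ONLY = "snapshot_only"
FROM_STEP_2 = "start_from_step_2"
FROM_STEP_3 = "start_from_step_3"

# Assumed list of local video extensions; the source does not specify one.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}


def classify_input(user_input: str) -> str:
    """Map raw user input to a workflow entry point."""
    text = user_input.strip()
    if re.match(r"https?://", text):
        if "/note/" in text:
            return SNAPSHOT_ONLY
        # /video/ links and unresolved short links enter at Step 1;
        # a short link can be re-classified once Step 1.1 resolves it.
        return FULL_WORKFLOW
    if Path(text).suffix.lower() in VIDEO_EXTENSIONS:
        return FROM_STEP_2
    return FROM_STEP_3
```

Checking for a URL first keeps a link that happens to end in `.mp4` from being treated as a local file.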
Step 1: Fetch Video
1.1 Resolve Short URL
```bash
# Windows PowerShell
curl.exe -sL -o NUL -w "%{url_effective}" "https://v.douyin.com/xxx/"
# macOS/Linux
curl -sL -o /dev/null -w '%{url_effective}' "https://v.douyin.com/xxx/"
```
Output: `https://www.douyin.com/video/7616020798351871284`
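
Once the short link resolves, the video ID is the numeric path segment after `/video/`. The sketch below shows one way to pull the short link out of pasted share text and then read the ID from the resolved URL. Both function names are hypothetical, and the character set allowed in the short-link code is an assumption.

```python
import re
from typing import Optional


def extract_short_link(share_text: str) -> Optional[str]:
    """Return the first v.douyin.com link found in pasted share text."""
    # Assumption: the short code uses only letters, digits, '_' and '-'.
    match = re.search(r"https://v\.douyin\.com/[A-Za-z0-9_-]+/?", share_text)
    return match.group(0) if match else None


def extract_video_id(resolved_url: str) -> Optional[str]:
    """Return the numeric ID from a /video/ URL, or None if there is none."""
    match = re.search(r"/video/(\d+)", resolved_url)
    return match.group(1) if match else None
```

For the example output above, `extract_video_id` returns `"7616020798351871284"`; a `/note/` URL yields `None`.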
1.2 Open Video Page
```
browser(action='open', profile='openclaw', url='https://www.douyin.com/video/{VIDEO_ID}')
```
Wait 10-15 seconds for the page to load completely.
1.3 Extract Video URL (Browser DOM Method)
```javascript
browser(action='act', targetId='PAGE_ID', request={
  "kind": "evaluate",
  "fn": "(() => {
    const entries = performance.getEntriesByType('resource');
    const videoEntries = entries.filter(e => {
      const name = e.name.toLowerCase();
      return name.includes('douyinvod') &&
        (name.includes('.mp4') || name.includes('video'));
    });
    if (videoEntries.length > 0) {
      const video = videoEntries[videoEntries.length - 1];
      return {
        url: video.name,
        type: video.name.includes('.mp4') ? 'mp4' : 'dash'
      };
    }
    return null;
  })()"
})
```
Important Notes:
- The `act` action requires a nested `request` object with `kind` and `fn` fields
- Wrong: `browser(action='act', fn='...')`
- Correct: `browser(action='act', request={"kind": "evaluate", "fn": "..."})`
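
To make the selection rule in the evaluate snippet easier to test outside the browser, here is a Python mirror of the same filter. It is only an illustration: `pick_video_url` is a hypothetical helper, and it operates on a plain list of resource URLs rather than on live `performance` entries.

```python
from typing import Optional


def pick_video_url(resource_names: list[str]) -> Optional[dict]:
    """Apply the same filter as the JavaScript snippet to a list of URLs."""
    candidates = [
        name for name in resource_names
        if "douyinvod" in name.lower()
        and (".mp4" in name.lower() or "video" in name.lower())
    ]
    if not candidates:
        return None
    # Like the JavaScript version, take the last matching entry.
    url = candidates[-1]
    return {"url": url, "type": "mp4" if ".mp4" in url else "dash"}
```

A `None` result corresponds to the "JS returns null" case, which usually means the player has not requested the media yet.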
1.4 Download Video
```bash
# Replace VIDEO_URL with the URL returned in step 1.3
curl.exe -L -H "Referer: https://www.douyin.com/" -o video.mp4 "VIDEO_URL"
```
The Referer header is required; without it the download fails with 403.
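
The same download can be sketched with Python's standard library. The helper names below are hypothetical, and the sketch assumes the Referer header alone is sufficient; `urllib` sends its own default User-Agent, which the CDN may or may not accept.

```python
import shutil
import urllib.request

REFERER = "https://www.douyin.com/"


def build_download_request(video_url: str) -> urllib.request.Request:
    """Create a GET request that carries the required Referer header."""
    return urllib.request.Request(video_url, headers={"Referer": REFERER})


def download_video(video_url: str, dest: str = "video.mp4") -> str:
    """Stream the response body to dest and return the path."""
    request = build_download_request(video_url)
    # urlopen follows redirects by default, similar to curl -L.
    with urllib.request.urlopen(request, timeout=60) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

A 403 here surfaces as `urllib.error.HTTPError`, which is the cue to re-extract the URL from the browser.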
Step 2: Transcribe Audio
2.1 Extract Audio
```bash
# For MP4 videos
ffmpeg -i video.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav -y
# For DASH videos (separate video and audio streams need merging first)
ffmpeg -i video.mp4 -i audio.mp4 -c copy merged.mp4 -y
ffmpeg -i merged.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav -y
```
Parameters:
- `-ar 16000`: 16 kHz sample rate (the rate Whisper expects)
- `-ac 1`: Mono channel
- `-c:a pcm_s16le`: 16-bit PCM
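
When the extraction is driven from a script, the same flags can be assembled as an argument list. This is a minimal sketch under the assumption that `ffmpeg` is on `PATH`; `build_extract_command` and `extract_audio` are hypothetical names.

```python
import subprocess


def build_extract_command(video_path: str, wav_path: str = "audio.wav") -> list[str]:
    """Return the ffmpeg arguments for 16 kHz mono 16-bit PCM output."""
    return [
        "ffmpeg", "-i", video_path,
        "-ar", "16000",       # 16 kHz sample rate
        "-ac", "1",           # mono
        "-c:a", "pcm_s16le",  # 16-bit PCM
        wav_path, "-y",       # overwrite an existing output file
    ]


def extract_audio(video_path: str, wav_path: str = "audio.wav") -> str:
    """Run ffmpeg; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_extract_command(video_path, wav_path), check=True)
    return wav_path
```

Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.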
2.2 Transcribe with Docker Whisper
```bash
# PORT is the host port the container publishes (9000 with the docker run command above)
curl.exe -X POST "http://localhost:PORT/asr" -F "audio_file=@audio.wav"
```
2.3 Alternative: Local Whisper
```bash
python -m whisper audio.wav --model small --language zh
```
Model Selection:
| Model | Size | 5-min Video (CPU) | Accuracy | Use Case |
|---|---|---|---|---|
| tiny | 75MB | ~30s | Fair | Quick preview |
| base | 142MB | ~1min | Good | Daily use |
| small | 466MB | ~3min | Better | Recommended |
| medium | 1.5GB | ~8min | Best | High accuracy |
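
The timings in the table can be turned into a rough planning aid. The sketch below assumes CPU time scales linearly with video length, which is an assumption made here for illustration and not a measured result; both function names are hypothetical.

```python
from typing import Optional

# Seconds of CPU time per 5-minute video, copied from the table.
SECONDS_PER_5_MIN = {"tiny": 30, "base": 60, "small": 180, "medium": 480}


def estimate_cpu_seconds(model: str, video_minutes: float) -> float:
    """Scale the 5-minute figure linearly (an assumption, not a benchmark)."""
    return SECONDS_PER_5_MIN[model] * video_minutes / 5


def largest_model_within(budget_seconds: float, video_minutes: float) -> Optional[str]:
    """Return the most accurate model whose estimate fits the time budget."""
    for model in ("medium", "small", "base", "tiny"):
        if estimate_cpu_seconds(model, video_minutes) <= budget_seconds:
            return model
    return None
```

Actual speed depends heavily on the CPU, so the estimate is best treated as an order-of-magnitude guide.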
Step 3: Analyze Content
The agent processes the transcript in four passes:
1. Fix transcription errors
   - Correct homophones
   - Fix speaker names
   - Remove filler words
2. Structure content
   - Add paragraph breaks
   - Create sections
3. Extract key points
   - Main ideas
   - Important quotes
4. Generate tags
   - 3-5 topic tags
Step 4: Save Output
Transcript Format
```markdown
{Title}
作者: {Author}
来源: 抖音
日期: {Date}
转录时间: {Transcription Date}
摘要
{Summary}
正文
{Transcript content with paragraphs}
要点
- {Key point 1}
- {Key point 2}
- {Key point 3}
标签
#{tag1} #{tag2} #{tag3}
```
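
One way to fill this template programmatically is a small dataclass plus a render function. This is a sketch only: the `Transcript` class and its field names are hypothetical, and the line layout simply follows the field order of the template.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    title: str
    author: str
    date: str
    transcribed_on: str
    summary: str
    body: str
    key_points: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


def render_transcript(t: Transcript) -> str:
    """Emit the template fields in order, one per line."""
    lines = [
        t.title,
        f"作者: {t.author}",
        "来源: 抖音",
        f"日期: {t.date}",
        f"转录时间: {t.transcribed_on}",
        "摘要",
        t.summary,
        "正文",
        t.body,
        "要点",
        *[f"- {point}" for point in t.key_points],
        "标签",
        " ".join(f"#{tag}" for tag in t.tags),
    ]
    return "\n".join(lines) + "\n"
```

Keeping the data separate from the rendering makes it straightforward to adjust the layout later without touching the analysis step.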
File Naming Convention
```
{VIDEO_ID}-抖音转录.md
```
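
The naming convention can be applied with a short helper. The sketch below is illustrative: `output_path` and `save_transcript` are hypothetical names, and writing UTF-8 is an assumption made so that the Chinese file name and content survive on Windows.

```python
from pathlib import Path


def output_path(video_id: str, output_dir: str = ".") -> Path:
    """Build {VIDEO_ID}-抖音转录.md inside output_dir."""
    return Path(output_dir) / f"{video_id}-抖音转录.md"


def save_transcript(text: str, video_id: str, output_dir: str = ".") -> Path:
    """Write the transcript as UTF-8, creating output_dir if needed."""
    path = output_path(video_id, output_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```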
Troubleshooting
| Stage | Issue | Solution |
|---|---|---|
| Step 1 | Short URL fails | Check link completeness, remove share text |
| Step 1 | JS returns null | Wait 15-20s and retry, increase timeout |
| Step 1 | Download 403 | URL expired, re-fetch from browser |
| Step 1 | DASH no audio | Merge with `ffmpeg -i video -i audio -c copy` |
| Step 2 | ffmpeg not installed | `winget install Gyan.FFmpeg` |
| Step 2 | Whisper service down | `docker start whisper-asr` |
| Step 2 | Transcription slow | 10-min video takes 15-20 min on CPU |
| Step 2 | Poor quality | Use larger model (medium) |
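
The "wait and retry" advice for a null extraction result can be expressed as a generic helper. This is a sketch with assumed defaults (three attempts, 15 seconds apart); `retry_until_value` is a hypothetical name, and `fetch` stands in for whatever call performs the browser evaluation.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until_value(
    fetch: Callable[[], Optional[T]],
    attempts: int = 3,
    wait_seconds: float = 15.0,
) -> Optional[T]:
    """Call fetch until it returns a non-None value or attempts run out."""
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        # Sleep only between attempts, not after the final one.
        if attempt < attempts - 1:
            time.sleep(wait_seconds)
    return None
```

Returning `None` after the last attempt leaves the caller free to decide whether to re-open the page or report the failure.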
Image Note Handling
Image notes (`/note/`) don't need transcription:
```
1. browser(action='open', profile='openclaw', url='IMAGE_NOTE_URL')
2. browser(action='snapshot')
3. Extract content from snapshot
4. Save to output directory
```
Edge Cases
- Article links (`/article/`): Use browser snapshot, no transcription
- Douyin AI summary: Extract from page as supplement
- Other platforms: Use yt-dlp for YouTube/Bilibili
- Live streams: Not supported
Related Modules
This skill can be extended with standalone modules:
| Module | Purpose |
|---|---|
| douyin-fetcher | Video fetching only |
| douyin-transcriber | Audio transcription only |
| douyin-analyzer | Content analysis only |
| douyin-orchestrator | Workflow coordination |
License
MIT-0 License - Free to use, modify, and redistribute.