Subtitle Extractor Skill
Extracts subtitles from video platforms in their native format. Supports Bilibili, YouTube, Xiaohongshu, Douyin, and local video files.
Scope of this skill: subtitle extraction only. Summarization, analysis, Q&A — all handled by the agent based on the user's actual request.
What It Does
1. Detect platform from URL
2. Extract native subtitles via yt-dlp (or Whisper transcription when no native subtitles exist)
3. Output: raw subtitle file path + video title/author
4. Agent saves subtitle to outputs/ and processes per user request
Dependencies
Agent must verify dependencies before calling the script. If any are missing, inform the user with the relevant install command.
yt-dlp — Required (always)
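    # Check
    yt-dlp --version
    # Install
    pip install yt-dlp                     # all platforms (recommended)
    brew install yt-dlp                    # macOS Homebrew
    winget install yt-dlp.yt-dlp           # Windows WinGet
    scoop install yt-dlp                   # Windows Scoop
    conda install -c conda-forge yt-dlp    # Conda environments
    # Upgrade an existing install
    pip install -U yt-dlp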
ffmpeg — Required only for Whisper transcription
Only needed for Xiaohongshu, Douyin, local files, or Path B (Whisper transcription).
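    # Check
    ffmpeg -version
    # Install
    brew install ffmpeg           # macOS Homebrew
    winget install Gyan.FFmpeg    # Windows WinGet
    choco install ffmpeg          # Windows Chocolatey
    scoop install ffmpeg          # Windows Scoop
    apt install ffmpeg            # Ubuntu / Debian
    dnf install ffmpeg            # Fedora / RHEL (may require RPM Fusion)
    pacman -S ffmpeg              # Arch Linux
    snap install ffmpeg           # Ubuntu Snap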
Windows users: restart the terminal after installation for PATH to take effect.
If winget is unavailable, download from ffmpeg.org and add the bin/ directory to system PATH.
faster-whisper — Required only for transcription platforms
Only needed for Xiaohongshu, Douyin, local files, or Path B (Whisper transcription).
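    # Check
    python3 -c "from faster_whisper import WhisperModel; print('ok')"
    # Install
    pip install faster-whisper
    # Configure the model size (default: base)
    export VIDEO_SUMMARY_WHISPER_MODEL=base    # tiny | base | small | medium | large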
Note: Model files are downloaded automatically on first transcription run (~150MB for base). This may take a minute depending on network speed.
China network note: If auto-download fails (HuggingFace blocked), see Whisper model download failed in Troubleshooting.
Transcription time estimate (CPU, faster-whisper):
| Video Duration | tiny | base | small | medium |
|---|---|---|---|---|
| 5 min | ~10s | ~20s | ~40s | ~80s |
| 15 min | ~30s | ~60s | ~2m | ~4m |
| 30 min | ~60s | ~2m | ~5m | ~10m |
A GPU speeds up transcription 5-15×. The first run downloads the model (~150MB for base).
Cookie Configuration
Bilibili — Cookie Required
Bilibili requires a cookie file for all requests. The script auto-discovers cookie files in the skill directory only (same folder as subtitle-extractor.py and SKILL.md):
Any .txt file whose name contains bilibili will be picked up automatically — including the browser extension's default export format www.bilibili.com_netscape_<timestamp>.txt.
Place your cookie file in the skill directory. The agent does not need to locate or pass it manually (see Step 1b).
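The discovery rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the script's actual code: the function name `find_bilibili_cookie` is hypothetical, and both the case-insensitive match and the "most recently modified file wins" tie-break are assumptions.

```python
from pathlib import Path
from typing import Optional


def find_bilibili_cookie(skill_dir: Path) -> Optional[Path]:
    """Return a .txt file in skill_dir whose name contains 'bilibili', or None.

    Assumptions (not confirmed by the skill): matching is case-insensitive,
    and when several files match, the most recently modified one is chosen.
    """
    # glob("*.txt") is non-recursive, so only the skill directory itself is searched.
    candidates = [
        path for path in skill_dir.glob("*.txt")
        if path.is_file() and "bilibili" in path.name.lower()
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_mtime)
```

A call such as `find_bilibili_cookie(Path(__file__).parent)` would search only the folder that holds the script, which matches the "skill directory only" rule.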
Xiaohongshu / Douyin — Manual
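    video-summary https://www.xiaohongshu.com/explore/xxxxx --cookies cookies.txt
    # or
    export VIDEO_SUMMARY_COOKIES=/path/to/cookies.txt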
How to export cookies:
1. Install the browser extension "Cookie Editor" (https://cookieeditor.org/)
2. Log in to the platform
3. Export cookies to a .txt file (Netscape format)
Agent Workflow
EXECUTION ORDER — NON-NEGOTIABLE
Steps 1–4 in this skill MUST be completed in full before addressing any user request. The subtitle file MUST be saved to disk (Step 4) before the agent proceeds to summarization, translation, analysis, or any other task the user has asked for.
Treat Steps 1–4 as mandatory prerequisites, not optional helpers. Do not skip any step even if the user's final output format (e.g. a markdown file) appears to make it unnecessary.
Step 1 — Check Dependencies
    yt-dlp --version
If the user requests Whisper transcription (keywords: "whisper转录" / "用whisper" / "transcribe" / "转录" / "语音转文字"), or the platform is Xiaohongshu, Douyin, or a local file, also check:
    ffmpeg -version
    python3 -c "from faster_whisper import WhisperModel; print('ok')"
If anything is missing, stop and tell the user which dependency to install (see Dependencies section).
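For an agent that prefers to check from Python rather than the shell, the same check might look like the sketch below. Note that it swaps in a different technique: it looks the executables up on PATH and probes for the `faster_whisper` module instead of running the version commands. The function name `missing_dependencies` is hypothetical.

```python
import importlib.util
import shutil
from typing import List


def missing_dependencies(needs_whisper: bool) -> List[str]:
    """Return the names of required tools that are not available.

    yt-dlp is always required; ffmpeg and faster-whisper are only checked
    when Whisper transcription is going to be used.
    """
    missing = []
    if shutil.which("yt-dlp") is None:
        missing.append("yt-dlp")
    if needs_whisper:
        if shutil.which("ffmpeg") is None:
            missing.append("ffmpeg")
        # find_spec returns None when a top-level module cannot be imported.
        if importlib.util.find_spec("faster_whisper") is None:
            missing.append("faster-whisper")
    return missing
```

An empty list means the agent can continue; otherwise each returned name maps to an install command in the Dependencies section.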
Step 1b — Bilibili Cookie
The script auto-discovers any .txt file containing "bilibili" in the skill directory. Do not search for or pass the cookie file yourself.
Only act if the script exits with:
- 未找到 Bilibili Cookie 文件 ("Bilibili cookie file not found") → tell the user to place a cookie file in the skill directory
- Bilibili 412 错误:Cookie 已过期 ("Bilibili 412 error: cookie expired") → tell the user to re-export
To export: install "Cookie Editor (https://cookieeditor.org/)", log in to Bilibili, export Netscape format → place in skill directory → retry.
Step 2 — Extract Subtitles
Determine which path applies, then execute it completely before moving to Step 3.
Path A — Native subtitles
Use when: Bilibili or YouTube URL, and the user has not mentioned any transcription keyword.
Tell the user: "正在提取字幕..."
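    python subtitle-extractor.py <url>                 # auto-detect language
    python subtitle-extractor.py <url> --lang zh-CN    # force a language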
Parse the JSON from stdout. You now have all four fields needed for Step 3:
| Field | Value |
|---|---|
| title | from this JSON |
| author | from this JSON |
| platform | from this JSON |
| subtitle_file | from this JSON |
If the script exits non-zero: read stderr, report the error to the user, stop.
Path B — Whisper transcription
Use when: user mentions any transcription keyword, OR platform is Xiaohongshu or Douyin.
Transcription keyword takes priority over phrasing like "提取字幕" or "字幕原文" — those describe the desired output, not the method.
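The choice between the two paths can be written down as a small decision function. This is a sketch of the rule as stated here, not code from the script: the name `choose_path` and the lowercase platform labels (`"xiaohongshu"`, `"douyin"`, `"local"`) are assumptions made for illustration.

```python
# Keywords listed in Step 1; compared against the lowercased request.
TRANSCRIPTION_KEYWORDS = ("whisper转录", "用whisper", "transcribe", "转录", "语音转文字")
# Sources without native subtitles (labels are illustrative assumptions).
WHISPER_ONLY_PLATFORMS = {"xiaohongshu", "douyin", "local"}


def choose_path(platform: str, user_request: str) -> str:
    """Return "B" for Whisper transcription, otherwise "A" for native subtitles."""
    text = user_request.lower()
    # A transcription keyword wins even when the request also says "提取字幕".
    if any(keyword in text for keyword in TRANSCRIPTION_KEYWORDS):
        return "B"
    if platform.lower() in WHISPER_ONLY_PLATFORMS:
        return "B"
    return "A"
```

For example, `choose_path("bilibili", "用Whisper提取字幕原文")` selects Path B because the keyword describes the method, while `choose_path("bilibili", "帮我提取字幕")` stays on Path A.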
Call 1 — Download audio (skip for local files, go to Call 2 directly)
Tell the user: "正在下载音频,请稍候..."
    python subtitle-extractor.py <url> --step download-audio
Parse the JSON from stdout and record these values:
| Field | Value |
|---|---|
| title | from this JSON |
| author | from this JSON |
| platform | from this JSON |
| audio_file | from this JSON (input for Call 2) |
If the script exits non-zero: read stderr, report the error to the user, stop.
Tell the user: "音频下载完成,开始 Whisper 转录(模型: base),请稍候..."
Call 2 — Transcribe
For URL input, use the audio_file recorded from Call 1:
    python subtitle-extractor.py <audio_file> --step transcribe
For local file input (set title = filename, author = "local"):
    python subtitle-extractor.py <filepath> --step transcribe
Parse the JSON from stdout and record:
| Field | Value |
|---|---|
| subtitle_file | from this JSON |
Tell the user: "转录完成!"
If the script exits non-zero:
- Read stderr, report the error to the user, stop
- If stderr contains Whisper 模型下载失败: show the full error message verbatim; it contains the exact download directory and manual steps
Failure rule: Do not run yt-dlp, ffmpeg, or Whisper commands manually. Do not retry with different flags unless the error message explicitly says to.
Step 3 — Confirm Data
Verify you have collected all four values from the script outputs in Step 2:
| Field | Path A source | Path B source |
|---|---|---|
| title | script JSON | Call 1 JSON (or filename for local) |
| author | script JSON | Call 1 JSON (or "local") |
| subtitle_file | script JSON | Call 2 JSON |
Note: non-ASCII characters in JSON output appear as \uXXXX escapes — standard JSON parsing produces the correct decoded strings.
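A minimal sketch of this confirmation step for Path A output follows. The helper name `parse_script_output` is hypothetical, and the sample JSON string is made up for illustration; only the four field names come from this document. It also shows that `json.loads` turns `\uXXXX` escapes back into the original characters.

```python
import json
from typing import Dict, Sequence

REQUIRED_FIELDS = ("title", "author", "platform", "subtitle_file")


def parse_script_output(stdout: str, required: Sequence[str] = REQUIRED_FIELDS) -> Dict[str, str]:
    """Parse the script's JSON output and check that the expected fields are present."""
    data = json.loads(stdout)
    missing = [name for name in required if not data.get(name)]
    if missing:
        raise ValueError(f"missing fields in script output: {', '.join(missing)}")
    return data


# Made-up sample output; the title is stored as \uXXXX escapes.
sample = '{"title": "\\u5b57\\u5e55", "author": "demo", "platform": "bilibili", "subtitle_file": "/tmp/demo.srt"}'
print(parse_script_output(sample)["title"])  # prints 字幕
```

For Path B the values arrive in two calls, so the `required` argument can be narrowed, for instance to `("subtitle_file",)` for the Call 2 output.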
Step 4 — Save Subtitle to Outputs (REQUIRED — DO NOT SKIP)
Before answering the user, save the subtitle file to the session outputs directory.
Naming rule: {first 8 characters of title}_{author}.{original extension}
Steps:
1. Take title and keep the first 8 characters (Chinese and English characters each count as 1)
2. Replace unsafe filesystem characters / \ : * ? " < > | and spaces with INLINECODE32
3. Apply the same sanitization to author
4. Use the extension from the subtitle_file path (.srt or .vtt)
5. Save to INLINECODE37
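The naming steps above can be sketched as follows. The function names are hypothetical, and the underscore used as the replacement character is an assumption, since the replacement character is not visible in this copy of the document.

```python
import re
from pathlib import Path

# Characters named in the naming steps: / \ : * ? " < > | and spaces.
UNSAFE_CHARS = re.compile(r'[/\\:*?"<>| ]')


def sanitize(text: str, replacement: str = "_") -> str:
    """Replace unsafe filesystem characters; the "_" default is an assumption."""
    return UNSAFE_CHARS.sub(replacement, text)


def output_filename(title: str, author: str, subtitle_file: str) -> str:
    """Build {first 8 chars of title}_{author}{extension of subtitle_file}."""
    short_title = sanitize(title[:8])  # Python slices by character, so CJK counts as 1
    extension = Path(subtitle_file).suffix  # ".srt" or ".vtt"
    return f"{short_title}_{sanitize(author)}{extension}"
```

With these assumptions, `output_filename("深度学习入门教程第一讲", "张三", "a.vtt")` yields `深度学习入门教程_张三.vtt`.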
Step 5 — Process and Respond
Read the subtitle file content and respond to the user's original request — summarize, analyze, translate, answer questions, etc. The subtitle content is in SRT or VTT format with timestamps; LLMs handle both directly.
Platform Notes
| Platform | Method | Notes |
|---|---|---|
| YouTube | yt-dlp native CC + auto-generated | Best support, usually no cookies needed |
| Bilibili | yt-dlp native CC | Auto-discovers cookies; zh-CN → ai-zh fallback; 412 error handling |
| Xiaohongshu | Whisper transcription | No native subtitles; requires ffmpeg + whisper |
| Douyin | Whisper transcription | No native subtitles; requires ffmpeg + whisper |
| Local files | Whisper transcription | mp4, mkv, webm, mp3, etc. |
Supported URL Formats
YouTube: youtube.com/watch?v=... · INLINECODE39
Bilibili: bilibili.com/video/BV... · bilibili.com/video/av... · b23.tv/... (short link)
Xiaohongshu: xiaohongshu.com/explore/... · xhslink.com/... (short link)
Douyin: douyin.com/video/... · v.douyin.com/... (short link)
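Platform detection from these URL shapes could look roughly like the sketch below. It is not the script's implementation: the function name, the platform labels, and the rule that anything without an http(s) scheme counts as a local file are assumptions, and only the hosts spelled out in this section are covered.

```python
from urllib.parse import urlparse

# Host suffix -> platform label (labels are illustrative assumptions).
HOST_TO_PLATFORM = {
    "youtube.com": "youtube",
    "bilibili.com": "bilibili",
    "b23.tv": "bilibili",
    "xiaohongshu.com": "xiaohongshu",
    "xhslink.com": "xiaohongshu",
    "douyin.com": "douyin",
}


def detect_platform(source: str) -> str:
    """Map a URL to a platform label; non-http(s) input is treated as a local file."""
    if not source.lower().startswith(("http://", "https://")):
        return "local"
    host = (urlparse(source).hostname or "").lower()
    for suffix, platform in HOST_TO_PLATFORM.items():
        # Match the bare domain and any subdomain such as www. or v.
        if host == suffix or host.endswith("." + suffix):
            return platform
    return "unknown"
```

Matching on the host suffix lets short links and subdomains such as `v.douyin.com` resolve to the same platform as the full URL.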
Script Reference
CODEBLOCK10
Troubleshooting
"yt-dlp: command not found"
CODEBLOCK11
"No subtitles found"
- The video may not have CC subtitles; use Path B (--step download-audio then --step transcribe) to force Whisper
- For Xiaohongshu/Douyin, transcription is always required (no native subtitles)
- Try --lang to specify a different language code
Bilibili 412 Precondition Failed
Cookie expired. Re-export:
1. Log in to Bilibili in the browser
2. Use the "Cookie Editor" extension (https://cookieeditor.org/)
3. Export in Netscape format → place the file in the skill directory → retry
Bilibili: no zh-CN subtitle found
The script automatically falls back to ai-zh. If both fail, it lists all available subtitle codes. Use --lang <code> to specify one.
"Whisper not installed"
CODEBLOCK12
Whisper model download failed
The script tries hf-mirror.com then huggingface.co. If both fail (common in China), the script will print exact steps. Show the error message to the user verbatim — it contains the exact directory path and download URL.
Manual download (browser accessible in China):
1. Open: INLINECODE54
2. Download these 5 files: config.json, model.bin, tokenizer.json, vocabulary.json, INLINECODE59
3. Create the directory shown in the error message and place all 5 files there
4. Re-run the script; it auto-detects the local model, so no download is needed
For other model sizes (tiny/small/medium/large), change faster-whisper-base to faster-whisper-{size} in the ModelScope URL.
"ffmpeg not found" (during transcription)
See ffmpeg install commands in the Dependencies section above.
Video too long for Whisper
Use a smaller model:
export VIDEO_SUMMARY_WHISPER_MODEL=tiny
Extract subtitles. Let the agent think.
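Subtitle Extractor Skill
Extracts subtitles from video platforms in their native format. Supports Bilibili, YouTube, Xiaohongshu, Douyin, and local video files.
Scope of this skill: subtitle extraction only. Summarization, analysis, and Q&A are all handled by the agent based on the user's actual request.
What It Does
1. Detect the platform type from the URL
2. Extract native subtitles via yt-dlp (or use Whisper transcription when no native subtitles exist)
3. Output: raw subtitle file path + video title/author
4. The agent saves the subtitle to the outputs/ directory and processes it per the user's request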
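Dependencies
The agent must verify dependencies before calling the script. If any are missing, inform the user and provide the relevant install command.
yt-dlp - Required (always)
    # Check
    yt-dlp --version
    # Install
    pip install yt-dlp                     # all platforms (recommended)
    brew install yt-dlp                    # macOS Homebrew
    winget install yt-dlp.yt-dlp           # Windows WinGet
    scoop install yt-dlp                   # Windows Scoop
    conda install -c conda-forge yt-dlp    # Conda environments
    # Upgrade an existing install
    pip install -U yt-dlp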
ffmpeg - Required only for Whisper transcription
Only needed for Xiaohongshu, Douyin, local files, or Path B (Whisper transcription).
    # Check
    ffmpeg -version
    # Install
    brew install ffmpeg           # macOS Homebrew
    winget install Gyan.FFmpeg    # Windows WinGet
    choco install ffmpeg          # Windows Chocolatey
    scoop install ffmpeg          # Windows Scoop
    apt install ffmpeg            # Ubuntu / Debian
    dnf install ffmpeg            # Fedora / RHEL (may require RPM Fusion)
    pacman -S ffmpeg              # Arch Linux
    snap install ffmpeg           # Ubuntu Snap
Windows users: restart the terminal after installation for PATH to take effect.
If winget is unavailable, download from ffmpeg.org and add the bin/ directory to the system PATH.
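faster-whisper - Required only for transcription platforms
Only needed for Xiaohongshu, Douyin, local files, or Path B (Whisper transcription).
    # Check
    python3 -c "from faster_whisper import WhisperModel; print('ok')"
    # Install
    pip install faster-whisper
    # Configure the model size (default: base)
    export VIDEO_SUMMARY_WHISPER_MODEL=base    # tiny | base | small | medium | large
Note: Model files are downloaded automatically on the first transcription run (~150MB for base). This may take a minute depending on network speed.
China network note: If auto-download fails (HuggingFace blocked), see "Whisper model download failed" in Troubleshooting.
Transcription time estimate (CPU, faster-whisper):
| Video Duration | tiny | base | small | medium |
|---|---|---|---|---|
| 5 min | ~10s | ~20s | ~40s | ~80s |
| 15 min | ~30s | ~60s | ~2m | ~4m |
| 30 min | ~60s | ~2m | ~5m | ~10m |
A GPU speeds up transcription 5-15×. The first run needs to download the model (~150MB for base).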
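Cookie Configuration
Bilibili - Cookie Required
Bilibili requires a cookie file for all requests. The script auto-discovers cookie files in the skill directory only (the same directory as subtitle-extractor.py and SKILL.md):
Any .txt file whose name contains bilibili is picked up automatically, including the browser extension's default export format www.bilibili.com_netscape_<timestamp>.txt.
Place the cookie file in the skill directory. The agent does not need to locate or pass the cookie file manually (see Step 1b).
Xiaohongshu / Douyin - Manual
    video-summary https://www.xiaohongshu.com/explore/xxxxx --cookies cookies.txt
    # or
    export VIDEO_SUMMARY_COOKIES=/path/to/cookies.txt
How to export cookies:
1. Install the browser extension Cookie Editor (https://cookieeditor.org/)
2. Log in to the platform
3. Export cookies to a .txt file (Netscape format)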
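Agent Workflow
EXECUTION ORDER - NON-NEGOTIABLE
Steps 1-4 of this skill MUST be completed in full before handling any user request. The subtitle file MUST be saved to disk (Step 4) before the agent proceeds to summarization, translation, analysis, or any other task the user asked for.
Treat Steps 1-4 as mandatory prerequisites, not optional helper steps. Do not skip any step even if the user's final output format (e.g. a markdown file) seems to make it unnecessary.
Step 1 - Check Dependencies
    yt-dlp --version
If the user requests Whisper transcription (keywords: whisper转录 / 用whisper / transcribe / 转录 / 语音转文字), or the platform is Xiaohongshu, Douyin, or a local file, also check:
    ffmpeg -version
    python3 -c "from faster_whisper import WhisperModel; print('ok')"
If anything is missing, stop and tell the user which dependency to install (see the Dependencies section).
Step 1b - Bilibili Cookie
The script auto-discovers any .txt file containing bilibili in the skill directory. Do not search for or pass the cookie file yourself.
Only act if the script exits with one of these messages:
- 未找到 Bilibili Cookie 文件 → tell the user to place a cookie file in the skill directory
- Bilibili 412 错误:Cookie 已过期 → tell the user to re-export
To export: install Cookie Editor (https://cookieeditor.org/), log in to Bilibili, export in Netscape format → place the file in the skill directory → retry.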
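Step 2 - Extract Subtitles
Determine which path applies, then execute it completely before moving to Step 3.
Path A - Native subtitles
Use when: Bilibili or YouTube URL, and the user has not mentioned any transcription keyword.
Tell the user: 正在提取字幕...
    python subtitle-extractor.py <url>                 # auto-detect language
    python subtitle-extractor.py <url> --lang zh-CN    # force a language
Parse the JSON from stdout. You now have all four fields needed for Step 3:
| Field | Value |
|---|---|
| title | from this JSON |
| author | from this JSON |
| platform | from this JSON |
| subtitle_file | from this JSON |
If the script exits non-zero: read stderr, report the error to the user, stop.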
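Path B - Whisper transcription
Use when: the user mentions any transcription keyword, or the platform is Xiaohongshu or Douyin.
A transcription keyword takes priority over phrasing such as 提取字幕 or 字幕原文; those describe the desired output, not the method.
Call 1 - Download audio (skip for local files and go directly to Call 2)
Tell the user: 正在下载音频,请稍候...
    python subtitle-extractor.py <url> --step download-audio
Parse the JSON from stdout and record these values:
| Field | Value |
|---|---|
| title | from this JSON |
| author | from this JSON |
| platform | from this JSON |
| audio_file | from this JSON (input for Call 2) |
If the script exits non-zero: read stderr, report the error to the user, stop.
Tell the user: 音频下载完成,开始 Whisper 转录(模型: base),请稍候...
Call 2 - Transcribe
For URL input, use the audio_file recorded from Call 1:
    python subtitle-extractor.py <audio_file> --step transcribe
For local file input (set title = filename, author = local):
    python subtitle-extractor.py <filepath> --step transcribe
Parse the JSON from stdout and record:
| Field | Value |
|---|---|
| subtitle_file | from this JSON |
Tell the user: 转录完成!
If the script exits non-zero:
- Read stderr, report the error to the user, stop
- If stderr contains Whisper 模型下载失败: show the full error message verbatim; it contains the exact download directory and manual steps
Failure rule: Do not run yt-dlp, ffmpeg, or Whisper commands manually. Do not retry with different flags unless the error message explicitly says to.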
Step 3 - Confirm Data
Verify that you have collected all four values from the script outputs in Step 2:
| Field | Path A source | Path B source |
|------|-----------