Subtitle Extractor Skill

Extracts subtitles from video platforms in their native format. Supports Bilibili, YouTube, Xiaohongshu, Douyin, and local video files.

Scope of this skill: subtitle extraction only. Summarization, analysis, and Q&A are handled by the agent according to the user's actual request.

What It Does

1. Detect the platform from the URL
2. Extract native subtitles via yt-dlp (or transcribe with Whisper when no native subtitles exist)
3. Output: raw subtitle file path + video title/author
4. The agent saves the subtitle to outputs/ and processes it per the user's request

Dependencies

Agent must verify dependencies before calling the script. If any are missing, inform the user with the relevant install command.

yt-dlp — Required (always)

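```bash
# Check
yt-dlp --version

# Install
pip install yt-dlp                    # All platforms (recommended)
brew install yt-dlp                   # macOS Homebrew
winget install yt-dlp.yt-dlp          # Windows WinGet
scoop install yt-dlp                  # Windows Scoop
conda install -c conda-forge yt-dlp   # Conda environment

# Upgrade an existing installation
pip install -U yt-dlp
```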

ffmpeg — Required only for Whisper transcription

Only needed for Xiaohongshu, Douyin, local files, or Path B (Whisper transcription).

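```bash
# Check
ffmpeg -version

# Install
brew install ffmpeg          # macOS Homebrew
winget install Gyan.FFmpeg   # Windows WinGet
choco install ffmpeg         # Windows Chocolatey
scoop install ffmpeg         # Windows Scoop
apt install ffmpeg           # Ubuntu / Debian
dnf install ffmpeg           # Fedora / RHEL (may require RPM Fusion)
pacman -S ffmpeg             # Arch Linux
snap install ffmpeg          # Ubuntu Snap
```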

Windows users: restart the terminal after installation for PATH to take effect.
If winget is unavailable, download from ffmpeg.org and add the bin/ directory to system PATH.

faster-whisper — Required only for transcription platforms

Only needed for Xiaohongshu, Douyin, local files, or Path B (Whisper transcription).

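```bash
# Check
python3 -c "from faster_whisper import WhisperModel; print('ok')"

# Install
pip install faster-whisper

# Configure model size (default: base)
export VIDEO_SUMMARY_WHISPER_MODEL=base   # tiny | base | small | medium | large
```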

Note: Model files are downloaded automatically on first transcription run (~150MB for base). This may take a minute depending on network speed.

China network note: if the automatic download fails (e.g. because HuggingFace is blocked), see "Whisper model download failed" under Troubleshooting.

Transcription time estimate (CPU, faster-whisper):

Video Duration	tiny	base	small	medium
5 min	~10s	~20s	~40s	~80s
15 min	~30s	~60s	~2m	~4m
30 min	~60s	~2m	~5m	~10m

GPU acceleration speeds up transcription by 5–15×.

Cookie Configuration

Bilibili — Cookie Required

Bilibili requires a cookie file for all requests. The script auto-discovers cookie files in the skill directory only (same folder as subtitle-extractor.py and SKILL.md):

Any .txt file whose name contains bilibili will be picked up automatically — including the browser extension's default export format www.bilibili.com_netscape_<timestamp>.txt.

Place your cookie file in the skill directory. The agent does not need to locate or pass it manually (see Step 1b).
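The discovery rule above can be sketched in Python as follows. This illustrates the documented behavior (skill directory only, any .txt file with bilibili in its name); it is not the actual code in subtitle-extractor.py. The function name find_bilibili_cookie, the case-sensitive match, and the alphabetical tie-break when several files match are all assumptions.

```python
from pathlib import Path
from typing import Optional


def find_bilibili_cookie(skill_dir: Path) -> Optional[Path]:
    """Return a .txt file in skill_dir whose name contains "bilibili", or None.

    Hypothetical helper: mirrors the documented rule only; the real script's
    matching and tie-breaking may differ.
    """
    # glob("*.txt") scans only the skill directory itself, not subdirectories.
    matches = sorted(
        p for p in skill_dir.glob("*.txt")
        if p.is_file() and "bilibili" in p.name
    )
    # Assumption: when several files match, take the first alphabetically.
    return matches[0] if matches else None
```

For example, find_bilibili_cookie(Path(__file__).parent) would pick up a www.bilibili.com_netscape_<timestamp>.txt export placed next to the script.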

Xiaohongshu / Douyin — Manual

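```bash
video-summary https://www.xiaohongshu.com/explore/xxxxx --cookies cookies.txt

# or
export VIDEO_SUMMARY_COOKIES=/path/to/cookies.txt
```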

How to export cookies:

1. Install the browser extension Cookie Editor (https://cookieeditor.org/)
2. Log in to the platform
3. Export cookies to a .txt file (Netscape format)
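To sanity-check an export before retrying, a small parser for the Netscape cookie format is enough. The sketch below is not part of the skill; parse_netscape_cookies is a hypothetical helper. It relies on the standard Netscape layout of seven tab-separated fields per cookie (domain, include-subdomains flag, path, secure flag, expiry, name, value) and treats the #HttpOnly_ prefix that some exporters write as a cookie line rather than a comment.

```python
def parse_netscape_cookies(text: str) -> dict:
    """Parse Netscape-format cookie text into a {name: value} dict.

    Hypothetical helper for checking an exported cookies .txt file.
    Raises ValueError on a line that is not a valid cookie entry.
    """
    cookies = {}
    for raw in text.splitlines():
        line = raw
        if line.startswith("#HttpOnly_"):
            # HttpOnly cookies are written with this prefix; they are not comments.
            line = line[len("#HttpOnly_"):]
        elif not line.strip() or line.startswith("#"):
            continue  # blank line, file header, or comment
        fields = line.split("\t")
        if len(fields) != 7:
            raise ValueError(f"not a Netscape cookie line: {raw!r}")
        # Fields: domain, include_subdomains, path, secure, expiry, name, value.
        cookies[fields[5]] = fields[6]
    return cookies
```

An empty result, or a ValueError, suggests the file was exported in a different format (e.g. JSON) and should be re-exported as Netscape.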

Agent Workflow

EXECUTION ORDER — NON-NEGOTIABLE

Steps 1–4 in this skill MUST be completed in full before addressing any user request. The subtitle file MUST be saved to disk (Step 4) before the agent proceeds to summarization, translation, analysis, or any other task the user has asked for.

Treat Steps 1–4 as mandatory prerequisites, not optional helpers. Do not skip any step even if the user's final output format (e.g. a markdown file) appears to make it unnecessary.

Step 1 — Check Dependencies

```bash
yt-dlp --version
```

If the user requests Whisper transcription (keywords: "whisper转录" / "用whisper" / "transcribe" / "转录" / "语音转文字"), or the platform is Xiaohongshu, Douyin, or a local file, also check:

```bash
ffmpeg -version
python3 -c "from faster_whisper import WhisperModel; print('ok')"
```

If anything is missing, stop and tell the user which dependency to install (see Dependencies section).
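The same checks can also be done from Python without parsing command output. This is a lighter-weight sketch, assuming that finding the yt-dlp and ffmpeg executables on PATH and locating the faster_whisper package are acceptable proxies for the commands above; unlike yt-dlp --version, shutil.which only confirms the executable exists, not that it runs. missing_dependencies is a hypothetical name.

```python
import importlib.util
import shutil


def missing_dependencies(needs_whisper: bool) -> list:
    """Return the names of required dependencies that are not available."""
    missing = []
    # yt-dlp is always required; look for the executable on PATH.
    if shutil.which("yt-dlp") is None:
        missing.append("yt-dlp")
    if needs_whisper:
        # ffmpeg and faster-whisper are only needed for transcription.
        if shutil.which("ffmpeg") is None:
            missing.append("ffmpeg")
        # find_spec returns None when the package cannot be imported.
        if importlib.util.find_spec("faster_whisper") is None:
            missing.append("faster-whisper")
    return missing
```

A non-empty list maps directly to the install commands in the Dependencies section.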

Step 1b — Bilibili Cookie

The script auto-discovers any .txt file containing "bilibili" in the skill directory. Do not search for or pass the cookie file yourself.

Only act if the script exits with:

- 未找到 Bilibili Cookie 文件 ("Bilibili cookie file not found") → tell the user to place a cookie file in the skill directory
- Bilibili 412 错误：Cookie 已过期 ("Bilibili 412 error: cookie expired") → tell the user to re-export

To export: install the Cookie Editor extension (https://cookieeditor.org/), log in to Bilibili, export in Netscape format → place the file in the skill directory → retry.

Step 2 — Extract Subtitles

Determine which path applies, then execute it completely before moving to Step 3.

Path A — Native subtitles

Use when: Bilibili or YouTube URL, and the user has not mentioned any transcription keyword.

Tell the user: "正在提取字幕..."

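```bash
python subtitle-extractor.py <url>                # auto-detect language
python subtitle-extractor.py <url> --lang zh-CN   # force a specific language
```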

Parse the JSON from stdout. You now have all four fields needed for Step 3:

Field	Value
title	from this JSON
author

If the script exits non-zero: read stderr, report the error to the user, stop.
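A minimal sketch of the Path A call and its error rule, assuming the agent drives the script from Python: run it, parse stdout as JSON on success, and surface stderr and stop on a non-zero exit. run_extractor and its script parameter are hypothetical; the JSON field names depend on the script's actual output.

```python
import json
import subprocess
import sys


def run_extractor(args: list, script: str = "subtitle-extractor.py") -> dict:
    """Run the extractor with args and return its stdout parsed as JSON.

    Raises RuntimeError carrying stderr on a non-zero exit, so the caller
    can report the error and stop (no retries with different flags).
    """
    proc = subprocess.run(
        [sys.executable, script, *args],
        capture_output=True,
        text=True,
        encoding="utf-8",
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return json.loads(proc.stdout)
```

For example, run_extractor([url, "--lang", "zh-CN"]) corresponds to the second command above.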

Path B — Whisper transcription

Use when: user mentions any transcription keyword, OR platform is Xiaohongshu or Douyin.

Transcription keyword takes priority over phrasing like "提取字幕" or "字幕原文" — those describe the desired output, not the method.
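The routing rule for choosing between Path A and Path B can be expressed as a small function. The keyword list is copied from Step 1; the platform labels (bilibili, youtube, xiaohongshu, douyin, local) and the name choose_path are assumptions for illustration. Matching is a plain substring test, so a transcription keyword wins even when the request also says 提取字幕 or 字幕原文.

```python
# Transcription keywords from Step 1.
TRANSCRIBE_KEYWORDS = ("whisper转录", "用whisper", "transcribe", "转录", "语音转文字")


def choose_path(platform: str, user_request: str) -> str:
    """Return "A" (native subtitles) or "B" (Whisper transcription)."""
    text = user_request.lower()
    # Any transcription keyword forces Path B, regardless of other phrasing.
    if any(k in text for k in TRANSCRIBE_KEYWORDS):
        return "B"
    # These sources have no native subtitles to extract.
    if platform in ("xiaohongshu", "douyin", "local"):
        return "B"
    # Bilibili / YouTube with no transcription keyword.
    return "A"
```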

Call 1 — Download audio (skip for local files, go to Call 2 directly)

Tell the user: "正在下载音频，请稍候..."

```bash
python subtitle-extractor.py <url> --step download-audio
```

Parse the JSON from stdout and record these values:

Field	Value
title	from this JSON
author

If the script exits non-zero: read stderr, report the error to the user, stop.

Tell the user: "音频下载完成，开始 Whisper 转录（模型: base），请稍候..."

Call 2 — Transcribe

For URL input, use the audio_file recorded from Call 1:
```bash
python subtitle-extractor.py <audio_file> --step transcribe
```

For local file input (set title = filename, author = "local"):
```bash
python subtitle-extractor.py <filepath> --step transcribe
```

Parse the JSON from stdout and record:

Field	Value
subtitle_file	from this JSON

Tell the user: "转录完成！"

If the script exits non-zero:

- Read stderr, report the error to the user, and stop.
- If stderr contains Whisper 模型下载失败 ("Whisper model download failed"): show the full error message verbatim; it contains the exact download directory and manual steps.

Failure rule: Do not run yt-dlp, ffmpeg, or Whisper commands manually. Do not retry with different flags unless the error message explicitly says to.
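The two Path B calls and the failure rule can be sketched as one function. To keep it testable, the sketch takes a hypothetical run(args) hook returning (returncode, stdout, stderr) instead of invoking subtitle-extractor.py directly; the audio_file and subtitle_file keys follow the field names above, and using the file name (with extension) as the local title is an assumption.

```python
import json
from pathlib import Path
from typing import Callable, Tuple

# Hypothetical hook: run(args) -> (returncode, stdout, stderr).
Runner = Callable[[list], Tuple[int, str, str]]

MODEL_DOWNLOAD_FAILED = "Whisper 模型下载失败"


class ExtractionError(Exception):
    """Non-zero exit from the script: report to the user and stop (no retries)."""

    def __init__(self, stderr: str):
        super().__init__(stderr)
        self.stderr = stderr
        # Model-download failures must be shown verbatim (exact dir + steps).
        self.show_verbatim = MODEL_DOWNLOAD_FAILED in stderr


def _call(run: Runner, args: list) -> dict:
    code, out, err = run(args)
    if code != 0:
        raise ExtractionError(err)
    return json.loads(out)


def path_b(source: str, is_local: bool, run: Runner) -> dict:
    """Return the title, author and subtitle_file needed for Step 3."""
    if is_local:
        # Local files skip Call 1: title = file name, author = "local".
        meta = {"title": Path(source).name, "author": "local"}
        audio = source
    else:
        # Call 1: download audio; the JSON supplies title, author, audio_file.
        meta = _call(run, [source, "--step", "download-audio"])
        audio = meta["audio_file"]
    # Call 2: transcribe; the JSON supplies subtitle_file.
    result = _call(run, [audio, "--step", "transcribe"])
    return {
        "title": meta["title"],
        "author": meta["author"],
        "subtitle_file": result["subtitle_file"],
    }
```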

Step 3 — Confirm Data

Verify you have collected all four values from the script outputs in Step 2:

Field	Path A source	Path B source
title	script JSON	Call 1 JSON (or filename for local)
author

Note: non-ASCII characters in JSON output appear as \uXXXX escapes — standard JSON parsing produces the correct decoded strings.
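For example, Python's json module decodes those escapes automatically; the title and author values here are invented for illustration.

```python
import json

# \uXXXX escapes as they might appear in the script's stdout (made-up values).
raw = '{"title": "\\u5b57\\u5e55\\u63d0\\u53d6", "author": "UP\\u4e3b"}'
data = json.loads(raw)
print(data["title"], data["author"])  # prints: 字幕提取 UP主
```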

Step 4 — Save Subtitle to Outputs (REQUIRED — DO NOT SKIP)

Before answering the user, save the subtitle file to the session outputs directory.

Naming rule: {first 8 characters of title}_{author}.{original subtitle extension}

Steps:

1. Take title and keep the first 8 characters (Chinese and English characters each count as 1)
2. Replace unsafe filesystem characters / \ : * ? " < > | and spaces with INLINECODE32
3. Apply the same sanitization to author
4. Use the extension from the subtitle_file path (.srt or .vtt)
5. Save to INLINECODE37
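The naming steps above can be sketched as a single function. output_name is a hypothetical helper, and the replacement character is passed in as repl with "_" as an assumed default. Python slices strings by code point, so title[:8] counts a Chinese character and an English letter as 1 each, matching step 1.

```python
import re
from pathlib import Path

# Unsafe filesystem characters from the naming rule, plus spaces.
UNSAFE = re.compile(r'[/\\:*?"<>| ]')


def output_name(title: str, author: str, subtitle_file: str, repl: str = "_") -> str:
    """Build {first 8 chars of title}_{author}{ext} with unsafe chars replaced.

    Assumption: repl defaults to "_"; substitute the skill's actual character.
    """
    def clean(s: str) -> str:
        return UNSAFE.sub(repl, s)

    ext = Path(subtitle_file).suffix  # ".srt" or ".vtt", kept as-is
    # Truncate first (step 1), then sanitize (steps 2 and 3).
    return f"{clean(title[:8])}_{clean(author)}{ext}"
```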

Step 5 — Process and Respond

Read the subtitle file content and respond to the user's original request — summarize, analyze, translate, answer questions, etc. The subtitle content is in SRT or VTT format with timestamps; LLMs handle both directly.

Platform Notes

Platform	Method	Notes
YouTube	yt-dlp native CC + auto-generated	Best support, usually no cookies needed
Bilibili

Supported URL Formats

YouTube: youtube.com/watch?v=... · INLINECODE39

Bilibili: bilibili.com/video/BV... · bilibili.com/video/av... · b23.tv/... (short link)

Xiaohongshu: xiaohongshu.com/explore/... · xhslink.com/... (short link)

Douyin: douyin.com/video/... · v.douyin.com/... (short link)
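Platform detection from these formats can be sketched with urllib.parse. The host table comes from the list above, except youtu.be, which is an assumption (YouTube's well-known short-link domain); detect_platform and the "local" / "unknown" labels are hypothetical. Bare inputs such as b23.tv/... are treated as URLs unless a file with that path exists.

```python
import os
from urllib.parse import urlparse

# Host suffixes from the supported URL formats; youtu.be is an assumption.
PLATFORM_HOSTS = {
    "youtube": ("youtube.com", "youtu.be"),
    "bilibili": ("bilibili.com", "b23.tv"),
    "xiaohongshu": ("xiaohongshu.com", "xhslink.com"),
    "douyin": ("douyin.com",),  # also covers v.douyin.com short links
}


def detect_platform(source: str) -> str:
    """Return a platform label, "local" for an existing file, else "unknown"."""
    if "://" not in source and os.path.exists(source):
        return "local"
    # Allow scheme-less input such as "b23.tv/xxxx".
    url = source if "://" in source else "https://" + source
    host = (urlparse(url).hostname or "").lower()
    for platform, domains in PLATFORM_HOSTS.items():
        # Match the domain itself or any subdomain (www., m., v., ...).
        if any(host == d or host.endswith("." + d) for d in domains):
            return platform
    return "unknown"
```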

Script Reference

CODEBLOCK10

Troubleshooting

"yt-dlp: command not found"

Install yt-dlp (recommended: pip install yt-dlp); see the Dependencies section for other install commands.

"No subtitles found"

- The video may not have CC subtitles. Use Path B (--step download-audio, then --step transcribe) to force Whisper transcription.
- For Xiaohongshu/Douyin, transcription is always required (no native subtitles).
- Try --lang to specify a different language code.

Bilibili 412 Precondition Failed

Cookie expired. Re-export:

1. Log in to Bilibili in your browser
2. Use the Cookie Editor extension (https://cookieeditor.org/)
3. Export in Netscape format → place the file in the skill directory → retry

Bilibili: no zh-CN subtitle found

The script automatically falls back to ai-zh. If both fail, it lists all available subtitle codes. Use --lang <code> to specify one.
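The fallback order described above, as a sketch: the requested code first, then ai-zh, otherwise report the available codes so the user can pick one with --lang. pick_bilibili_lang is hypothetical, and the script's real selection logic may differ in details.

```python
def pick_bilibili_lang(available: list, requested: str = "zh-CN") -> str:
    """Choose a subtitle code: requested first, then ai-zh, else list choices."""
    for code in (requested, "ai-zh"):
        if code in available:
            return code
    # Neither code exists: surface every available code for --lang <code>.
    raise LookupError(
        f"no {requested} or ai-zh subtitle; available codes: "
        + ", ".join(available)
        + " (re-run with --lang <code>)"
    )
```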

"Whisper not installed"

Install faster-whisper (pip install faster-whisper); see the Dependencies section for details.

Whisper model download failed

The script tries hf-mirror.com then huggingface.co. If both fail (common in China), the script will print exact steps. Show the error message to the user verbatim — it contains the exact directory path and download URL.

Manual download (the ModelScope page is accessible from mainland China):

1. Open: INLINECODE54
2. Download these 5 files: config.json, model.bin, tokenizer.json, vocabulary.json, INLINECODE59
3. Create the directory shown in the error message and place all 5 files there
4. Re-run the script; it auto-detects the local model, so no download is needed

For other model sizes (tiny/small/medium/large), change faster-whisper-base to faster-whisper-{size} in the ModelScope URL.

"ffmpeg not found" (during transcription)

See ffmpeg install commands in the Dependencies section above.

Video too long for Whisper

Use a smaller model:

export VIDEO_SUMMARY_WHISPER_MODEL=tiny
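Because the model size is read from an environment variable, it must be exported in the same shell (or process environment) that launches the script. A sketch of how such a setting might be read and validated, using base as the default and the sizes tiny, base, small, medium, and large mentioned in this document; whisper_model_size is a hypothetical helper, not the script's code.

```python
import os

# Sizes mentioned in this document; "base" is the default.
VALID_MODELS = ("tiny", "base", "small", "medium", "large")


def whisper_model_size() -> str:
    """Read VIDEO_SUMMARY_WHISPER_MODEL, falling back to "base"."""
    size = os.environ.get("VIDEO_SUMMARY_WHISPER_MODEL", "base").strip().lower()
    if size not in VALID_MODELS:
        raise ValueError(
            f"VIDEO_SUMMARY_WHISPER_MODEL must be one of {', '.join(VALID_MODELS)}; got {size!r}"
        )
    return size
```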

Extract subtitles. Let the agent think.
