Video Understanding (Gemini)
Analyze videos using Google Gemini's multimodal video understanding. Supports 1000+ video sources via yt-dlp.
Requirements
- yt-dlp: brew install yt-dlp / pip install yt-dlp
- ffmpeg: brew install ffmpeg (for merging video+audio streams)
- GEMINI_API_KEY environment variable
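A quick preflight check can confirm these prerequisites before the script is run. The sketch below is illustrative and not part of the skill; `missing_requirements` is a hypothetical helper name, and it only checks that the two tools are on PATH and that the variable is non-empty.

```python
import os
import shutil


def missing_requirements(env=None, which=shutil.which):
    """Return a list of prerequisites that appear to be missing.

    `env` and `which` are injectable so the check can be exercised
    without touching the real environment.
    """
    env = os.environ if env is None else env
    missing = []
    # Both command-line tools must be discoverable on PATH.
    for tool in ("yt-dlp", "ffmpeg"):
        if which(tool) is None:
            missing.append(tool)
    # The API key must be set to a non-empty value.
    if not env.get("GEMINI_API_KEY"):
        missing.append("GEMINI_API_KEY")
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    print("All set." if not problems else "Missing: " + ", ".join(problems))
```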
Default Output
Returns structured JSON:
- transcript: Verbatim transcript with [MM:SS] timestamps
- description: Visual description (people, setting, UI, on-screen text, flow)
- summary: 2-3 sentence summary
- duration_seconds: Estimated duration in seconds
- speakers: Identified speakers
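To show how this output might be consumed downstream, here is a minimal sketch that parses the JSON and checks for the default fields. The sample payload is invented for illustration; only the field names come from the list above, and the value types (a single transcript string, a list of speaker names) are assumptions.

```python
import json

# Invented sample payload; only the field names come from the documented list.
# The value types (string transcript, list of speaker names) are assumptions.
SAMPLE_OUTPUT = """
{
  "transcript": "[00:00] Welcome to the demo. [00:05] Let's open the dashboard.",
  "description": "A presenter shares a screen showing a web dashboard.",
  "summary": "A short product demo. The presenter walks through the dashboard.",
  "duration_seconds": 42,
  "speakers": ["Presenter"]
}
"""

EXPECTED_FIELDS = ("transcript", "description", "summary", "duration_seconds", "speakers")


def load_analysis(text):
    """Parse the JSON text and check that every default field is present."""
    data = json.loads(text)
    absent = [name for name in EXPECTED_FIELDS if name not in data]
    if absent:
        raise ValueError("missing fields: " + ", ".join(absent))
    return data


if __name__ == "__main__":
    result = load_analysis(SAMPLE_OUTPUT)
    print(result["summary"])
    print(f"{result['duration_seconds']} s, speakers: {', '.join(result['speakers'])}")
```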
Usage
Analyze a video (structured JSON output)
    uv run {baseDir}/scripts/analyze_video.py <video-url>
Ask a question (adds "answer" field)
    uv run {baseDir}/scripts/analyze_video.py <video-url> -q "What product is shown?"
Override prompt entirely
    uv run {baseDir}/scripts/analyze_video.py <video-url> -p "Your custom prompt" --raw
Download only (no analysis)
    uv run {baseDir}/scripts/analyze_video.py <video-url> --download-only -o video.mp4
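When the script is called from another Python program, building the argument list explicitly avoids shell-quoting problems with URLs and multi-word questions. The following is a sketch under stated assumptions: `build_command` and `run_analysis` are hypothetical helpers, `script_path` stands in for `{baseDir}/scripts/analyze_video.py`, and it is assumed that the script exits with a non-zero status on failure.

```python
import subprocess


def build_command(script_path, url, question=None, prompt=None, raw=False, output=None):
    """Assemble the argument list for `uv run analyze_video.py ...`.

    Passing a list (not a shell string) means the URL and question need
    no manual quoting.
    """
    cmd = ["uv", "run", script_path, url]
    if prompt is not None:
        # -p replaces the whole prompt, and -q is documented as ignored then.
        cmd += ["-p", prompt]
    elif question is not None:
        cmd += ["-q", question]
    if raw:
        cmd.append("--raw")
    if output is not None:
        cmd += ["-o", output]
    return cmd


def run_analysis(script_path, url, **options):
    """Run the script and return its stdout as text (raises on a non-zero exit)."""
    cmd = build_command(script_path, url, **options)
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout
```

For the default structured output, the returned text can then be passed to `json.loads`.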
Options
| Flag | Description | Default |
|---|---|---|
| -q / --question | Question to answer (added to default fields) | none |
| -p / --prompt | Override entire prompt (ignores -q) | structured JSON |
| -m / --model | Gemini model | gemini-2.5-flash |
| -o / --output | Save output to file | stdout |
| --keep | Keep downloaded video file | false |
| --download-only | Download only, skip analysis | false |
| --max-size | Max file size in MB | 500 |
| --raw | Raw text output instead of JSON | false |
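The flags in this table map naturally onto an `argparse` parser. The sketch below is a reconstruction from the table only, not the script's actual source; the positional argument name `url` and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Analyze a video with Gemini")
    parser.add_argument("url", help="Video URL (positional name is an assumption)")
    parser.add_argument("-q", "--question", default=None,
                        help="Question to answer in addition to the default fields")
    parser.add_argument("-p", "--prompt", default=None,
                        help="Replace the entire prompt (ignores -q)")
    parser.add_argument("-m", "--model", default="gemini-2.5-flash", help="Gemini model")
    parser.add_argument("-o", "--output", default=None,
                        help="Write output to this file instead of stdout")
    parser.add_argument("--keep", action="store_true", help="Keep the downloaded video file")
    parser.add_argument("--download-only", action="store_true",
                        help="Download only, skip analysis")
    parser.add_argument("--max-size", type=int, default=500, help="Max file size in MB")
    parser.add_argument("--raw", action="store_true", help="Raw text output instead of JSON")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["https://example.com/video", "-q", "What product is shown?"])
print(args.model, args.max_size, args.question)
```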
How It Works
1. YouTube URLs → passed directly to Gemini (no download needed)
2. All other URLs → downloaded via yt-dlp → uploaded to the Gemini File API → polled until processed
3. Gemini analyzes the video with a structured prompt → returns JSON
4. Temp files and Gemini uploads are cleaned up automatically
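The routing and polling steps above can be sketched in plain Python. This is a minimal illustration, not the script's implementation: the hostname set, `plan_steps`, and `wait_until_processed` are hypothetical, and `get_state` is a caller-supplied function assumed to return the uploaded file's state as a string (the File API reports `PROCESSING` until a file becomes `ACTIVE` or `FAILED`).

```python
import time
from urllib.parse import urlparse

# Hostnames treated as YouTube here; the script's real check may differ.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url):
    """Return True when the URL's host is a known YouTube hostname."""
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS


def plan_steps(url):
    """List the pipeline steps the description above implies for a URL."""
    if is_youtube_url(url):
        return ["pass URL to Gemini", "analyze"]
    return ["download with yt-dlp", "upload to File API",
            "poll until processed", "analyze", "clean up"]


def wait_until_processed(get_state, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll `get_state()` until it stops returning "PROCESSING".

    Returns the final state string, or raises TimeoutError if the file
    is still processing after roughly `timeout` seconds of waiting.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state != "PROCESSING":
            return state
        if waited >= timeout:
            raise TimeoutError("file was still processing after %.0f s" % timeout)
        sleep(interval)
        waited += interval
```

Injecting `sleep` keeps the polling loop easy to test without real delays.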
Supported Sources
Any URL supported by yt-dlp: Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, Dailymotion, Twitch, and 1000+ more.
Tips
- Use -q for targeted questions on top of the full analysis
- YouTube is fastest (no download step)
- Large videos (10min+) work fine: the Gemini File API supports up to 2GB (free) / 20GB (paid)
- The script auto-installs Python dependencies via uv
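One way `uv run` can install dependencies automatically is through a PEP 723 inline metadata block at the top of a script. Whether analyze_video.py uses exactly this mechanism is an assumption, and the dependency listed below is a guess for illustration only.

```python
# The dependency list below is an assumption; the real script may list others.
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "google-genai",
# ]
# ///
"""Minimal script header that `uv run` can resolve on its own."""
import os
import sys


def main(env=None):
    """Return 0 when the API key is available, 1 otherwise."""
    env = os.environ if env is None else env
    # uv reads the metadata block above, builds an isolated environment
    # with the listed packages, and only then runs this code.
    if not env.get("GEMINI_API_KEY"):
        print("GEMINI_API_KEY is not set", file=sys.stderr)
        return 1
    print("dependencies resolved; ready to analyze")
    return 0


if __name__ == "__main__":
    print("exit status:", main())
```

Saved as a file and started with `uv run`, a script like this needs no separate `pip install` step.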
Video Understanding (Gemini)
Analyze videos using Google Gemini's multimodal video understanding. Supports 1000+ video sources via yt-dlp.
Requirements
- yt-dlp: brew install yt-dlp / pip install yt-dlp
- ffmpeg: brew install ffmpeg (for merging video+audio streams)
- GEMINI_API_KEY environment variable
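Default Output
Returns structured JSON:
- transcript: Verbatim transcript with [MM:SS] timestamps
- description: Visual description (people, setting, UI, on-screen text, flow)
- summary: 2-3 sentence summary
- duration_seconds: Estimated duration in seconds
- speakers: Identified speakers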
Usage
Analyze a video (structured JSON output)

    uv run {baseDir}/scripts/analyze_video.py <video-url>

Ask a question (adds "answer" field)

    uv run {baseDir}/scripts/analyze_video.py <video-url> -q "What product is shown?"

Override prompt entirely

    uv run {baseDir}/scripts/analyze_video.py <video-url> -p "Your custom prompt" --raw

Download only (no analysis)

    uv run {baseDir}/scripts/analyze_video.py <video-url> --download-only -o video.mp4
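Options
| Flag | Description | Default |
|---|---|---|
| -q / --question | Question to answer (added to default fields) | none |
| -p / --prompt | Override entire prompt (ignores -q) | structured JSON |
| -m / --model | Gemini model | gemini-2.5-flash |
| -o / --output | Save output to file | stdout |
| --keep | Keep downloaded video file | false |
| --download-only | Download only, skip analysis | false |
| --max-size | Max file size in MB | 500 |
| --raw | Raw text output instead of JSON | false |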
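How It Works
1. YouTube URLs → passed directly to Gemini (no download needed)
2. All other URLs → downloaded via yt-dlp → uploaded to the Gemini File API → polled until processed
3. Gemini analyzes the video with a structured prompt → returns JSON
4. Temp files and Gemini uploads are cleaned up automatically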
Supported Sources
Any URL supported by yt-dlp: Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, Dailymotion, Twitch, and 1000+ more.
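Tips
- Use -q for targeted questions on top of the full analysis
- YouTube is fastest (no download step)
- Large videos (10min+) work fine: the Gemini File API supports up to 2GB (free) / 20GB (paid)
- The script auto-installs Python dependencies via uv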