Video Understanding (Gemini)

Analyze videos with Google Gemini's multimodal video understanding. Supports 1000+ video sources through yt-dlp.

Requirements

- yt-dlp: brew install yt-dlp / pip install yt-dlp
- ffmpeg: brew install ffmpeg (used to merge video and audio streams)
- GEMINI_API_KEY environment variable
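The requirements above can be checked before running the script. The following is a minimal sketch, not part of the tool itself: the helper name `missing_prerequisites` is hypothetical, and it assumes the API key variable is spelled `GEMINI_API_KEY`.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready to run.

    `env` and `which` are injectable so the check can be exercised without
    touching the real environment (both parameters are illustrative).
    """
    env = os.environ if env is None else env
    problems = []
    # Both command-line tools must be discoverable on PATH.
    for tool in ("yt-dlp", "ffmpeg"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # Assumed variable name; an empty value is treated as unset.
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

Running the file directly prints one line per missing prerequisite and nothing when everything is in place.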

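Default Output

Returns structured JSON:

- transcript: verbatim transcript with [MM:SS] timestamps
- description: visual description (people, setting, UI, on-screen text, flow)
- summary: 2-3 sentence summary
- duration_seconds: estimated duration
- speakers: identified speakers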
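To consume the JSON downstream, a small parser may help. This is a sketch under assumptions: the field names come from the list above, but the value types are not documented here, and the helper names `load_result` and `timestamp_offsets` are hypothetical.

```python
import json
import re

# Matches transcript markers such as [01:05]; the minutes part may exceed two digits.
TIMESTAMP = re.compile(r"\[(\d{1,3}):(\d{2})\]")

DEFAULT_FIELDS = ("transcript", "description", "summary", "duration_seconds", "speakers")


def timestamp_offsets(transcript):
    """Return the offset in seconds of every [MM:SS] marker, in order of appearance."""
    return [int(m) * 60 + int(s) for m, s in TIMESTAMP.findall(transcript)]


def load_result(raw_json):
    """Parse the tool's JSON text and report which default fields are absent."""
    data = json.loads(raw_json)
    missing = [key for key in DEFAULT_FIELDS if key not in data]
    return data, missing


if __name__ == "__main__":
    # Made-up sample for illustration only, not real tool output.
    sample = json.dumps({"transcript": "[00:00] Hi.\n[01:05] Here is the demo."})
    data, missing = load_result(sample)
    print(timestamp_offsets(data["transcript"]), missing)
```

Reporting missing fields instead of raising keeps the caller in control when a custom prompt changes the shape of the response.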

Usage

Analyze a video (structured JSON output)

    uv run {baseDir}/scripts/analyze_video.py <video-url>

Ask a question (adds an "answer" field)

    uv run {baseDir}/scripts/analyze_video.py <video-url> -q "What product is being shown?"

Override the prompt entirely

    uv run {baseDir}/scripts/analyze_video.py <video-url> -p "Custom prompt" --raw

Download only (no analysis)

    uv run {baseDir}/scripts/analyze_video.py <video-url> --download-only -o video.mp4
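When calling the script from another program, building the argument list explicitly avoids shell quoting problems with questions that contain spaces. The sketch below uses only the flags shown in the commands above; the function name `build_command` and its parameters are hypothetical, and the script path must be supplied by the caller because `{baseDir}` is a placeholder.

```python
import shlex


def build_command(script, url, question=None, prompt=None, raw=False,
                  download_only=False, output=None):
    """Assemble an argv list for `uv run <script> <url> [flags]`."""
    argv = ["uv", "run", script, url]
    if question is not None:
        argv += ["-q", question]      # adds an "answer" field to the default output
    if prompt is not None:
        argv += ["-p", prompt]        # replaces the default prompt
    if raw:
        argv.append("--raw")
    if download_only:
        argv.append("--download-only")
    if output is not None:
        argv += ["-o", output]
    return argv


if __name__ == "__main__":
    argv = build_command("scripts/analyze_video.py", "https://example.com/v",
                         question="What product is shown?")
    # shlex.join renders a copy-pasteable shell line with correct quoting.
    print(shlex.join(argv))
```

The list can be passed as-is to `subprocess.run(argv, check=True)`, which never goes through a shell and therefore needs no manual quoting.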

Options

Flag	Description	Default
-q / --question	Question to answer (added to the default fields)	none
-p / --prompt

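How It Works

1. YouTube URLs → passed directly to Gemini (no download needed)
2. All other URLs → downloaded via yt-dlp → uploaded to the Gemini File API → polled until processing completes
3. Gemini analyzes the video with a structured prompt → returns JSON
4. Temp files and Gemini uploads are cleaned up automatically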
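The routing and polling steps above can be sketched as follows. This illustrates the control flow only and is not the script's actual code: the host list, the step labels, and the helper names `is_youtube_url`, `plan_steps`, and `poll_until` are all assumptions, and the real upload status check is left to the caller as a callable.

```python
import time
from urllib.parse import urlparse

# Assumed set of hostnames treated as YouTube; the real script may differ.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url):
    """True when the URL's hostname is one of the assumed YouTube hosts."""
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS


def plan_steps(url):
    """Return the ordered steps the workflow would take for this URL."""
    if is_youtube_url(url):
        return ["pass URL to Gemini", "analyze", "clean up"]
    return ["download with yt-dlp", "upload to File API",
            "poll until processed", "analyze", "clean up"]


def poll_until(check, interval=2.0, timeout=300.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns a truthy value, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError("processing did not finish in time")
        sleep(interval)


if __name__ == "__main__":
    print(plan_steps("https://youtu.be/abc"))
    print(plan_steps("https://www.loom.com/share/abc"))
```

Injecting `sleep` and `clock` keeps the polling loop testable without real delays.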

Supported Sources

Any URL supported by yt-dlp: Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, Dailymotion, Twitch, and 1000+ more.

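Tips

- Use -q to ask targeted questions on top of the full analysis
- YouTube is fastest (no download step)
- Large videos (10+ minutes) work fine: the Gemini File API supports up to 2GB (free) / 20GB (paid)
- The script auto-installs its Python dependencies via uv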
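A downloaded file can be checked against the size limits quoted in the tips before uploading. This is a rough sketch: the tier names, the helper names, and the choice to read "GB" as 1024^3 bytes are assumptions, and the limits themselves are taken from the tip above rather than verified here.

```python
import os

GIB = 1024 ** 3
# Limits as quoted in the tips; interpreting GB as GiB is an assumption.
LIMITS = {"free": 2 * GIB, "paid": 20 * GIB}


def fits_upload_limit(size_bytes, tier="free"):
    """True when a file of this size is within the quoted limit for the tier."""
    return size_bytes <= LIMITS[tier]


def file_fits(path, tier="free"):
    """Convenience wrapper that reads the size of a file on disk."""
    return fits_upload_limit(os.path.getsize(path), tier)


if __name__ == "__main__":
    print(fits_upload_limit(500 * 1024 ** 2))   # a 500 MiB download
```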
