Natural Language Video Search

Search video files using natural language queries powered by Gemini Embedding 2's native video-to-vector embedding.

What This Skill Does

This skill lets you index video files (dashcam footage, security camera recordings, any mp4) into a local vector database, then search them by describing what you're looking for in plain English. The top match is automatically trimmed and saved as a clip.

For Tesla dashcam footage, an optional telemetry overlay can burn speed, GPS, location, and turn signal data onto trimmed clips.

Setup

Requires uv and Python 3.11+.

1. Clone and install:

    git clone https://github.com/ssrajadh/sentrysearch.git
    cd sentrysearch
    uv sync

For Tesla overlay support (reverse geocoding):

    uv sync --extra tesla

2. Set your Gemini API key:

    sentrysearch init

This prompts for your key, writes it to .env, and validates it with a test embedding. You can also set GEMINI_API_KEY directly as an environment variable.
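A pre-flight check along the following lines can confirm that a key is available before indexing or searching. This is only a sketch and not sentrysearch's own logic: the helper name `gemini_key_available` is hypothetical, and the parsing assumes the `.env` file holds plain `KEY=value` lines.

```python
import os
from pathlib import Path


def gemini_key_available(env=None, dotenv_path=".env"):
    """Return True if GEMINI_API_KEY is set in the environment or in a .env file.

    Hypothetical helper; assumes .env contains simple KEY=value lines.
    """
    env = os.environ if env is None else env
    if env.get("GEMINI_API_KEY", "").strip():
        return True

    path = Path(dotenv_path)
    if not path.is_file():
        return False

    for line in path.read_text().splitlines():
        line = line.strip()
        # Skip comments and anything that is not an assignment.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "GEMINI_API_KEY" and value.strip().strip("'\""):
            return True
    return False
```

If the check returns False, running sentrysearch init is the documented way to create the key entry.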

Commands

Index video files

    sentrysearch index <directory-or-file>

Options: --chunk-duration (default 30s), --overlap (default 5s), --no-preprocess, --target-resolution, --target-fps, --skip-still / --no-skip-still, --verbose
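To see how --chunk-duration and --overlap interact, the sketch below computes chunk windows for a video of a given length. It is an illustration under assumptions, not the tool's actual code: the function name `chunk_windows` is hypothetical, and how sentrysearch treats the final partial chunk is not stated in this document.

```python
def chunk_windows(duration, chunk_duration=30.0, overlap=5.0):
    """Return (start, end) windows in seconds covering a video of `duration` seconds.

    Each window starts `chunk_duration - overlap` seconds after the previous one.
    Assumption: the last window is simply cut short at the end of the video.
    """
    duration = float(duration)
    if chunk_duration <= 0 or not 0 <= overlap < chunk_duration:
        raise ValueError("need chunk_duration > 0 and 0 <= overlap < chunk_duration")

    step = chunk_duration - overlap
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_duration, duration)
        windows.append((start, end))
        if end >= duration:
            break
        start += step
    return windows


print(chunk_windows(70))
# [(0.0, 30.0), (25.0, 55.0), (50.0, 70.0)]
```

With the defaults, a new chunk begins every 25 seconds, so each hour of footage produces roughly 144 chunks before any still-frame skipping.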

Search indexed footage

    sentrysearch search "<natural language query>"

Options: -n / --results (default 5), -o / --output-dir, --trim / --no-trim, --threshold (default 0.41), --overlay / --no-overlay (Tesla telemetry), --verbose
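The relationship between --trim and --threshold can be pictured as a small decision function. This is a hedged sketch of the behaviour described in this document (scores below the threshold trigger a low-confidence prompt before trimming); the function name `trim_decision` and its return labels are hypothetical.

```python
def trim_decision(scores, threshold=0.41, trim=True):
    """Decide what to do with the best search result.

    Returns "skip" when trimming is disabled or nothing matched,
    "confirm" when the best score falls below the threshold,
    and "trim" otherwise.
    """
    if not trim or not scores:
        return "skip"
    best = max(scores)
    return "confirm" if best < threshold else "trim"
```

For example, `trim_decision([0.55, 0.30])` yields "trim", while `trim_decision([0.40])` yields "confirm" because the best score is under the default threshold.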

Apply Tesla telemetry overlay

    sentrysearch overlay <video-file>
    sentrysearch overlay <video-file> -o output.mp4

Burns a HUD overlay onto a Tesla dashcam video showing speed, GPS coordinates, location name, and turn signal status. Telemetry is read from the SEI NAL units that Tesla firmware 2025.44.25 and later embeds in the video. The same overlay is available through the --overlay flag on the search command, which applies it to the trimmed clip automatically.

Check index stats

    sentrysearch stats

How It Works

Video files are split into overlapping chunks. Still-frame detection can skip chunks with no meaningful visual change, which eliminates unnecessary API calls; this is the primary cost saver for idle footage such as sentry mode or security camera recordings. Chunks are also preprocessed (reduced frame rate and resolution) to shrink upload size and speed up transfers. Preprocessing does not reduce per-chunk token cost, because the Gemini API bills video by duration at a fixed token rate, not by file size. Each chunk is embedded as raw video using Gemini Embedding 2, with no transcription or captioning step. Vectors are stored in a local ChromaDB database. Text queries are embedded into the same vector space and matched by cosine similarity. The top match is then trimmed from the original file with ffmpeg.
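The matching step can be illustrated with plain Python. The sketch below ranks chunk vectors against a query vector by cosine similarity. It is a toy model under assumptions: in the real tool ChromaDB performs this lookup, the 3-dimensional vectors stand in for real embeddings, and the chunk ids and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query_vec, chunk_vecs, n=5):
    """Return up to n (score, chunk_id) pairs, best match first."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in chunk_vecs.items()]
    scored.sort(reverse=True)
    return scored[:n]


# Toy stand-ins for chunk embeddings; ids are made up for illustration.
chunks = {
    "front_0000": [1.0, 0.0, 0.0],
    "front_0025": [0.0, 1.0, 0.0],
    "front_0050": [1.0, 1.0, 0.0],
}
print([chunk_id for _, chunk_id in top_matches([1.0, 0.0, 0.0], chunks, n=2)])
# ['front_0000', 'front_0050']
```

Because the text query and the video chunks share one vector space, no captions or transcripts are needed; the ranking depends only on the angle between vectors.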

When To Use This Skill

- User asks to search through video files or footage
- User wants to find a specific moment in a video by describing it
- User asks to index or organize video footage for search
- User mentions dashcam, security camera, or surveillance clips
- User wants to find and extract a clip from a longer video
- User has Tesla dashcam footage and wants speed/GPS/location overlay on clips
- User wants to apply telemetry overlay to a Tesla video

Example Interactions

User: "Search my dashcam footage for a white truck cutting me off"
Action: Run `sentrysearch search "white truck cutting me off"`

User: "Index all the video files in my Downloads folder"
Action: Run `sentrysearch index ~/Downloads`

User: "Search for a red light and include the Tesla overlay on the clip"
Action: Run `sentrysearch search "running a red light" --overlay`

User: "Add the speed and GPS overlay to this Tesla video"
Action: Run `sentrysearch overlay /path/to/tesla_video.mp4`

User: "How much footage do I have indexed?"
Action: Run `sentrysearch stats`
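When an agent assembles these commands programmatically, passing the query as a single argument avoids shell-quoting mistakes with multi-word descriptions. The sketch below is one possible way to do that; the helper `build_search_command` is hypothetical, and it uses only the -n and --overlay flags listed in this document.

```python
import shlex


def build_search_command(query, overlay=False, results=None):
    """Build an argv list for `sentrysearch search`, keeping the query as one argument."""
    argv = ["sentrysearch", "search", query]
    if results is not None:
        argv += ["-n", str(results)]
    if overlay:
        argv.append("--overlay")
    return argv


# shlex.join renders the argv list as a safely quoted shell command line.
print(shlex.join(build_search_command("white truck cutting me off")))
# sentrysearch search 'white truck cutting me off'
```

The argv list can be handed directly to `subprocess.run` without going through a shell, which sidesteps quoting altogether.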

Rules

- Always run sentrysearch init or confirm GEMINI_API_KEY is set before indexing or searching.
- If ffmpeg is not found on PATH, the bundled imageio-ffmpeg fallback is used automatically.
- Indexing costs ~$2.84/hour of active footage with default settings. Cost is driven by the number of chunks sent to the API, so footage with long idle periods (sentry mode, security cameras) is significantly cheaper: still-frame skipping eliminates those chunks entirely. Warn the user before indexing large directories.
- Search results include similarity scores. Scores below the threshold (default 0.41) trigger a low-confidence prompt before trimming.
- The Tesla overlay requires firmware 2025.44.25+ for SEI metadata. Videos without Tesla metadata will skip the overlay gracefully.
- Requires Python 3.11+.
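Before indexing a large directory, a rough cost figure helps when warning the user. The sketch below applies the approximate $2.84 per active hour quoted above. The function name and the `idle_fraction` parameter are hypothetical: the idle share is a guess supplied by the caller, and the estimate assumes skipped chunks cost nothing.

```python
COST_PER_ACTIVE_HOUR_USD = 2.84  # approximate figure for default settings


def estimate_index_cost(total_hours, idle_fraction=0.0):
    """Estimate indexing cost in USD for `total_hours` of footage.

    `idle_fraction` is the caller's guess at the share of footage that
    still-frame skipping will drop (0.0 means everything is active).
    """
    if total_hours < 0 or not 0.0 <= idle_fraction <= 1.0:
        raise ValueError("total_hours must be >= 0 and idle_fraction within [0, 1]")
    active_hours = total_hours * (1.0 - idle_fraction)
    return round(active_hours * COST_PER_ACTIVE_HOUR_USD, 2)
```

Under these assumptions, 10 hours of sentry-mode footage that is 80% idle comes to about $5.68, compared with $28.40 if every chunk were sent to the API.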
