Natural Language Video Search
Search video files using natural language queries powered by Gemini Embedding 2's native video-to-vector embedding.
What This Skill Does
This skill lets you index video files (dashcam footage, security camera recordings, any mp4) into a local vector database, then search them by describing what you're looking for in plain English. The top match is automatically trimmed and saved as a clip.
For Tesla dashcam footage, an optional telemetry overlay can burn speed, GPS, location, and turn signal data onto trimmed clips.
Setup
Requires uv and Python 3.11+.
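1. Clone and install:
```bash
git clone https://github.com/ssrajadh/sentrysearch.git
cd sentrysearch
uv sync
```
For Tesla overlay support (reverse geocoding):
```bash
uv sync --extra tesla
```
2. Set your Gemini API key:
```bash
sentrysearch init
```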
This prompts for your key, writes it to .env, and validates it with a test embedding. You can also set GEMINI_API_KEY directly as an environment variable.
Commands
Index video files
```bash
sentrysearch index <directory-or-file>
```
Options: --chunk-duration (default 30s), --overlap (default 5s), --no-preprocess, --target-resolution, --target-fps, --skip-still / --no-skip-still, --verbose
Search indexed footage
```bash
sentrysearch search "<natural language query>"
```
Options: -n / --results (default 5), -o / --output-dir, --trim / --no-trim, --threshold (default 0.41), --overlay / --no-overlay (Tesla telemetry), --verbose
Apply Tesla telemetry overlay
```bash
sentrysearch overlay <video-file>
sentrysearch overlay <video-file> -o output.mp4
```
Burns a HUD overlay onto a Tesla dashcam video showing speed, GPS coordinates, location name, and turn signal status. Telemetry is read from SEI NAL units embedded in footage recorded with Tesla firmware 2025.44.25 or later. The same overlay is available through the --overlay flag on the search command, which applies it to the trimmed clip automatically.
Check index stats
```bash
sentrysearch stats
```
How It Works
Video files are split into overlapping chunks. Still-frame detection can skip chunks with no meaningful visual change, eliminating unnecessary API calls — this is the primary cost saver for idle footage like sentry mode or security cameras. Chunks are also preprocessed (reduced frame rate and resolution) to shrink upload size and speed up transfers, though the Gemini API bills based on video duration at a fixed token rate, not file size, so preprocessing does not reduce per-chunk token cost. Each chunk is embedded as raw video using Gemini Embedding 2 (no transcription or captioning). Vectors are stored in a local ChromaDB database. Text queries are embedded into the same vector space and matched via cosine similarity. The top match is auto-trimmed from the original file via ffmpeg.
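The overlapping split described above can be sketched as follows. This is an illustration only: it assumes each new chunk starts chunk duration minus overlap seconds after the previous one and that the last chunk is cut at the end of the file. The function name `chunk_windows` is hypothetical, and the tool's actual boundary handling may differ.

```python
def chunk_windows(duration: float, chunk: float = 30.0,
                  overlap: float = 5.0) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds for overlapping chunks of a video."""
    if chunk <= 0 or not 0 <= overlap < chunk:
        raise ValueError("need chunk > 0 and 0 <= overlap < chunk")
    step = chunk - overlap  # assumed stride between chunk starts
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + chunk, duration)
        windows.append((start, end))
        if end >= duration:
            break  # the final chunk reached the end of the file
        start += step
    return windows


# A 70-second file with the defaults (--chunk-duration 30, --overlap 5).
for start, end in chunk_windows(70):
    print(f"{start:5.1f}s -> {end:5.1f}s")
```

Under these assumptions, one hour of footage yields 144 chunks with the default settings, which is the quantity that still-frame skipping reduces.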
When To Use This Skill
- User asks to search through video files or footage
- User wants to find a specific moment in a video by describing it
- User asks to index or organize video footage for search
- User mentions dashcam, security camera, or surveillance clips
- User wants to find and extract a clip from a longer video
- User has Tesla dashcam footage and wants speed/GPS/location overlay on clips
- User wants to apply telemetry overlay to a Tesla video
Example Interactions
User: "Search my dashcam footage for a white truck cutting me off"
Action: Run sentrysearch search "white truck cutting me off"
User: "Index all the video files in my Downloads folder"
Action: Run sentrysearch index ~/Downloads
User: "Search for a red light and include the Tesla overlay on the clip"
Action: Run sentrysearch search "running a red light" --overlay
User: "Add the speed and GPS overlay to this Tesla video"
Action: Run sentrysearch overlay /path/to/tesla_video.mp4
User: "How much footage do I have indexed?"
Action: Run sentrysearch stats
Rules
- Always run sentrysearch init or confirm GEMINI_API_KEY is set before indexing or searching.
- If ffmpeg is not found on PATH, the bundled imageio-ffmpeg fallback is used automatically.
- Indexing costs ~$2.84/hour of active footage with default settings. Cost is driven by the number of chunks sent to the API, so footage with long idle periods (sentry mode, security cameras) is significantly cheaper, because still-frame skipping eliminates those chunks entirely. Warn the user before indexing large directories.
- Search results include similarity scores. Scores below the threshold (default 0.41) trigger a low-confidence prompt before trimming.
- The Tesla overlay requires firmware 2025.44.25+ for SEI metadata. Videos without Tesla metadata will skip the overlay gracefully.
- Requires Python 3.11+.
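Before warning a user about a large directory, a rough cost figure can be derived from the ~$2.84/hour rate quoted in the rules. The sketch below is an approximation: the active fraction (the share of footage that survives still-frame skipping) is a guess the caller supplies, and the function name `estimate_index_cost` is hypothetical.

```python
RATE_PER_ACTIVE_HOUR = 2.84  # USD, approximate figure quoted for default settings


def estimate_index_cost(total_hours: float, active_fraction: float = 1.0,
                        rate: float = RATE_PER_ACTIVE_HOUR) -> float:
    """Estimate indexing cost in USD for footage that is only partly active."""
    if total_hours < 0 or not 0.0 <= active_fraction <= 1.0:
        raise ValueError("total_hours must be >= 0 and active_fraction within [0, 1]")
    return total_hours * active_fraction * rate


print(f"${estimate_index_cost(10):.2f}")                       # $28.40
print(f"${estimate_index_cost(10, active_fraction=0.2):.2f}")  # $5.68
```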
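Natural Language Video Search
Search video files using natural language queries powered by Gemini Embedding 2's native video-to-vector embedding.
What This Skill Does
This skill lets you index video files (dashcam footage, security camera recordings, any mp4) into a local vector database, then search them by describing what you are looking for in natural language. The top match is automatically trimmed and saved as a clip.
For Tesla dashcam footage, an optional telemetry overlay can burn speed, GPS, location, and turn signal data onto trimmed clips.
Setup
Requires uv and Python 3.11+.
1. Clone and install:
```bash
git clone https://github.com/ssrajadh/sentrysearch.git
cd sentrysearch
uv sync
```
For Tesla overlay support (reverse geocoding):
```bash
uv sync --extra tesla
```
2. Set your Gemini API key:
```bash
sentrysearch init
```
This prompts for your key, writes it to the .env file, and validates it with a test embedding. You can also set GEMINI_API_KEY directly as an environment variable.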
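Commands
Index video files
```bash
sentrysearch index <directory-or-file>
```
Options: --chunk-duration (default 30s), --overlap (default 5s), --no-preprocess, --target-resolution, --target-fps, --skip-still / --no-skip-still, --verbose
Search indexed footage
```bash
sentrysearch search "<natural language query>"
```
Options: -n / --results (default 5), -o / --output-dir, --trim / --no-trim, --threshold (default 0.41), --overlay / --no-overlay (Tesla telemetry), --verbose
Apply Tesla telemetry overlay
```bash
sentrysearch overlay <video-file>
sentrysearch overlay <video-file> -o output.mp4
```
Burns a HUD overlay onto a Tesla dashcam video showing speed, GPS coordinates, location name, and turn signal status. Telemetry is read from SEI NAL units embedded in footage recorded with Tesla firmware 2025.44.25 or later. The same overlay is available through the --overlay flag on the search command, which applies it to the trimmed clip automatically.
Check index stats
```bash
sentrysearch stats
```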
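How It Works
Video files are split into overlapping chunks. Still-frame detection can skip chunks with no meaningful visual change, which eliminates unnecessary API calls; this is the primary cost saver for idle footage such as sentry mode or security cameras. Chunks are also preprocessed (reduced frame rate and resolution) to shrink upload size and speed up transfers. The Gemini API bills by video duration at a fixed token rate rather than by file size, so preprocessing does not reduce per-chunk token cost. Each chunk is embedded as raw video using Gemini Embedding 2 (no transcription or captioning). Vectors are stored in a local ChromaDB database. Text queries are embedded into the same vector space and matched via cosine similarity. The top match is auto-trimmed from the original file via ffmpeg.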
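When To Use This Skill
- User asks to search through video files or footage
- User wants to find a specific moment in a video by describing it
- User asks to index or organize video footage for search
- User mentions dashcam, security camera, or surveillance clips
- User wants to find and extract a clip from a longer video
- User has Tesla dashcam footage and wants speed/GPS/location overlay on clips
- User wants to apply telemetry overlay to a Tesla video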
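Example Interactions
User: "Search my dashcam footage for a white truck cutting me off"
Action: Run sentrysearch search "white truck cutting me off"
User: "Index all the video files in my Downloads folder"
Action: Run sentrysearch index ~/Downloads
User: "Search for running a red light and include the Tesla overlay on the clip"
Action: Run sentrysearch search "running a red light" --overlay
User: "Add the speed and GPS overlay to this Tesla video"
Action: Run sentrysearch overlay /path/to/tesla_video.mp4
User: "How much footage do I have indexed?"
Action: Run sentrysearch stats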
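Rules
- Always run sentrysearch init or confirm GEMINI_API_KEY is set before indexing or searching.
- If ffmpeg is not found on PATH, the bundled imageio-ffmpeg fallback is used automatically.
- Indexing costs about $2.84 per hour of active footage with default settings. Cost is driven by the number of chunks sent to the API, so footage with long idle periods (sentry mode, security cameras) is significantly cheaper, because still-frame skipping eliminates those chunks entirely. Warn the user before indexing large directories.
- Search results include similarity scores. Scores below the threshold (default 0.41) trigger a low-confidence prompt before trimming.
- The Tesla overlay requires firmware 2025.44.25+ for SEI metadata. Videos without Tesla metadata will skip the overlay gracefully.
- Requires Python 3.11+.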