ai-photos
ai-photos turns one or more local photo sources into a searchable AI photo album for OpenClaw.
Supported formats:
- macOS: `jpg`, `jpeg`, `png`, `webp`, `heic`
- Linux: `jpg`, `jpeg`, `png`, `webp`
- Linux `heic`: best-effort only; do not promise captioning or preview support
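The format list above can be expressed as a small check. This is a minimal sketch, not part of the ai-photos CLI: the function name `is_supported_image` and the platform strings `darwin` and `linux` are assumptions chosen for illustration.

```shell
# Hypothetical helper: succeeds when the file extension is fully supported
# on the given platform ("darwin" or "linux"), following the list above.
is_supported_image() {
  file=$1
  platform=$2
  # A name without any dot has no extension to inspect.
  case "$file" in
    *.*) ;;
    *) return 1 ;;
  esac
  # Lowercase the extension so "IMG.JPG" and "img.jpg" are treated alike.
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    jpg|jpeg|png|webp) return 0 ;;
    # heic is only promised on macOS; on Linux it is best-effort.
    heic) [ "$platform" = "darwin" ] ;;
    *) return 1 ;;
  esac
}
```

Treating Linux `heic` as unsupported here is deliberately conservative: a caller may still attempt it, but should not promise a result.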
When talking to users:
- try to match the user's language
- explain the outcome simply: choose local folders now, then use OpenClaw to search and organize them
- stay focused on the current ai-photos request
- keep user-facing replies short and product-level: progress, readiness, and what the user can do next
- keep implementation details internal unless the user asks or troubleshooting requires them
- once indexing is complete and the backend is confirmed ready, say the album is ready and invite the user to try a search
- when the user asks what ai-photos can do, or when handing off a ready album, briefly describe the product in user terms:
  - natural-language search across captions, scene labels, and tags
  - date-based browsing and filtering
  - a local web gallery for thumbnail browsing and large-photo viewing
  - photo detail view with caption, scene, tags, capture time, device, location, orientation, and file info when available
  - opening the original local file from the web UI
  - manual sync now, optional automatic indexing later
- when introducing the web UI, describe it as a local searchable gallery rather than an API or server unless implementation details are needed
- keep these capability descriptions short, concrete, and user-facing; do not drift into backend details
Suggested user-facing capability summary:
- "You can search your photos in plain language, filter by date, and browse everything in a local gallery."
- "The web UI shows thumbnails, opens large previews, and lets you inspect captions, tags, time, device, location, and other file details when available."
- "You can also open the original local file directly, and later either sync changes manually or turn on automatic indexing."
Required outcome
This task is not complete until all of the following are true:
1. at least one photo source is chosen and readable for a new album
2. image analysis is verified to work in the current OpenClaw runtime
3. the album backend is created or reconnected and writable
4. the first import succeeds, or an existing album is verified reachable
5. the user explicitly approved automatic indexing or explicitly declined it
6. if automatic indexing was approved, OpenClaw heartbeat is configured without breaking existing heartbeat tasks, the ai-photos block is present in `HEARTBEAT.md`, and one verification heartbeat has run
7. the user has been told the album is ready and has been invited to try a search
8. the user has been sent the final handoff
Internal terms
Use these terms for agent reasoning, troubleshooting, or recovery only.
Do not introduce them to the user unless needed.
- `photo sources`: one or more local paths scanned into the same album
- `album backend`: where the searchable photo index is stored
- `album profile`: saved reconnect information, stored automatically under `~/.openclaw/ai-photos/albums/default.json`
- `caption input JSONL`: the manifest file that still needs vision captions and import
If the user asks what to save for later, explain that OpenClaw saves the reconnect information automatically at ~/.openclaw/ai-photos/albums/default.json, and that they only need to keep that file if they want a manual backup.
Caption schema
Each captioned JSONL line should contain the original manifest fields plus vision-model output.
Required base fields:
- `file_path`
- `filename`
- `sha256`
- `mime_type`
- `size_bytes`
- `width`
- `height`
- `taken_at`
- `exif`
Vision fields:
- `caption`: one short factual sentence
- `tags`: array of 5-12 short tags
- `scene`: short scene label
- `objects`: array of the main visible objects
- `text_in_image`: visible text or `null`
Optional fields:
- `metadata`: free-form JSON object
- `search_text`: concatenated retrieval text; if omitted, the importer builds it
Example:
CODEBLOCK0
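Before importing, it can help to confirm that each captioned line carries the required base fields and vision fields. The following is a rough sketch under stated assumptions: `jq` is not assumed to be installed, so the check only looks for each key name as a quoted key in compact `"key":` form, and it does not validate types or nesting. The function name `missing_caption_fields` is hypothetical.

```shell
# Crude presence check for one captioned JSONL line (assumption: keys are
# written as "key": with no space before the colon).
# Prints the missing key names and fails if any are missing.
missing_caption_fields() {
  line=$1
  missing=""
  for key in file_path filename sha256 mime_type size_bytes width height \
             taken_at exif caption tags scene objects text_in_image; do
    case "$line" in
      *"\"$key\":"*) ;;
      *) missing="$missing $key" ;;
    esac
  done
  # Drop the leading space before printing.
  printf '%s\n' "${missing# }"
  [ -z "$missing" ]
}
```

A line that fails this check is better skipped and reported than imported with gaps; a real JSON parser remains the more reliable option when one is available.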
CLI runtime
This skill does not depend on a local Python environment or a checked-out Go source tree.
It uses the latest published ai-photos CLI release from:
- repository: `https://github.com/zoubingwu/openclaw-ai-photos`
- install dir: `~/.openclaw/ai-photos/bin`
- binary path: `~/.openclaw/ai-photos/bin/ai-photos`
At the start of every ai-photos task, run the bootstrap flow exactly once and reuse the resulting binary path for the rest of the task.
Bootstrap flow
Run this shell block and capture its stdout as AI_PHOTOS_BIN:
CODEBLOCK1
Rules:
- always run the bootstrap flow before using the CLI
- the bootstrap flow downloads the latest stable release asset from `releases/latest/download/...` and does not call `api.github.com`
- if the latest asset download or unpack step fails, continue with the cached binary when one already exists
- if the latest asset download fails and no cached binary exists, setup is blocked
- do not tell the user to clone the repository or build the CLI locally unless troubleshooting requires it
- if you need command details, use `"$AI_PHOTOS_BIN" help` or INLINECODE42
Onboarding
Step 0 - Choose mode
User-facing:
- Ask whether the user wants to create a new photo album, reconnect an existing one, or search an already configured album.
- If they want to reconnect, explain that you will try the saved connection first and only ask for more details if needed.
INLINECODE43
Branching:
- `1`: continue to Step 1
- INLINECODE45: continue to Step 3 and Step 4
- INLINECODE46: go directly to Search flow
- if the user wants search but no configured album exists, tell them setup is required first
Step 1 - Ask for photo folders
User-facing:
- Ask for one or more local folder paths that contain photos.
INLINECODE47
Do not continue until the user has provided at least one photo source.
Step 2 - Run preflight
User-facing:
- Tell the user you will quickly verify that the folders are readable and that image analysis works before importing anything.
INLINECODE48
Before indexing anything, verify:
- each photo source exists and is readable
- the selected sources contain supported image files
- INLINECODE49 is vision-capable
- image analysis actually works on a real image in the current OpenClaw runtime
- the installed CLI runs successfully
- local image preparation works on a real sample image through INLINECODE50
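The first two checks in the list above (the source is readable and contains supported images) can be sketched in plain shell. This is an illustration only, not the CLI's own preflight: `source_has_photos` is a hypothetical name, and `find -iname` is assumed to be available (it is in both GNU and BSD `find`, though it is not strictly POSIX).

```shell
# Sketch: succeeds if DIR is a readable directory holding at least one file
# with a supported extension. heic is counted here; platform limits still apply.
source_has_photos() {
  dir=$1
  [ -d "$dir" ] && [ -r "$dir" ] || return 1
  # -iname matches case-insensitively; head stops after the first hit.
  found=$(find "$dir" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \
    -o -iname '*.png' -o -iname '*.webp' -o -iname '*.heic' \) \
    -print 2>/dev/null | head -n 1)
  [ -n "$found" ]
}
```

Running it once per photo source gives a quick yes/no before any sample image is handed to the CLI or the vision model.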
Suggested preflight sequence:
1. choose one real sample image from the provided sources
2. run INLINECODE51
3. on macOS, also run INLINECODE52
4. inspect the JSON result
If the image backend check fails:
- on macOS, treat this as blocking because `heic` and local preview preparation depend on INLINECODE54
- on Linux, do not block setup for `jpg`, `jpeg`, `png`, or `webp`; OpenClaw can still caption those files directly from the original path
- on Linux, explain that preview preparation and large-image downscaling are reduced without a local backend
- only suggest installing ImageMagick when the user wants better local image preparation or troubleshooting requires it
If preflight fails:
- tell the user setup is blocked in plain language
- explain exactly what must be fixed without exposing unnecessary implementation details
- stop
Step 3 - Choose the backend
INLINECODE59
- if reconnecting, keep the existing backend
- otherwise use `db9` if it is installed and usable
- if `db9` is not available, use INLINECODE62
- if using `TiDB Cloud Zero`, tell the user to claim it if they want to keep it, but do not lead with backend details unless they matter
Step 4 - Create or reconnect the album
User-facing for a new album:
- Tell the user setup is in progress and that the selected folders will be searchable through OpenClaw when it finishes.
- If useful, add one short product sentence such as: "You'll be able to search in plain language or browse everything in the local gallery once import finishes."
INLINECODE64
For a new album, run exactly one setup command:
CODEBLOCK2
Read the JSON output:
- `profile_path` tells you where the default album profile was saved
- INLINECODE66 is the input for the first record ingestion pass
- INLINECODE67 tells you how many records still need captions and import
INLINECODE68
For reconnect:
- try the saved default album profile first
- verify the backend is reachable
- verify the album can be searched or written
- ask only for missing backend details
Suggested reconnect check:
CODEBLOCK3
Do not continue until the backend is confirmed reachable.
Step 5 - Run the shared record ingestion flow
Use this same flow for:
- the first album import
- later incremental updates
User-facing:
- Tell the user photos are being imported and that large libraries may take some time.
INLINECODE69
Input:
- first import: `caption_input_jsonl` from INLINECODE71
- later updates: `incremental_manifest_jsonl` from INLINECODE73
Before generating records, read the Caption schema section in this file.
INLINECODE75
For each record in the input manifest:
1. run INLINECODE76
2. send the returned `output_path` to the vision-capable model
3. preserve the original manifest fields from the source image
4. add `caption`, `tags`, `scene`, `objects`, and `text_in_image`
5. write one JSON object per line into a captioned JSONL file
6. import it with:
CODEBLOCK4
Rules:
- keep captions short, factual, retrieval-oriented, and visually grounded
- INLINECODE83 prefers macOS `sips` when available and also supports ImageMagick for Linux-friendly setups
- if `prepare-image` returns the original file path in caption mode, continue with that file instead of blocking the batch
- on Linux, allow direct caption fallback for `jpg`, `jpeg`, `png`, and `webp` when no local image backend is available
- do not promise Linux `heic` captioning or preview support
- do not invent names, sensitive traits, or stories
- do not replace the original `file_path` with the temporary derived image path
- if one file still cannot be captioned, skip only that file and continue the rest of the batch
- if there is nothing to caption, skip this step
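The "skip only that file and continue" rule can be illustrated with a small loop. This is a sketch under explicit assumptions: `caption_record` is a hypothetical stand-in for the real per-record work (image preparation plus the vision-model call), and here it simply echoes the record back so the example runs on its own.

```shell
# Hypothetical stand-in: a real version would prepare the image, call the
# vision model, and print one captioned JSON line, returning non-zero on failure.
caption_record() {
  printf '%s' "$1"
}

# caption_batch INPUT_JSONL OUTPUT_JSONL
# Writes one captioned line per successful record and prints how many were skipped.
caption_batch() {
  input=$1
  output=$2
  skipped=0
  : > "$output"
  # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
  while IFS= read -r record || [ -n "$record" ]; do
    [ -n "$record" ] || continue
    if line=$(caption_record "$record"); then
      printf '%s\n' "$line" >> "$output"
    else
      # One bad record must not block the rest of the batch.
      skipped=$((skipped + 1))
    fi
  done < "$input"
  printf '%s\n' "$skipped"
}
```

Returning the skipped count makes it easy to decide afterwards whether the user needs to hear about any files that could not be captioned.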
Step 6 - Enable automatic indexing
User-facing:
- Offer automatic indexing in plain language.
- Explain that OpenClaw can periodically check the selected folders for new or changed photos and update the album index.
- Ask whether the user wants to enable that now.
INLINECODE92
If the user declines:
- skip this step
- do not change heartbeat config
- do not change INLINECODE93
If the user says yes:
- inspect the existing heartbeat config before changing anything
- do not overwrite or replace existing heartbeat tasks
- do not tell the user to manually restart Gateway for heartbeat-only changes
- let OpenClaw handle heartbeat configuration using its normal mechanisms unless debugging requires lower-level manual steps
- reuse the existing heartbeat scope and workspace whenever possible
- if there is more than one reasonable heartbeat-enabled scope, do not guess; ask the user which one should own ai-photos automatic indexing
- do not convert an existing per-agent heartbeat setup back into a defaults-based setup
- preserve existing heartbeat behavior unless a missing setting must be filled with a reasonable default
- do not spell out or rely on a fixed command recipe unless the current environment requires debugging
Then update <workspace>/HEARTBEAT.md without removing unrelated content:
- if the file does not exist, create it
- if the file exists, preserve all existing user content
- manage only one ai-photos block delimited by stable markers
- if the ai-photos block already exists, replace only that block
- if the ai-photos block does not exist, append it to the end of the file
CODEBLOCK5
Do not rewrite the whole file just to add this block.
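The replace-or-append behavior described above can be sketched as follows. The marker strings and the function name `upsert_ai_photos_block` are assumptions for illustration; substitute the markers that the real ai-photos block uses. Values are passed to `awk` through the environment so that the block body is not subject to escape processing.

```shell
# Hypothetical markers; use the ones defined by the actual ai-photos block.
AI_PHOTOS_START='<!-- ai-photos:start -->'
AI_PHOTOS_END='<!-- ai-photos:end -->'

# upsert_ai_photos_block FILE BODY
# Replaces the marked block in FILE with BODY, or appends it if absent.
# Every line outside the markers passes through unchanged.
upsert_ai_photos_block() {
  file=$1
  body=$2
  [ -f "$file" ] || : > "$file"
  tmp=$(mktemp)
  if grep -qxF -- "$AI_PHOTOS_START" "$file"; then
    AI_START="$AI_PHOTOS_START" AI_END="$AI_PHOTOS_END" AI_BODY="$body" awk '
      $0 == ENVIRON["AI_START"] {
        print; print ENVIRON["AI_BODY"]; print ENVIRON["AI_END"]
        skip = 1; next
      }
      $0 == ENVIRON["AI_END"] { skip = 0; next }
      !skip { print }
    ' "$file" > "$tmp"
  else
    cat "$file" > "$tmp"
    # Make sure existing content ends with a newline before appending.
    if [ -s "$tmp" ] && [ -n "$(tail -c 1 "$tmp")" ]; then
      printf '\n' >> "$tmp"
    fi
    printf '%s\n%s\n%s\n' "$AI_PHOTOS_START" "$body" "$AI_PHOTOS_END" >> "$tmp"
  fi
  # Write back in place so the file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Calling the function twice with different bodies leaves exactly one block in the file, which is the property the rules above ask for.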
Then verify once:
- trigger one heartbeat run if it is safe and practical in the current environment, otherwise wait for the next scheduled run
- check the heartbeat result and make sure the ai-photos task completed as intended
- do not claim success until the verification result is clear
Then tell the user the result:
- success: explain that automatic indexing is active and the verification run succeeded
- declined: explain that the album is ready, but future changes require a manual re-index
- failed: explain that the album is usable, but automatic indexing is not active yet
Step 7 - Final handoff
User-facing handoff should include:
- that the album is ready to use
- how the user can use it now: search in plain language or ask OpenClaw to help organize photos
- whether automatic indexing is on or off, in one short sentence only when it matters
- when useful, mention the local gallery capabilities in one short sentence: browse thumbnails, open large previews, inspect metadata, and open the original file
Keep the handoff short and user-facing.
Default to readiness, status, and next actions.
Only include implementation details when the user asks or recovery requires them.
INLINECODE95
Immediately after setup:
- hand off directly once setup is ready
- tell the user the album is ready to search
- invite the user to search in plain language or ask OpenClaw to help organize photos
- if the user declined automatic indexing, say clearly that the album is in manual-only indexing mode
Search flow
When the user asks to find photos, run:
CODEBLOCK6
When answering:
- summarize the best matches clearly and in plain language
- mention filenames, dates, or captions when useful
- answer at the product level unless the user asks for implementation details
- before sending an image file, run INLINECODE96
- send the returned `output_path` when possible
- if preview preparation fails on Linux without a local image backend, say so briefly and fall back to the original file only when it is safe to send as-is
- if results are weak, say so and suggest a better query
Local web search
When the user asks to open a browser view for the album:
1. start the local web service
2. prefer the saved album profile; use environment variables only to fill missing backend fields
3. wait for the JSON startup line and return the local URL to the user
4. keep the process running while the user is browsing
If the user wants to open the gallery from another device:
- recommend Tailscale as the default remote access path
- run `"$AI_PHOTOS_BIN" serve --host 0.0.0.0` only when they explicitly want remote access
- explain that the startup JSON still prints a browser URL for the machine running `ai-photos`; for remote access, share the machine's Tailscale IP or MagicDNS name instead
- do not recommend exposing the port directly to the public internet unless the user explicitly asks for that tradeoff
- clarify that "open original" opens the file on the machine running `ai-photos`, not on the remote client
Run:
CODEBLOCK7
If the user wants a specific album profile:
CODEBLOCK8
The web service provides:
- a local search page
- search/filter/detail APIs for the page
- thumbnail and preview endpoints
- an action to open the original local file on the machine running INLINECODE101
When handing the web UI to the user:
- describe the page in product terms, for example:
  - "The page lets you search in plain language, filter by date, scroll the gallery, open a large preview, and inspect metadata on the right."
  - "When a photo has metadata, the detail panel can show caption, scene, tags, capture time, device, location, orientation, and file info."
- prefer this product summary over technical endpoint descriptions unless the user is debugging
Heartbeat run behavior
When a heartbeat arrives for a configured album:
1. run:
CODEBLOCK9
2. read the JSON output
3. if `to_caption` is 0, return INLINECODE104
4. if `to_caption` is greater than 0, run the shared record ingestion flow using INLINECODE107
5. stay quiet unless indexing failed or user attention is needed
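The branch on `to_caption` can be sketched without `jq`. This is an illustration under assumptions: the JSON output is held in a shell variable, `to_caption` appears once with a plain integer value, and the names `read_to_caption` and `heartbeat_action` as well as the result words `idle`, `ingest`, and `error` are hypothetical labels, not CLI output.

```shell
# Extract the integer value of "to_caption" from a JSON string.
# Prints nothing when the key is absent or not an integer.
read_to_caption() {
  printf '%s\n' "$1" |
    sed -n 's/.*"to_caption"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' |
    head -n 1
}

# Map the count to the next step described in the list above.
heartbeat_action() {
  count=$(read_to_caption "$1")
  if [ -z "$count" ]; then
    echo error      # output could not be read; user attention may be needed
  elif [ "$count" -eq 0 ]; then
    echo idle       # nothing to caption, stay quiet
  else
    echo ingest     # run the shared record ingestion flow
  fi
}
```

Treating unreadable output as its own case keeps a parsing problem from being mistaken for "nothing to do".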
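ai-photos
ai-photos turns one or more local photo sources into a searchable AI photo album for OpenClaw.
Supported formats:
- macOS: `jpg`, `jpeg`, `png`, `webp`, `heic`
- Linux: `jpg`, `jpeg`, `png`, `webp`
- Linux `heic`: best-effort only; do not promise captioning or preview support
When talking to users:
- try to match the user's language
- explain the outcome simply: choose local folders now, then use OpenClaw to search and organize them
- stay focused on the current ai-photos request
- keep user-facing replies short and product-level: progress, readiness, and what the user can do next
- keep implementation details internal unless the user asks or troubleshooting requires them
- once indexing is complete and the backend is confirmed ready, say the album is ready and invite the user to try a search
- when the user asks what ai-photos can do, or when handing off a ready album, briefly describe the product in user terms:
  - natural-language search across captions, scene labels, and tags
  - date-based browsing and filtering
  - a local web gallery for thumbnail browsing and large-photo viewing
  - photo detail view with caption, scene, tags, capture time, device, location, orientation, and file info when available
  - opening the original local file from the web UI
  - manual sync now, optional automatic indexing later
- when introducing the web UI, describe it as a local searchable gallery rather than an API or server unless implementation details are needed
- keep these capability descriptions short, concrete, and user-facing; do not drift into backend details
Suggested user-facing capability summary:
- "You can search your photos in plain language, filter by date, and browse everything in a local gallery."
- "The web UI shows thumbnails, opens large previews, and lets you inspect captions, tags, time, device, location, and other file details when available."
- "You can also open the original local file directly, and later either sync changes manually or turn on automatic indexing."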
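Required outcome
This task is not complete until all of the following are true:
1. at least one photo source is chosen and readable for a new album
2. image analysis is verified to work in the current OpenClaw runtime
3. the album backend is created or reconnected and writable
4. the first import succeeds, or an existing album is verified reachable
5. the user explicitly approved automatic indexing or explicitly declined it
6. if automatic indexing was approved, OpenClaw heartbeat is configured without breaking existing heartbeat tasks, the ai-photos block is present in `HEARTBEAT.md`, and one verification heartbeat has run
7. the user has been told the album is ready and has been invited to try a search
8. the user has been sent the final handoff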
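Internal terms
Use these terms for agent reasoning, troubleshooting, or recovery only.
Do not introduce them to the user unless needed.
- `photo sources`: one or more local paths scanned into the same album
- `album backend`: where the searchable photo index is stored
- `album profile`: saved reconnect information, stored automatically under `~/.openclaw/ai-photos/albums/default.json`
- `caption input JSONL`: the manifest file that still needs vision captions and import
If the user asks what to save for later, explain that OpenClaw saves the reconnect information automatically at `~/.openclaw/ai-photos/albums/default.json`, and that they only need to keep that file if they want a manual backup.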
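Caption schema
Each captioned JSONL line should contain the original manifest fields plus vision-model output.
Required base fields:
- `file_path`
- `filename`
- `sha256`
- `mime_type`
- `size_bytes`
- `width`
- `height`
- `taken_at`
- `exif`
Vision fields:
- `caption`: one short factual sentence
- `tags`: array of 5-12 short tags
- `scene`: short scene label
- `objects`: array of the main visible objects
- `text_in_image`: visible text or `null`
Optional fields:
- `metadata`: free-form JSON object
- `search_text`: concatenated retrieval text; if omitted, the importer builds it
Example: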
```json
{
  "file_path": "/photos/2026/03/cat.jpg",
  "filename": "cat.jpg",
  "sha256": "abc123",
  "mime_type": "image/jpeg",
  "size_bytes": 231231,
  "width": 3024,
  "height": 4032,
  "taken_at": "2026-03-12T09:12:00+00:00",
  "exif": {"Make": "Apple", "Model": "iPhone 15 Pro"},
  "caption": "A white cat rests on a gray sofa beside a sunlit window.",
  "tags": ["cat", "sofa", "indoor", "sunlight", "pet"],
  "scene": "living room",
  "objects": ["cat", "sofa", "window"],
  "text_in_image": null,
  "metadata": {"source": "demo"}
}
```
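CLI runtime
This skill does not depend on a local Python environment or a checked-out Go source tree.
It uses the latest published ai-photos CLI release from:
- repository: `https://github.com/zoubingwu/openclaw-ai-photos`
- install dir: `~/.openclaw/ai-photos/bin`
- binary path: `~/.openclaw/ai-photos/bin/ai-photos`
At the start of every ai-photos task, run the bootstrap flow exactly once and reuse the resulting binary path for the rest of the task.
Bootstrap flow
Run this shell block and capture its stdout as `AI_PHOTOS_BIN`: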
```bash
ensure_ai_photos() {
  AI_PHOTOS_REPO="zoubingwu/openclaw-ai-photos"
  AI_PHOTOS_BIN_DIR="$HOME/.openclaw/ai-photos/bin"
  AI_PHOTOS_BIN="$AI_PHOTOS_BIN_DIR/ai-photos"
  mkdir -p "$AI_PHOTOS_BIN_DIR"

  # Map the kernel name to the OS part of the release asset name.
  os="$(uname -s | tr '[:upper:]' '[:lower:]')"
  case "$os" in
    darwin) goos="darwin" ;;
    linux) goos="linux" ;;
    *)
      echo "unsupported platform: $os" >&2
      return 1
      ;;
  esac

  # Map the machine type to the architecture part of the asset name.
  arch="$(uname -m)"
  case "$arch" in
    x86_64|amd64) goarch="amd64" ;;
    arm64|aarch64) goarch="arm64" ;;
    *)
      echo "unsupported architecture: $arch" >&2
      return 1
      ;;
  esac

  archive_name="ai-photos_${goos}_${goarch}.tar.gz"
  archive_url="https://github.com/${AI_PHOTOS_REPO}/releases/latest/download/${archive_name}"
  tmp_dir="$(mktemp -d)"

  # Remember whether a cached binary exists so it can serve as a fallback.
  had_existing_binary=0
  if [ -x "$AI_PHOTOS_BIN" ]; then
    had_existing_binary=1
  fi

  # Download, unpack, and install the latest release.
  if curl -fL "${archive_url}" -o "$tmp_dir/${archive_name}" \
    && tar -xzf "$tmp_dir/${archive_name}" -C "$tmp_dir" \
    && install -m 0755 "$tmp_dir/ai-photos" "$AI_PHOTOS_BIN"; then
    rm -rf "$tmp_dir"
    printf '%s\n' "$AI_PHOTOS_BIN"
    return 0
  fi

  # The update failed: fall back to the cached binary when there is one.
  rm -rf "$tmp_dir"
  if [ "$had_existing_binary" -eq 1 ]; then
    printf '%s\n' "$AI_PHOTOS_BIN"
    return 0
  fi

  echo "failed to download the ai-photos release archive" >&2
  return 1
}

AI_PHOTOS_BIN="$(ensure_ai_photos)"
```
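Rules:
- always run the bootstrap flow before using the CLI
- the bootstrap flow downloads the latest stable release asset from `releases/latest/download/...` and does not call `api.github.com`
- if the latest asset download or unpack step fails, continue with the cached binary when one already exists
- if the latest asset download fails and no cached binary exists, setup is blocked
- do not tell the user to clone the repository or build the CLI locally unless troubleshooting requires it
- if you need command details, use `"$AI_PHOTOS_BIN" help`
Onboarding
Step 0 - Choose mode
User-facing:
- Ask whether the user wants to create a new photo album, reconnect an existing one, or search an already configured album.
- If they want to reconnect,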