Multimodal Memory

Store and retrieve visual content — user images, charts, diagrams, website UIs — across conversations.

Important: Image Analysis

The primary model may not support vision. Always use analyze.py to analyze images — it calls GPT-4o directly via API and does not rely on your own vision capability.

Storage Location

All data lives in ~/.multimodal-memory/:

- images/: saved copies of captured images
- metadata.db: SQLite database (auto-created)
- memory.md: human-readable summary (auto-updated)

Read ~/.multimodal-memory/memory.md at session start for a quick overview.
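The session-start read can be sketched in Python as follows. This is a minimal sketch and not part of the skill's scripts: the helper name `read_memory_overview` is hypothetical, and it assumes memory.md is UTF-8 text that may not exist yet before the first image has been stored.

```python
from pathlib import Path

# Storage root named in this document.
MEMORY_ROOT = Path.home() / ".multimodal-memory"


def read_memory_overview(root: Path = MEMORY_ROOT) -> str:
    """Return the contents of memory.md, or an empty string if it is absent.

    Hypothetical helper: assumes the summary file is UTF-8 text.
    """
    summary = root / "memory.md"
    if not summary.is_file():
        return ""
    return summary.read_text(encoding="utf-8")
```

Returning an empty string instead of raising keeps a first-ever session from failing just because nothing has been stored yet.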

Scenarios & Actions

1. User Sends an Image / Chart / Diagram

When a user sends an image, OpenClaw saves it locally and provides the file path in the message context (look for a path like /tmp/... or ~/.openclaw/...).

Run analyze.py with that path — it calls GPT-4o to analyze and stores the result automatically:

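```bash
python {baseDir}/scripts/analyze.py \
  --image-path "/absolute/path/to/image.jpg" \
  --source "image"
```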

For charts, use --source "chart"; for diagrams, use --source "image".
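If the call to analyze.py is assembled programmatically, a small helper can enforce the absolute-path requirement before anything runs. The sketch below is illustrative only: `build_analyze_command` is a hypothetical name, `base_dir` stands in for the `{baseDir}` placeholder, and only the flags named in this document are used.

```python
import os
import sys
from typing import List, Optional


def build_analyze_command(
    base_dir: str,
    image_path: str,
    source: str = "image",
    url: Optional[str] = None,
) -> List[str]:
    """Build the argv list for analyze.py without running it."""
    # Rule from this document: image paths must be absolute.
    if not os.path.isabs(image_path):
        raise ValueError(f"image path must be absolute: {image_path!r}")
    cmd = [
        sys.executable,
        os.path.join(base_dir, "scripts", "analyze.py"),
        "--image-path", image_path,
        "--source", source,
    ]
    # --url is only attached when a page URL is known (website captures).
    if url is not None:
        cmd += ["--url", url]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.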

If you cannot find the file path in the message context, ask the user:

"Where is this image saved? You can also paste the file path directly."

2. User Asks to Capture / Remember a Website

Step 1 — take the screenshot:

```bash
python {baseDir}/scripts/capture_url.py --url "https://example.com"
```

The script prints the saved screenshot path.

Step 2 — analyze and store it:
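```bash
python {baseDir}/scripts/analyze.py \
  --image-path "/path/printed/above.png" \
  --source "website" \
  --url "https://example.com"
```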
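When the two steps are chained in a script, the screenshot path has to be recovered from the output of capture_url.py. The exact output format is not specified here, so the sketch below assumes the path is the last non-empty line printed. Both that assumption and the helper name `extract_screenshot_path` are hypothetical.

```python
from typing import Optional


def extract_screenshot_path(stdout: str) -> Optional[str]:
    """Return the last non-empty output line, assumed to be the saved path."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    return lines[-1] if lines else None
```

If the function returns None, the capture most likely failed and the setup step for URL capture is worth checking before retrying.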

3. User Searches by Text

```bash
python {baseDir}/scripts/search.py --query "dark theme login page"
```

Show results with descriptions and image paths.

4. User Sends an Image to Search (find similar memories)

Step 1 — analyze the query image to get its description:
```bash
python {baseDir}/scripts/analyze.py \
  --image-path "/path/to/query/image.jpg" \
  --source "image"
```

Step 2 — the analysis is stored; also search for similar past content using the description keywords:
```bash
python {baseDir}/scripts/search.py --query "key concepts from the analysis output"
```

Step 3 — present matching memories and explain why they're relevant.
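Turning a description into search keywords can be done by hand, but a rough automatic version is sketched below. It is not how the skill itself works: the helper name, the stop-word list, and the six-term limit are all illustrative assumptions, and the regular expression only handles ASCII words, so it suits English descriptions.

```python
import re
from typing import List

# Small illustrative stop-word list; not taken from the skill.
STOP_WORDS = {"a", "an", "and", "the", "of", "with", "in", "on", "is", "to", "for"}


def description_to_query(description: str, max_terms: int = 6) -> str:
    """Pick the first few distinct non-stop-words as a search query."""
    terms: List[str] = []
    for word in re.findall(r"[a-z0-9]+", description.lower()):
        if len(terms) >= max_terms:
            break
        if word not in STOP_WORDS and word not in terms:
            terms.append(word)
    return " ".join(terms)
```

The resulting string can be passed as the --query value to search.py.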

5. List Recent Memories

```bash
python {baseDir}/scripts/list.py --limit 20
```

Core Rules

- Never try to analyze images yourself; always delegate to analyze.py.
- After storing, confirm to the user that the description and tags were saved.
- Image paths must be absolute.
- The --extra-tags argument accepts additional tags as a comma-separated list.
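Because --extra-tags takes a single comma-separated value, tags collected as a list need to be joined first. The helper below is a hypothetical sketch of that step. Dropping blank entries, duplicates, and tags that contain a comma is a choice made for this example, not documented behaviour of analyze.py.

```python
from typing import Iterable, List


def join_extra_tags(tags: Iterable[str]) -> str:
    """Join tags into the comma-separated form that --extra-tags accepts."""
    cleaned: List[str] = []
    for tag in tags:
        tag = tag.strip()
        # Skip blanks and duplicates; a comma inside a tag would split it in two.
        if tag and "," not in tag and tag not in cleaned:
            cleaned.append(tag)
    return ",".join(cleaned)
```

For example, the list `[" ui ", "dark", "ui"]` becomes the single argument value `ui,dark`.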

One-Time Setup for URL Capture

If capture_url.py fails:
```bash
pip install playwright && python -m playwright install chromium
```
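Before running the install, it can help to check whether the playwright package is already importable. The sketch below uses the standard library's `importlib.util.find_spec`. The helper name `module_available` is hypothetical. The check only covers the Python package, not whether the Chromium browser has been downloaded.

```python
import importlib.util


def module_available(name: str = "playwright") -> bool:
    """Report whether a top-level module can be found on the import path."""
    return importlib.util.find_spec(name) is not None
```

If this returns False, the install command is needed. If it returns True and capture_url.py still fails, the browser download step is the more likely gap.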

Script Reference

| Script | Purpose | Key args |
| --- | --- | --- |
| `analyze.py` | Analyze image with GPT-4o + store | `--image-path`, `--source`, `--url`, `--extra-tags` |
| `store.py` | | |
