Zotero Vectorize

Build and maintain a local-first, cross-platform Zotero vector store for semantic search and RAG over bibliographic metadata and PDF full text.

Keep SKILL.md focused on workflow. Read the reference files only when needed:

- references/config.md: paths, environment variables, output layout
- references/data-format.md: JSON schemas and file naming
- references/windows.md / macos.md / linux.md: platform-specific path defaults and notes
- references/troubleshooting.md: common failures and recovery

Core rules

- Treat Zotero as read-only input. Never modify the user’s Zotero database or attachment storage.
- Prefer creating a database snapshot before reading.
- For incremental updates: check first, report missing items, wait for user confirmation, then apply.
- Before any update that rewrites store files: back up first, then write.
- Backup retention for this skill is fixed: keep only the latest and previous backup per file.
- Default output filenames are:
  - metadata_vectors.json
  - fulltext_vectors.json
  - vector_store_metadata.json
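
The backup and retention rules above can be sketched roughly as follows. This is an illustration only, not the skill's own backup script: the function name `backup_with_retention`, the separate backup directory, and the numbered `.bak` naming scheme are all assumptions made for the example.

```python
import shutil
from pathlib import Path


def _backup_index(path: Path) -> int:
    # Hypothetical naming scheme: "<store file name>.<index>.bak"
    return int(path.suffixes[-2].lstrip("."))


def backup_with_retention(store_file: Path, backup_dir: Path, keep: int = 2) -> Path:
    """Copy store_file into backup_dir, then prune all but the newest `keep` copies."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Only backups of this particular file count toward its retention limit.
    existing = sorted(backup_dir.glob(f"{store_file.name}.*.bak"), key=_backup_index)
    next_index = _backup_index(existing[-1]) + 1 if existing else 1
    target = backup_dir / f"{store_file.name}.{next_index}.bak"
    shutil.copy2(store_file, target)
    # With keep=2 this leaves the latest backup and the one before it.
    for stale in (existing + [target])[:-keep]:
        stale.unlink()
    return target
```

Calling it once per store file before each rewrite would keep the "back up first, then write" order, with at most two backups per file on disk.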

Workflow decision tree

1) Detect or confirm paths

If the Zotero data directory, database path, or storage path is unknown:

1. Read references/config.md
2. Read the platform-specific reference (windows.md, macos.md, or linux.md)
3. Run:

    python scripts/detect_zotero_paths.py

If the detected paths are wrong, ask the user to open Zotero and use Show Data Directory, then rerun with explicit --data-dir, --db, or --storage-dir.
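
As a rough picture of what path resolution involves, the sketch below assumes the standard Zotero layout: a `Zotero` folder in the user's home directory containing `zotero.sqlite` and a `storage` subfolder. The function name `resolve_zotero_paths` and the returned dictionary shape are hypothetical. It does not read Zotero's preferences, so a custom data directory still has to be passed in explicitly.

```python
from pathlib import Path
from typing import Optional


def resolve_zotero_paths(data_dir: Optional[Path] = None) -> dict:
    """Return the data directory, database file, and storage directory to use."""
    # Assumption: without an explicit override, fall back to the ~/Zotero default.
    base = Path(data_dir) if data_dir else Path.home() / "Zotero"
    paths = {
        "data_dir": base,
        "db": base / "zotero.sqlite",
        "storage_dir": base / "storage",
    }
    # List the paths that do not exist so the caller can ask the user to confirm.
    missing = [name for name, path in paths.items() if not path.exists()]
    return {"paths": paths, "missing": missing}
```

A non-empty `missing` list is the signal to stop and ask the user for the real locations rather than guessing.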

2) Create a database snapshot

Before full builds or incremental checks, snapshot the Zotero database:

    python scripts/snapshot_zotero_db.py --output-dir OUTPUT_DIR

If snapshotting fails because SQLite is locked, ask the user to close Zotero and retry.
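
One way to take such a snapshot without touching the original file is SQLite's online backup API, available in Python's standard library as `sqlite3.Connection.backup` (Python 3.7+). The sketch below is an assumption about how a snapshot could be made, not the skill's actual script; the function name `snapshot_database` and the `_snapshot.sqlite` file name are hypothetical.

```python
import sqlite3
from pathlib import Path


def snapshot_database(db_path: Path, output_dir: Path) -> Path:
    """Copy a SQLite database into output_dir using the online backup API."""
    output_dir.mkdir(parents=True, exist_ok=True)
    snapshot_path = output_dir / f"{db_path.stem}_snapshot.sqlite"
    # mode=ro opens the source read-only, so the original file is never written to.
    source = sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)
    target = sqlite3.connect(snapshot_path)
    try:
        source.backup(target)
    finally:
        target.close()
        source.close()
    return snapshot_path
```

If another process holds a lock on the database, `source.backup` raises `sqlite3.OperationalError`, which is the case where the user should close Zotero and retry.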

3) Build the metadata vector store

Use this when the user asks to create or rebuild metadata embeddings for the Zotero library.

    python scripts/build_metadata_vectors.py --output-dir OUTPUT_DIR

This writes metadata_vectors.json and refreshes vector_store_metadata.json and README.md.

4) Build the full-text vector store

Use this when the user asks to create or rebuild PDF full-text embeddings.

    python scripts/build_fulltext_vectors.py --output-dir OUTPUT_DIR

This scans Zotero PDF attachments, extracts text, chunks it, embeds each chunk, and writes fulltext_vectors.json.
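
The chunking step in that pipeline can be illustrated with a simple overlapping character window. The skill's real chunk size, overlap, and splitting strategy are not stated here, so the function name `chunk_text` and the defaults of 1000 and 200 characters are assumptions chosen only for the example.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    # Collapse runs of whitespace, which PDF text extraction tends to produce.
    normalized = " ".join(text.split())
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(normalized), step):
        chunks.append(normalized[start:start + chunk_size])
        if start + chunk_size >= len(normalized):
            break  # the final window already reaches the end of the text
    return chunks
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighbouring chunk, at the cost of embedding some text twice.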

5) Check incremental updates

Use this when the user asks whether Zotero contains new items not yet added to the vector store.

    python scripts/check_incremental_updates.py --output-dir OUTPUT_DIR

Report:

- total top-level Zotero items
- total PDF-parent items
- current metadata/fulltext vector counts
- missing metadata items
- missing fulltext items

Do not update the store yet.
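
At its core the check is a set difference between the itemIDs in Zotero and those already in the store. The sketch below shows that comparison in a read-only form. The function names and the report keys are hypothetical, and it assumes the itemIDs have already been loaded from the snapshot and the store files.

```python
from typing import Dict, Iterable, List


def find_missing_items(zotero_ids: Iterable[int], store_ids: Iterable[int]) -> List[int]:
    """Return Zotero itemIDs that have no entry in the vector store, sorted."""
    return sorted(set(zotero_ids) - set(store_ids))


def build_check_report(
    top_level_ids: Iterable[int],
    pdf_parent_ids: Iterable[int],
    metadata_store_ids: Iterable[int],
    fulltext_store_ids: Iterable[int],
) -> Dict[str, object]:
    """Summarize the check step without modifying any store file."""
    top_level, pdf_parents = set(top_level_ids), set(pdf_parent_ids)
    metadata_store, fulltext_store = set(metadata_store_ids), set(fulltext_store_ids)
    return {
        "total_top_level_items": len(top_level),
        "total_pdf_parent_items": len(pdf_parents),
        "metadata_vector_count": len(metadata_store),
        "fulltext_vector_count": len(fulltext_store),
        "missing_metadata_items": find_missing_items(top_level, metadata_store),
        "missing_fulltext_items": find_missing_items(pdf_parents, fulltext_store),
    }
```

Because the report is computed purely from the inputs, it can be shown to the user for confirmation before anything is written.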

6) Apply incremental updates

Only run this after the user confirms the update.

    python scripts/apply_incremental_updates.py --output-dir OUTPUT_DIR

This script:

1. snapshots the DB
2. backs up store files
3. appends missing metadata/fulltext entries
4. keeps only the latest and previous backup per file
5. updates store metadata and README

Use --item-id to limit the update to specific items if the user wants a partial apply.
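
The append step, including the optional restriction to selected items, might look like the sketch below. It is a simplified model, not the script itself: the function name `append_missing_entries` is hypothetical, and it assumes each store entry is a dictionary with an `itemID` key, with full-text stores holding several entries (chunks) per item.

```python
from typing import Dict, Iterable, List, Optional


def append_missing_entries(
    store_entries: List[Dict],
    candidate_entries: Iterable[Dict],
    only_item_ids: Optional[Iterable[int]] = None,
) -> List[int]:
    """Append candidates whose item is not yet in the store; return the itemIDs added."""
    # Snapshot of the itemIDs present before this update starts.
    existing = {entry["itemID"] for entry in store_entries}
    selected = set(only_item_ids) if only_item_ids is not None else None
    appended: List[int] = []
    for entry in candidate_entries:
        item_id = entry["itemID"]
        if item_id in existing:
            continue  # item was already in the store before this update
        if selected is not None and item_id not in selected:
            continue  # outside the partial-apply selection
        store_entries.append(entry)  # every chunk of a new item is kept
        if item_id not in appended:
            appended.append(item_id)
    return appended
```

The returned list is what would be reported back to the user as the itemIDs appended during the update.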

7) Verify the finished store

After any build or incremental update, verify counts and sizes:

    python scripts/verify_vector_store.py --output-dir OUTPUT_DIR

Always report:

- metadata item count
- fulltext item count
- fulltext chunk count
- metadata file size
- fulltext file size
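
A minimal sketch of how these five numbers could be gathered is shown below. The real JSON layout is defined in references/data-format.md and may differ. Here each file is assumed to be a JSON list of objects carrying an `itemID` key, with one object per chunk in the full-text file, and the function name `summarize_store` is hypothetical.

```python
import json
from pathlib import Path
from typing import Dict


def summarize_store(output_dir: Path) -> Dict[str, int]:
    """Count items and chunks and measure file sizes for the two vector files."""
    metadata_file = output_dir / "metadata_vectors.json"
    fulltext_file = output_dir / "fulltext_vectors.json"
    # Assumed layout: a JSON list of objects, each with an "itemID" key;
    # the full-text file holds one object per chunk.
    metadata = json.loads(metadata_file.read_text(encoding="utf-8"))
    chunks = json.loads(fulltext_file.read_text(encoding="utf-8"))
    return {
        "metadata_item_count": len({entry["itemID"] for entry in metadata}),
        "fulltext_item_count": len({chunk["itemID"] for chunk in chunks}),
        "fulltext_chunk_count": len(chunks),
        "metadata_file_bytes": metadata_file.stat().st_size,
        "fulltext_file_bytes": fulltext_file.stat().st_size,
    }
```

Comparing these numbers with the totals from the incremental check is a quick way to confirm that an update wrote what was expected.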

Scripts

- scripts/detect_zotero_paths.py: resolve default/current Zotero paths
- scripts/snapshot_zotero_db.py: create a safe SQLite snapshot
- scripts/build_metadata_vectors.py: full rebuild of metadata vectors
- scripts/build_fulltext_vectors.py: full rebuild of PDF full-text vectors
- scripts/check_incremental_updates.py: compare Zotero against current vector store
- scripts/apply_incremental_updates.py: append missing items after user confirmation
- scripts/backup_with_retention.py: back up store files and retain only the latest two states
- scripts/verify_vector_store.py: report counts, sizes, and store metadata

Output expectations

When using this skill successfully, return concise operational summaries such as:

- detected paths
- snapshot path used
- number of items/chunks written
- current file sizes
- whether any items are missing
- which itemIDs were appended during incremental update

Escalation notes

Read references/troubleshooting.md when:

- SQLite snapshot fails
- HuggingFace/model download or local model loading fails
- PDFs are missing or unreadable
- full-text extraction is incomplete
- file paths differ from defaults on the current OS
