FileChat RAG Skill
Your personal RAG (Retrieval-Augmented Generation) document library backed by Google Drive.
Supports any number of Google Drive folders, interactive folder routing, incremental sync, a choice of Gemini or OpenAI for embeddings, and an optional Qdrant connection.
Setup & Bootstrap
FIRST verify that the required environment variables are set in /workspace/skills/filechat/.env:
- 1. EMBEDDING_PROVIDER (either gemini or openai)
- 2. GEMINI_API_KEY or OPENAI_API_KEY (matching the chosen provider)
- 3. Optional: QDRANT_URL and QDRANT_API_KEY (if absent, the skill uses local disk-based JSON).
Create the .env file like this:
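echo "EMBEDDING_PROVIDER=gemini" > ./skills/filechat/.env
echo "GEMINI_API_KEY=your_key_here" >> ./skills/filechat/.env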
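The environment check described above can be sketched in Node.js. This is only an illustration, not the skill's own code: the helper names `parseEnv` and `checkEnv` are hypothetical, and the sketch assumes the API key variables are named GEMINI_API_KEY and OPENAI_API_KEY.

```javascript
// Hypothetical helper: parse .env-style text into a plain object.
function parseEnv(text) {
  const env = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue; // skip blanks and comments
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue; // not a KEY=value line
    env[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return env;
}

// Hypothetical helper: return a list of problems; an empty list means the config looks usable.
function checkEnv(env) {
  const problems = [];
  const provider = env.EMBEDDING_PROVIDER;
  if (provider !== "gemini" && provider !== "openai") {
    problems.push("EMBEDDING_PROVIDER must be gemini or openai");
  } else {
    // Assumed key names; only the key for the chosen provider is required.
    const keyName = provider === "gemini" ? "GEMINI_API_KEY" : "OPENAI_API_KEY";
    if (!env[keyName]) problems.push(`${keyName} is not set`);
  }
  // Qdrant is optional, but a key without a URL is probably a mistake.
  if (env.QDRANT_API_KEY && !env.QDRANT_URL) {
    problems.push("QDRANT_API_KEY is set but QDRANT_URL is missing");
  }
  return problems;
}

const sample = "EMBEDDING_PROVIDER=gemini\nGEMINI_API_KEY=your_key_here\n";
console.log(checkEnv(parseEnv(sample))); // prints []
```

Returning a list of problems instead of throwing lets the agent report every missing variable to the user in one message.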
Google Workspace Authentication:
Before running any commands, check if the system is authenticated by running:
npx @googleworkspace/cli auth status
If it returns an auth error or indicates no token, you MUST prompt the user to authenticate. Trigger the interactive login flow:
npx @googleworkspace/cli auth login --services drive
Wait for the user to complete the browser OAuth flow before proceeding.
Folder Management
The user can sync any number of folders. Manage them with folders.js.
- List Folders: cd ./skills/filechat && node folders.js list
- Add a Folder: node folders.js add "Taxes 2026" <FOLDER_ID> (if you don't know the ID, it can be auto-discovered via gws drive files list)
- Set Default Folder: node folders.js default "Taxes 2026"
If the user asks to do something with a file/folder but doesn't specify which one, run node folders.js get-default to find the default ID. If no folders exist, ask them to set one up!
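The routing rule above (explicit folder first, then the default, otherwise ask the user) can be sketched as follows. The registry shape and the function names are assumptions made for illustration; the real folders.js may store and expose its data differently.

```javascript
// In-memory stand-in for the folder registry (hypothetical layout).
function createRegistry() {
  return { folders: new Map(), defaultName: null };
}

function addFolder(registry, name, folderId) {
  registry.folders.set(name, folderId);
}

function setDefault(registry, name) {
  if (!registry.folders.has(name)) throw new Error(`Unknown folder: ${name}`);
  registry.defaultName = name;
}

// Decide which folder ID to act on, or signal that the user must be asked.
function resolveFolder(registry, requestedName) {
  if (requestedName) {
    return registry.folders.has(requestedName)
      ? { action: "use", folderId: registry.folders.get(requestedName) }
      : { action: "ask", reason: "unknown folder" };
  }
  if (registry.folders.size === 0) {
    return { action: "ask", reason: "no folders registered" };
  }
  if (registry.defaultName === null) {
    return { action: "ask", reason: "no default folder set" };
  }
  return { action: "use", folderId: registry.folders.get(registry.defaultName) };
}

const registry = createRegistry();
addFolder(registry, "Taxes 2026", "<FOLDER_ID>");
console.log(resolveFolder(registry, null)); // asks: no default folder set
setDefault(registry, "Taxes 2026");
console.log(resolveFolder(registry, null)); // uses the default folder's ID
```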
How to Sync the Library
When the user asks to "sync", "flush", or "update", you must run the ingestion script.
To sync a specific folder:
cd ./skills/filechat && node sync.js <FOLDER_ID>
To sync EVERYTHING (all folders in the registry):
cd ./skills/filechat && node sync-all.js
Note: Syncs are incremental and use a local cache. If a file hasn't been modified in Drive, the script skips it immediately and reports "0 chunks" embedded. This is NORMAL behavior. If you are debugging or testing, or the user specifically requests a hard flush, you MUST delete the cache files first:
rm ./skills/filechat/meta_<FOLDER_ID>.json
rm ./skills/filechat/vector_db_<FOLDER_ID>.json
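To make the skip behavior concrete, here is a minimal sketch of an incremental sync plan. It assumes the cache maps each Drive file ID to the `modifiedTime` seen at the last sync; the actual cache layout used by sync.js is not documented here, and `planSync` is a hypothetical name.

```javascript
// files: objects shaped like Drive API v3 file resources ({ id, name, modifiedTime }).
// cache: plain object mapping file ID -> modifiedTime recorded at the last sync (assumed layout).
function planSync(files, cache) {
  const toEmbed = [];
  const skipped = [];
  for (const file of files) {
    if (cache[file.id] === file.modifiedTime) {
      skipped.push(file.id); // unchanged since the last sync
    } else {
      toEmbed.push(file.id); // new or modified
    }
  }
  return { toEmbed, skipped };
}

const files = [
  { id: "a", name: "w2.pdf", modifiedTime: "2026-01-10T09:00:00.000Z" },
  { id: "b", name: "receipt.png", modifiedTime: "2026-02-01T12:30:00.000Z" },
];
const cache = { a: "2026-01-10T09:00:00.000Z" };

console.log(planSync(files, cache)); // "a" is skipped, "b" is embedded
console.log(planSync(files, {}));    // empty cache (hard flush): both are embedded
```

With an unchanged folder, `toEmbed` is empty, which corresponds to the "0 chunks" output. Deleting the cache files is the equivalent of passing an empty cache.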
How to Answer User Questions (RAG)
Query the local vector store or Qdrant for the target Folder ID to fetch relevant text chunks:
cd ./skills/filechat && node query.js <FOLDER_ID> "What does my medical discharge certificate say?"
Use the snippets returned to answer the user.
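The validation steps in this document mention cosine similarity and a list of top matches. A minimal sketch of that ranking over a local store might look like this, assuming each stored chunk carries `text`, `fileId`, and an embedding `vector`. Those field names and the function names are hypothetical, and query.js may work differently.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query vector and keep the k best.
function topMatches(queryVector, chunks, k = 3) {
  return chunks
    .map((chunk) => ({
      text: chunk.text,
      fileId: chunk.fileId,
      score: cosineSimilarity(queryVector, chunk.vector),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 2-dimensional vectors; real embeddings have hundreds or thousands of dimensions.
const store = [
  { text: "Discharged on March 3.", fileId: "f1", vector: [1, 0] },
  { text: "Total tax due.", fileId: "f2", vector: [0, 1] },
  { text: "Follow-up visit in two weeks.", fileId: "f1", vector: [1, 1] },
];
console.log(topMatches([1, 0], store, 2));
```

Because each match keeps its `fileId`, the same result can be used to locate a file for download.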
How to Retrieve and Send a Physical File
Find the File ID using the query script, then download it:
gws drive files get --params '{"fileId": "<FILE_ID>", "alt": "media"}' --output /workspace/discharge.pdf
Reply using the media tag:
MEDIA:/workspace/discharge.pdf
How to Store a New File for the User
If the user uploads a file and asks you to save it (or implicitly sends a file per your automatic processing rules):
- 1. Check their folders (node folders.js list).
- 2. If they didn't specify which folder, use the default folder. If no default is set, ask them!
- 3. Notify the user exactly which folder the file is being saved to.
- 4. Tell the user that you are now extracting the information and saving it in a vector database.
- 5. If the file is an image or scanned document, make sure its text is extracted with a vision model or OCR before it is embedded. (The sync script handles this natively.)
- 6. Upload it to the correct folder using gws:
gws drive files create --json '{"name": "filename.pdf", "parents": ["<FOLDER_ID>"]}' --upload /path/to/uploaded/file.pdf
- 7. Trigger node sync.js <FOLDER_ID> so the file is chunked and embedded into that folder's vector store.
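For intuition about the "chunk" part of the final step, here is a sketch of fixed-size chunking with overlap, a common way to split extracted text before embedding. It is an assumption that sync.js does anything similar; the function name, chunk size, and overlap are hypothetical.

```javascript
// Split text into windows of `size` characters, each sharing `overlap` characters with the previous one.
function chunkText(text, size = 500, overlap = 50) {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("Require size > 0 and 0 <= overlap < size");
  }
  const chunks = [];
  const step = size - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window already reached the end
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 1)); // prints [ 'abcd', 'defg', 'ghij' ]
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of embedding some text twice.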
How to Test & Validate the Skill
If the user asks you to verify the skill is working, or if you just set it up and want to ensure end-to-end functionality, follow these exact steps:
- 1. Verify Auth: Run npx @googleworkspace/cli auth status. Ensure it shows a valid token.
- 2. Verify Drive Access: Do a dry-run fetch of the target folder to ensure GWS can see the files.
npx @googleworkspace/cli drive files list --params '{"q": "'\''<FOLDER_ID>'\'' in parents and trashed = false"}'
(If this fails, check folder permissions or GWS credentials.)
- 3. Force a Clean Sync: Clear the cache for the test folder to guarantee a fresh run, then sync.
rm -f ./skills/filechat/meta_<FOLDER_ID>.json ./skills/filechat/vector_db_<FOLDER_ID>.json
node ./skills/filechat/sync.js <FOLDER_ID>
(You should see files being downloaded, OCR'd, and chunks being embedded. If it says "0 chunks", verify the folder isn't empty.)
- 4. Test the Vector Query: Run a generic query to verify the embeddings were saved and cosine similarity works.
node ./skills/filechat/query.js <FOLDER_ID> "hello"
(You should see a list of "Top matches" with similarity scores and text snippets. If you do, the RAG pipeline is working end to end.)