Agent PaddleOCR Vision
OCR with Agent Actions — powered by PaddleOCR only. Automatically classifies documents and provides actionable prompts.
What It Does
- OCR extraction via the PaddleOCR cloud API (requires credentials)
- Classification into 11 document types: invoice, business card, receipt, table, contract, ID card, passport, bank statement, driver's license, tax form, general
- Action suggestions with structured parameters
- Batch processing
- Searchable PDF generation (with bbox alignment)
Quick Start
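```bash
# Install dependencies
pip3 install -r scripts/requirements.txt

# Configure the PaddleOCR API
export PADDLEOCR_DOC_PARSING_API_URL=https://your-api.paddleocr.com/layout-parsing
export PADDLEOCR_ACCESS_TOKEN=your_token

# Process a single file
python3 scripts/doc_vision.py --file-path ./invoice.jpg --pretty --make-searchable-pdf
```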
Batch
```bash
python3 scripts/doc_vision.py --batch-dir ./inbox --output-dir ./out
```
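If an agent or script needs to drive the CLI programmatically, a thin wrapper can assemble the same invocations. This is a minimal sketch that uses only the flags shown above; treating `--file-path` and `--batch-dir` as mutually exclusive, and reading the result from stdout, are assumptions rather than documented behavior. `build_command` and `run` are hypothetical helper names.

```python
import subprocess
import sys


def build_command(file_path=None, batch_dir=None, output_dir=None,
                  pretty=False, searchable_pdf=False):
    """Assemble argv for scripts/doc_vision.py using only the documented flags."""
    # Assumption: single-file and batch modes are mutually exclusive.
    if (file_path is None) == (batch_dir is None):
        raise ValueError("pass exactly one of file_path or batch_dir")
    # sys.executable stands in for `python3` so the current interpreter is reused.
    cmd = [sys.executable, "scripts/doc_vision.py"]
    if file_path is not None:
        cmd += ["--file-path", str(file_path)]
    else:
        cmd += ["--batch-dir", str(batch_dir)]
    if output_dir is not None:
        cmd += ["--output-dir", str(output_dir)]
    if pretty:
        cmd.append("--pretty")
    if searchable_pdf:
        cmd.append("--make-searchable-pdf")
    return cmd


def run(cmd):
    """Run the CLI and return its stdout (assumed to carry the JSON result)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run(build_command(file_path="./invoice.jpg", pretty=True, searchable_pdf=True))` mirrors the single-file command above; `check=True` makes a non-zero exit raise `subprocess.CalledProcessError` instead of failing silently.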
Output
See `docs/README.zh.md` for the full JSON schema and integration guide.
Supported Types
| Type | Actions |
|---|---|
| Invoice | `create_expense`, `archive`, `tax_report` |
| Business Card | `add_contact`, `save_vcard` |
| Receipt | `create_expense`, `split_bill` |
| Table | `export_csv`, `analyze_data` |
| Contract | `summarize`, `extract_dates`, `flag_obligations` |
| ID Card | `extract_id_info`, `verify_age` |
| Passport | `store_passport_info`, `check_validity` |
| Bank Statement | `categorize_transactions`, `generate_report` |
| Driver's License | `store_license_info`, `check_expiry` |
| Tax Form | `summarize_tax`, `suggest_deductions` |
| General | `summarize`, `translate`, `search_keywords` |
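For an agent that routes on the classification result, the table can be mirrored as a simple lookup. This is a sketch only: the snake_case type keys, the `ActionSuggestion` shape, the fallback to `general` for unrecognized types, and passing the same parameters to every action are illustrative assumptions, not the project's output schema.

```python
from dataclasses import dataclass, field

# Action names copied from the Supported Types table; the keys are hypothetical slugs.
SUPPORTED_ACTIONS = {
    "invoice": ["create_expense", "archive", "tax_report"],
    "business_card": ["add_contact", "save_vcard"],
    "receipt": ["create_expense", "split_bill"],
    "table": ["export_csv", "analyze_data"],
    "contract": ["summarize", "extract_dates", "flag_obligations"],
    "id_card": ["extract_id_info", "verify_age"],
    "passport": ["store_passport_info", "check_validity"],
    "bank_statement": ["categorize_transactions", "generate_report"],
    "driver_license": ["store_license_info", "check_expiry"],
    "tax_form": ["summarize_tax", "suggest_deductions"],
    "general": ["summarize", "translate", "search_keywords"],
}


@dataclass
class ActionSuggestion:
    """Hypothetical container: an action name plus its structured parameters."""
    name: str
    params: dict = field(default_factory=dict)


def suggest_actions(doc_type, params=None):
    """Return suggestions for a classified type, falling back to 'general'."""
    names = SUPPORTED_ACTIONS.get(doc_type, SUPPORTED_ACTIONS["general"])
    shared = dict(params or {})
    # Give each suggestion its own copy so an agent can edit one safely.
    return [ActionSuggestion(name, dict(shared)) for name in names]
```

Copying the parameter dict per suggestion keeps one action's edits (for example, filling in a split ratio for `split_bill`) from leaking into another.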
Configuration
Required environment variables:
- `PADDLEOCR_DOC_PARSING_API_URL`: API endpoint ending in `/layout-parsing`
- `PADDLEOCR_ACCESS_TOKEN`: access token
Optional:
- `PADDLEOCR_DOC_PARSING_TIMEOUT`: timeout in seconds (default: 600)
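A caller that wraps the tool may want to validate this configuration before starting a long batch. A minimal sketch, assuming an empty value counts as unset and that the timeout is a plain number of seconds; `load_config` and `PaddleOCRConfig` are hypothetical names, not part of the project.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PaddleOCRConfig:
    api_url: str
    access_token: str
    timeout: float  # seconds


def load_config(env=None):
    """Read the documented variables and fail fast on missing or malformed values."""
    env = os.environ if env is None else env
    required = ("PADDLEOCR_DOC_PARSING_API_URL", "PADDLEOCR_ACCESS_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variable(s): " + ", ".join(missing))
    api_url = env["PADDLEOCR_DOC_PARSING_API_URL"].rstrip("/")
    if not api_url.endswith("/layout-parsing"):
        raise ValueError("PADDLEOCR_DOC_PARSING_API_URL must end in /layout-parsing")
    # Treat an empty string the same as unset and fall back to the documented default.
    timeout = float(env.get("PADDLEOCR_DOC_PARSING_TIMEOUT") or "600")
    if timeout <= 0:
        raise ValueError("PADDLEOCR_DOC_PARSING_TIMEOUT must be positive")
    return PaddleOCRConfig(api_url, env["PADDLEOCR_ACCESS_TOKEN"], timeout)
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise in tests without touching the real environment.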
Searchable PDF
With `--make-searchable-pdf`, the tool embeds an OCR text layer aligned to the original layout using bounding boxes. This requires pdf2image with poppler (a system dependency), plus the Python packages reportlab, pypdf, and pillow.
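The core of bounding-box alignment is mapping each OCR box from image pixels (origin top-left, y pointing down) into PDF user space (72 points per inch, origin bottom-left). The sketch below assumes words arrive as hypothetical `(text, (x0, y0, x1, y1))` pixel boxes from a page rasterized at a known DPI, and uses the box bottom as an approximate baseline; it illustrates the technique and is not the project's implementation. The optional `write_text_layer` draws the page image and then uses reportlab's invisible text render mode (3), stretching each word horizontally to its box width, so the text is searchable and selectable without being painted.

```python
from dataclasses import dataclass


@dataclass
class PlacedWord:
    text: str
    x: float          # left edge in PDF points
    y: float          # baseline in PDF points (origin bottom-left)
    width: float      # box width in points
    font_size: float  # box height in points, reused as the font size


def place_words(words, image_height_px, dpi):
    """Map (text, (x0, y0, x1, y1)) pixel boxes with a top-left origin to PDF points."""
    scale = 72.0 / dpi  # PDF user space has 72 points per inch
    page_height = image_height_px * scale
    placed = []
    for text, (x0, y0, x1, y1) in words:
        placed.append(PlacedWord(
            text=text,
            x=x0 * scale,
            y=page_height - y1 * scale,  # flip the y axis; box bottom approximates the baseline
            width=(x1 - x0) * scale,
            font_size=(y1 - y0) * scale,
        ))
    return placed


def write_text_layer(pdf_path, image_path, image_size_px, dpi, words):
    """Draw the page image, then invisible text fitted to each box (needs reportlab)."""
    from reportlab.pdfgen import canvas  # lazy import so place_words works without it

    width_px, height_px = image_size_px
    scale = 72.0 / dpi
    c = canvas.Canvas(pdf_path, pagesize=(width_px * scale, height_px * scale))
    c.drawImage(image_path, 0, 0, width=width_px * scale, height=height_px * scale)
    for w in place_words(words, height_px, dpi):
        natural = c.stringWidth(w.text, "Helvetica", w.font_size)
        t = c.beginText()
        t.setTextRenderMode(3)  # 3 = neither fill nor stroke: invisible but searchable
        t.setFont("Helvetica", w.font_size)
        if natural > 0:
            t.setHorizScale(100.0 * w.width / natural)  # stretch text to the box width
        t.setTextOrigin(w.x, w.y)
        t.textOut(w.text)
        c.drawText(t)
    c.showPage()
    c.save()
```

Keeping the geometry in a pure function separates the part most likely to go wrong (axis flip and DPI scaling) from the PDF drawing, so it can be checked without rendering anything.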
Full Documentation
Detailed usage, troubleshooting, and development guides are available in multiple languages under `docs/`:
- 中文: `docs/README.zh.md`
- English: `docs/README.en.md`
- Español: `docs/README.es.md`
- العربية: `docs/README.ar.md`
License
MIT-0
Made for OpenClaw. Let your agent see and act.