Doc-Scan — DEPRECATED
This skill is deprecated. All functionality has been merged into doc-process v4.0.0 with a significantly improved scanner engine. Please use doc-process instead.
To scan a document photo: install doc-process and say "scan this photo", "correct perspective", "dewarp this document", or any equivalent phrase.
Doc-Scan — Document Scanner Skill (archived)
Converts a photo of a document (whiteboard, printed page, handwritten note, form, receipt, book page, etc.) into a clean scanned-looking image with perspective correction and enhancement.
Step 1 — Validate the Input
Read the provided image visually. Assess:
| Check | Yes/No | Notes |
|---|---|---|
| Is this an image file? | | .jpg, .jpeg, .png, .heic, .webp, .bmp, .tiff |
| Does the image contain a document? | | Printed page, form, note, receipt, whiteboard, book |
| Is the document the primary subject? | | Centered or dominant in frame |
| Is there a perspective distortion? | | Taken from an angle, not flat/overhead |
| Is the image quality sufficient? | | Not severely blurred or too dark |
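The first row of the table can be checked mechanically before any visual inspection. The following is a minimal sketch, assuming the extension list in the table is exhaustive; `is_supported_image` is a hypothetical helper and not part of `doc_scanner.py`.

```python
from pathlib import Path

# Extensions copied from the "Is this an image file?" row of the table.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".heic", ".webp", ".bmp", ".tiff"}


def is_supported_image(path: str) -> bool:
    """Return True if the file extension is one of the listed image types."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

This only inspects the filename. The remaining rows (document present, primary subject, distortion, quality) still require looking at the image itself.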
Non-Document Detection
If the image does not appear to contain a document, respond:
CODEBLOCK0
Do not proceed with scanning if this check fails.
Step 2 — Pre-Scan Assessment
Report what you see before scanning:
CODEBLOCK1
Step 3 — Run the Scanner Script
CODEBLOCK2
Common Options
# Black and white output (best for text documents)
python skills/doc-scan/scripts/doc_scanner.py --input photo.jpg --output scanned.png --mode bw
# Color-preserved output (best for forms, diagrams, colored content)
python skills/doc-scan/scripts/doc_scanner.py --input photo.jpg --output scanned.png --mode color
# Grayscale output (middle ground)
python skills/doc-scan/scripts/doc_scanner.py --input photo.jpg --output scanned.png --mode gray
# Output as PDF
python skills/doc-scan/scripts/doc_scanner.py --input photo.jpg --output scanned.pdf --format pdf
# Multiple images into one PDF (multi-page scan)
python skills/doc-scan/scripts/doc_scanner.py --input page1.jpg page2.jpg page3.jpg --output document.pdf --format pdf
# Manual corner specification (if auto-detection fails)
python skills/doc-scan/scripts/doc_scanner.py --input photo.jpg --output scanned.png --corners "50,30 800,20 820,1100 40,1120"
# High-resolution output
python skills/doc-scan/scripts/doc_scanner.py --input photo.jpg --output scanned.png --dpi 300
# Skip perspective correction (if photo is already flat)
python skills/doc-scan/scripts/doc_scanner.py --input photo.jpg --output scanned.png --no-warp
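The options above can also be assembled programmatically. The sketch below builds an argument list using only the flags shown in this section. `build_scan_command` is a hypothetical helper, not part of the skill, and it makes no claim about which flag combinations the script actually accepts.

```python
from typing import List, Optional, Sequence

# Script path as written in the commands above.
SCRIPT = "skills/doc-scan/scripts/doc_scanner.py"
MODES = ("bw", "color", "gray")


def build_scan_command(
    inputs: Sequence[str],
    output: str,
    mode: Optional[str] = None,
    fmt: Optional[str] = None,
    corners: Optional[str] = None,
    dpi: Optional[int] = None,
    no_warp: bool = False,
) -> List[str]:
    """Build an argv list for doc_scanner.py from the documented flags."""
    if not inputs:
        raise ValueError("at least one input image is required")
    cmd = ["python", SCRIPT, "--input", *inputs, "--output", output]
    if mode is not None:
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode}")
        cmd += ["--mode", mode]
    if fmt is not None:
        cmd += ["--format", fmt]
    if corners is not None:
        # Passed as one argument, matching the quoted string in the example.
        cmd += ["--corners", corners]
    if dpi is not None:
        cmd += ["--dpi", str(dpi)]
    if no_warp:
        cmd.append("--no-warp")
    return cmd
```

Passing the resulting list to `subprocess.run` avoids shell quoting issues with the `--corners` value.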
Step 4 — Interpret Script Output
The script writes a JSON status block to stderr. Parse it and report the result to the user:
CODEBLOCK4
Status Handling
"status": "success": Report completion with key stats.
"corners_detected": false: Auto-detection failed. Offer:
- "Auto edge-detection could not find the document corners. I can try with manual corner hints — please describe approximately where the four corners of the document appear in the photo (e.g., top-left at about 10% from left and 5% from top)."
- Or offer --no-warp mode to at least apply enhancement without perspective correction.
warnings array: Report any warnings to the user, e.g., "Low contrast image", "Detected significant blur", "Partial document visible".
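The status rules above can be sketched as a small decision function. The field names `status`, `corners_detected`, and `warnings` come from this section. The action labels and the handling of any other status value are assumptions made for illustration only.

```python
import json
from typing import List


def plan_followups(status_text: str) -> List[str]:
    """Map the scanner's JSON status block to a list of follow-up actions."""
    report = json.loads(status_text)
    actions: List[str] = []
    if report.get("corners_detected") is False:
        # Auto-detection failed: offer manual corner hints or --no-warp.
        actions.append("offer_manual_corners_or_no_warp")
    elif report.get("status") == "success":
        actions.append("report_success")
    else:
        # Assumption: the source does not describe other status values.
        actions.append("report_unhandled_status")
    # Warnings are always passed on to the user.
    for warning in report.get("warnings", []):
        actions.append(f"warn: {warning}")
    return actions
```

For example, a successful run with one warning yields a success report followed by that warning.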
Step 5 — Post-Scan Quality Check
After the script completes, read the output image visually and assess:
| Quality Check | Pass / Fail | Notes |
|---|---|---|
| Document edges are straight | | No barrel distortion remaining |
| Text is legible | | Not blurred or over-enhanced |
| Shadows removed or reduced | | Even lighting across page |
| Background removed (white/clean) | | No table/desk visible |
| Correct aspect ratio (A4/Letter) | | Not stretched or squished |
| Color / binarization correct | | B&W if text-only, Color if content requires it |
If any check fails, report the issue and offer:
- Re-run with different settings (different mode, manual corners, contrast level)
- Re-photograph tips (see Step 8)
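The aspect-ratio row of the quality table lends itself to a numeric check. The sketch below compares an output size against A4 (210 × 297 mm) and US Letter (8.5 × 11 in). The 2% tolerance and the function name are assumptions, not values taken from the scanner.

```python
from typing import Optional

# Long side divided by short side for each page format.
PAGE_RATIOS = {"A4": 297 / 210, "Letter": 11 / 8.5}


def closest_page_format(width: int, height: int, tolerance: float = 0.02) -> Optional[str]:
    """Return the page format whose aspect ratio matches, or None if neither does."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Use long/short so portrait and landscape scans are treated alike.
    ratio = max(width, height) / min(width, height)
    for name, expected in PAGE_RATIOS.items():
        if abs(ratio - expected) / expected <= tolerance:
            return name
    return None
```

A result of `None` suggests the output may be stretched or squished, or that the document is not a standard page (a receipt, for instance).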
Step 6 — Output Report
CODEBLOCK5
Step 7 — Multi-Page Documents
If the user provides multiple photos (or a folder of images):
1. Process each image individually.
2. Sort by filename or user-specified order.
3. Combine into a single PDF:
python skills/doc-scan/scripts/doc_scanner.py \
--input page1.jpg page2.jpg page3.jpg \
--output document.pdf --format pdf
4. Report: "3-page document scanned and saved to document.pdf"
For very large batches (10+ pages), process in batches of 10 and combine.
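The batching rule can be sketched as follows. The batch size of 10 comes from the text above. `plan_batches` is a hypothetical helper, and it uses plain lexicographic sorting, so it assumes zero-padded filenames such as `page01.jpg`.

```python
from typing import List, Sequence


def plan_batches(filenames: Sequence[str], batch_size: int = 10) -> List[List[str]]:
    """Sort filenames and split them into consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Lexicographic order: "page10.jpg" sorts before "page2.jpg",
    # so unpadded names need a user-specified order instead.
    ordered = sorted(filenames)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
```

Each batch would then be scanned separately and the results combined into the final PDF in batch order.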
Step 8 — Photography Tips (on poor quality input)
If the input image quality is poor or auto-detection fails, provide tips:
CODEBLOCK7
Step 9 — Integration with Doc-Process
After a successful scan, offer to process the scanned document with doc-process:
CODEBLOCK8
If the user confirms, pass the scanned output file to the appropriate doc-process mode.
Step 10 — Timeline Logging
After a successful scan, silently log to the timeline:
python skills/doc-process/scripts/timeline_manager.py add \
--type "Doc Scan" \
--source "<input filename>" \
--summary "<N> page(s) scanned, perspective corrected, saved as <output filename>"
General Rules
- Never process the photo if it does not contain a document; explain what was detected instead
- Always report the detected document type so the user can confirm before scanning
- Auto-detect corners when possible; fall back gracefully to manual or no-warp mode
- Default output mode: bw for text documents, color for anything with color content
- Default output format: PNG (lossless); PDF only if explicitly requested or for multi-page
- Default DPI: 300 (print quality); 150 for screen-only use
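The three default rules can be expressed as one small function. This is a sketch of the rules as stated, not the scanner's own logic. The function name, its parameters, and the returned dictionary keys are hypothetical.

```python
from typing import Dict, Union


def default_settings(
    has_color_content: bool,
    page_count: int = 1,
    pdf_requested: bool = False,
    screen_only: bool = False,
) -> Dict[str, Union[str, int]]:
    """Pick mode, format, and DPI according to the General Rules."""
    return {
        # bw for text documents, color for anything with color content.
        "mode": "color" if has_color_content else "bw",
        # PNG by default; PDF only on request or for multi-page scans.
        "format": "pdf" if (pdf_requested or page_count > 1) else "png",
        # 300 DPI for print quality, 150 for screen-only use.
        "dpi": 150 if screen_only else 300,
    }
```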
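Doc-Scan — DEPRECATED
This skill is deprecated. All functionality has been merged into doc-process v4.0.0 with a significantly improved scanner engine. Please use doc-process instead.
To scan a document photo: install doc-process and say "scan this photo", "correct perspective", "dewarp this document", or any equivalent phrase.
Doc-Scan — Document Scanner Skill (archived)
Converts a photo of a document (whiteboard, printed page, handwritten note, form, receipt, book page, etc.) into a clean scanned-looking image with perspective correction and enhancement.
Step 1 — Validate the Input
Read the provided image visually. Assess: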
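| Check | Yes/No | Notes |
|---|---|---|
| Is this an image file? | | .jpg, .jpeg, .png, .heic, .webp, .bmp, .tiff |
| Does the image contain a document? | | Printed page, form, note, receipt, whiteboard, book |
| Is the document the primary subject? | | Centered or dominant in frame |
| Is there a perspective distortion? | | Taken from an angle, not flat/overhead |
| Is the image quality sufficient? | | Not severely blurred or too dark |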
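Non-Document Detection
If the image does not appear to contain a document, respond:
⚠ This image does not appear to contain a document.
I detected: [brief description of the image content, e.g., landscape photo, portrait of a person, blank wall]
Doc-Scan works best with:
- Printed documents (forms, letters, reports)
- Handwritten notes or whiteboards
- Receipts, invoices, or business cards
- Book or magazine pages
- Any flat document photographed from above or at an angle
If you meant to upload a document photo, try retaking it in better lighting and make sure the document is clearly visible. If you would like something else done with this image, I can help with that too.
Do not proceed with scanning if this check fails.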
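Step 2 — Pre-Scan Assessment
Report what you see before scanning:
Document Photo Assessment
| Property | Detected value |
|---|---|
| Document type | [e.g., printed letter, handwritten note, receipt, form] |
| Orientation | Portrait / Landscape / Skewed (about N degrees) |
| Perspective distortion | None / Mild / Moderate / Severe |
| Lighting | Even / Uneven (shadow in [area]) / Too dark / Too bright |
| Background | White surface / Dark table / Complex background |
| Image quality | Sharp / Slightly blurred / Blurred |
| Estimated document area | About N% of the total image area |
| Multiple pages? | Single page / [N] pages detected |
| Visible content | [brief description, e.g., text document, 3 columns, appears to be a form] |
Recommended enhancements:
- [x] Perspective correction
- [x] Background removal / edge cropping
- [ ] Binarization (black and white): for text-only documents
- [x] Contrast enhancement
- [x] Shadow removal
- [ ] Color preservation: for documents with color content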
Step 3 — Run the Scanner Script
python skills/doc-scan/scripts/doc_scanner.py --input photo.jpg --output scanned.png
Common Options
# Black and white output (best for text documents)
python skills/doc-scan/scripts/doc_scanner.py --input photo.jpg --output scanned.png --mode bw
# Color-preserved output (best for forms, diagrams, colored content)
python skills/doc-scan/scripts/doc_scanner.py --input photo.jpg --output scanned.png --mode color
# Grayscale output (middle ground)
python skills/doc-scan/scripts/doc_scanner.py --input photo.jpg --output scanned.png --mode gray
# Output as PDF
python skills/doc-scan/scripts/doc_scanner.py --input photo.jpg --output scanned.pdf --format pdf
# Multiple images into one PDF (multi-page scan)
python skills/doc-scan/scripts/doc_scanner.py --input page1.jpg page2.jpg page3.jpg --output document.pdf --format pdf
# Manual corner specification (if auto-detection fails)
python skills/doc-scan/scripts/doc_scanner.py --input photo.jpg --output scanned.png --corners "50,30 800,20 820,1100 40,1120"
# High-resolution output
python skills/doc-scan/scripts/doc_scanner.py --input photo.jpg --output scanned.png --dpi 300
# Skip perspective correction (if photo is already flat)
python skills/doc-scan/scripts/doc_scanner.py --input photo.jpg --output scanned.png --no-warp
Step 4 — Interpret Script Output
The script writes a JSON status block to stderr. Parse it and report the result to the user:
{
  "status": "success",
  "corners_detected": true,
  "corners": [[50,30],[800,20],[820,1100],[40,1120]],
  "warp_applied": true,
  "enhancement_mode": "bw",
  "input_size": [3024, 4032],
  "output_size": [2480, 3508],
  "output_dpi": 300,
  "pages": 1,
  "output_file": "scanned.png",
  "warnings": []
}
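Status Handling
"status": "success": Report completion with key stats.
"corners_detected": false: Auto-detection failed. Offer:
- "Auto edge-detection could not find the document corners. I can try with manual corner hints. Please describe approximately where the four corners of the document appear in the photo (e.g., top-left at about 10% from left and 5% from top)."
- Or offer --no-warp mode to at least apply enhancement without perspective correction.
warnings array: Report any warnings to the user, e.g., "Low contrast image", "Detected significant blur", "Partial document visible".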
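When the user describes corners as percentages, as in the hint above, they have to be converted to pixel coordinates before being passed to `--corners`. The sketch below shows that arithmetic. The helper name is hypothetical, and the corner order (top-left, top-right, bottom-right, bottom-left) is an assumption inferred from the example value `"50,30 800,20 820,1100 40,1120"`.

```python
from typing import Sequence, Tuple


def corners_argument(
    image_size: Tuple[int, int],
    corner_percentages: Sequence[Tuple[float, float]],
) -> str:
    """Convert four (x%, y%) corner hints into an "x,y x,y x,y x,y" string."""
    if len(corner_percentages) != 4:
        raise ValueError("exactly four corners are required")
    width, height = image_size
    points = []
    for x_pct, y_pct in corner_percentages:
        if not (0 <= x_pct <= 100 and 0 <= y_pct <= 100):
            raise ValueError("percentages must be between 0 and 100")
        # x is measured from the left edge, y from the top edge.
        points.append(f"{round(width * x_pct / 100)},{round(height * y_pct / 100)}")
    return " ".join(points)
```

For a 1000 × 2000 px photo, a top-left hint of 10% from the left and 5% from the top becomes the pixel pair `100,100`.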
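Step 5 — Post-Scan Quality Check
After the script completes, read the output image visually and assess:
| Quality Check | Pass / Fail | Notes |
|---|---|---|
| Document edges are straight | | No barrel distortion remaining |
| Text is legible | | Not blurred or over-enhanced |
| Shadows removed or reduced | | Even lighting across page |
| Background removed (white/clean) | | No table/desk visible |
| Correct aspect ratio (A4/Letter) | | Not stretched or squished |
| Color / binarization correct | | B&W if text-only, Color if content requires it |
If any check fails, report the issue and offer:
- Re-run with different settings (different mode, manual corners, contrast level)
- Re-photograph tips (see Step 8)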
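Step 6 — Output Report
Scan complete ✓
| Output size | 2480 × 3508 px (A4, 300 DPI) |
| Mode | Black and white |
| Perspective correction | Applied |
| Shadow removal | Applied |
| Processing time | ~2.3 s |
Enhancements applied
- Edge detection and four-corner extraction
- Perspective warp to standard A4 size
- Adaptive thresholding (Sauvola method) for clean black-and-white text
- Shadow compensation via background normalization
- Border cropped to document edges
Before → After
[original photo] → [scanned output]
(both are available via their file paths)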
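The 2480 × 3508 px figure in the report is what A4 (210 × 297 mm) works out to at 300 DPI. The sketch below shows the conversion, using 25.4 mm per inch. The function name is hypothetical, and rounding to the nearest pixel is an assumption about how the size is derived.

```python
from typing import Tuple

MM_PER_INCH = 25.4


def page_pixels(width_mm: float, height_mm: float, dpi: int) -> Tuple[int, int]:
    """Convert a physical page size in millimetres to pixel dimensions."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )
```

At the 150 DPI screen-only setting, the same page comes out at 1240 × 1754 px.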
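Step 7 — Multi-Page Documents
If the user provides multiple photos (or a folder of images):
1. Process each image individually.
2. Sort by filename or user-specified order.
3. Combine into a single PDF:
python skills/doc-scan/scripts/doc_scanner.py \
--input page1.jpg page2.jpg page3.jpg \
--output document.pdf --format pdf
4. Report: "3-page document scanned and saved to document.pdf"
For very large batches (10+ pages), process in batches of 10 and combine.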
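Step 8 — Photography Tips (on poor quality input)
If the input image quality is poor or auto-detection fails, provide tips:
Tips for Better Scans
Lighting:
- Scan in bright, even light (avoid direct sunlight, which causes glare)
- Avoid casting shadows with your hand or body
- Well-lit indoor environments work best
Camera position:
- Place the camera directly above the document whenever possible
- Keep the camera parallel to the document surface
- The full document should be visible, with a small margin around it
Background:
- Place the document on a contrasting background (dark table for white paper, white surface for dark paper)
- Avoid patterned or cluttered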