caremax-ocr

# CareMax Upload & OCR

> **Requires `caremax-auth` as a sibling directory** (`../caremax-auth/`). If missing, tell the user to install caremax-auth first (e.g. `npx skills add KittenYang/caremax-skills`).

Upload medical report files (PDF, JPG, PNG, HEIC) and extract structured data via AI-powered OCR.

**Session-based workflow**: upload → OCR → review → confirm. All operations act on a single session.

**Checkpoint & resume**: Every pipeline step saves progress to the database. If OCR fails midway (LLM timeout, worker crash, network error), retrying automatically resumes from the last checkpoint, so no work is lost.

## Agent default behavior (MANDATORY)

1. **Upload and OCR are one continuous workflow.** When the user uploads report files (or asks you to upload, scan, or recognize a medical checkup report, e.g. 扫描/识别体检报告), after `$UPLOAD` returns successfully you **must in the same turn** run `$OCRSTREAM <session_id>` using the returned `session_id`. **Do not** end the task after `upload.sh` alone.
2. **Upload-only exception:** Skip immediate OCR only if the user **explicitly** asked to upload without recognition (e.g. 只上传 "upload only", 不要识别 "don't recognize", 别跑 OCR "don't run OCR", 只存文件 "just store the files"). If unclear, default to running OCR after upload.
3. **Progress:** Stream each SSE line to the user as it arrives (normalize / ocr / structure / …).
4. **After `step=done`:** Always continue to Step 3 (review). **Do not** auto-call confirm; wait for user approval before Step 4.

## Prerequisites: Auto-Auth (MANDATORY)

```bash
APICALL="bash ../caremax-auth/scripts/api-call.sh"
UPLOAD="bash ../caremax-auth/scripts/upload.sh"
OCRSTREAM="bash ../caremax-auth/scripts/ocr-stream.sh"
```

If any script returns `no_credentials` → run `bash ../caremax-auth/scripts/auth-flow.sh [base_url]` (from this skill's root, sibling of `caremax-auth/`).
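The `no_credentials` rule can be wrapped in a small helper so that each call re-authenticates once and retries. This is only a sketch: `needs_auth` and `with_auth` are hypothetical names, and it assumes the literal token `no_credentials` appears somewhere in the script's output, which this document does not specify. Because it buffers output, it suits `$APICALL` and `$UPLOAD` rather than the streaming `$OCRSTREAM`.

```bash
# Hypothetical helper: succeeds (exit 0) when the given output signals
# missing credentials. Assumes the token `no_credentials` appears verbatim.
needs_auth() {
  case "$1" in
    *no_credentials*) return 0 ;;
    *)                return 1 ;;
  esac
}

# Hypothetical wrapper: run a command; if it reports no_credentials,
# run the auth flow once and retry the same command.
with_auth() {
  local out
  out="$("$@" 2>&1)"
  if needs_auth "$out"; then
    bash ../caremax-auth/scripts/auth-flow.sh || return 1
    out="$("$@" 2>&1)"
  fi
  printf '%s\n' "$out"
}
```

Possible usage, relying on word splitting of the unquoted variable: `with_auth $APICALL GET /api/skill/sessions`.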
## Step 1: Upload (creates session)

```bash
$UPLOAD /path/to/report1.jpg /path/to/report2.jpg /path/to/report.pdf
```

Returns:

```json
{
  "session_id": "uuid-xxx",
  "member_id": "uuid-yyy",
  "files": [
    { "id": "file-1", "original_name": "report1.jpg" },
    { "id": "file-2", "original_name": "report2.jpg" },
    { "id": "file-3", "original_name": "report.pdf" }
  ]
}
```

Save the `session_id`.

## Step 2: OCR with real-time progress

```bash
$OCRSTREAM <session_id>
```

Outputs one JSON object per line:

```json
{"step":"resume","progress":1,"message":"Resuming from checkpoint (last completed: ocr)..."}
{"step":"normalize","progress":5,"message":"Loading file 1/3..."}
{"step":"ocr","progress":30,"message":"OCR page 2/3: report2.jpg"}
{"step":"ocr_retry","progress":35,"message":"Retrying OCR page 1/1: report1.jpg"}
{"step":"structure","progress":62,"message":"Detecting report groups..."}
{"step":"structure","progress":75,"message":"Structuring report 2/2..."}
{"step":"normalize_indicators","progress":88,"message":"Standardizing..."}
{"step":"done","progress":100,"data":{"session_id":"...","reports":[...],"resumed":true}}
```

Display progress to the user as each line arrives.

### Key progress events

| step | meaning |
|------|---------|
| `resume` | Pipeline is resuming from a saved checkpoint (not starting from zero) |
| `info` | Informational message (e.g. which step was resumed from) |
| `normalize` | Loading and preprocessing files |
| `ocr` | OCR text extraction per page |
| `ocr_retry` | Retrying previously failed pages only |
| `structure` | AI analyzing and grouping reports |
| `normalize_indicators` | Standardizing indicator names |
| `done` | Complete; the `data` field contains the full results |
| `error` | Pipeline failed; check `message` for details |

If `step=resume` appears, tell the user: "正在从上次的进度继续处理（不需要重新开始）" ("Continuing from the previous progress; no need to start over").

### Error responses from `$OCRSTREAM`

| code | meaning | action |
|------|---------|--------|
| `processing_in_progress` | Another OCR run is still active | Wait and retry, or poll `/status` |
| `ocr_limit_exceeded` | Free OCR quota exhausted | Tell user to upgrade |
| (no code) | Pipeline error (LLM timeout etc.) | Retry; it will auto-resume from checkpoint |

### Step 2b: Poll status (when SSE disconnects)

If the SSE stream disconnects (network timeout, terminal closed), use the status endpoint to check progress:

```bash
$APICALL GET "/api/skill/sessions/<session_id>/status"
```

Returns:

```json
{
  "session_id": "uuid",
  "status": "processing",
  "pipeline": {
    "completedStep": "ocr",
    "pageCount": 5,
    "ocrCompleted": 4,
    "ocrFailed": 1,
    "reportCount": 0,
    "errors": [{"step":"ocr","pageIndex":2,"message":"PaddleOCR timeout"}]
  },
  "error": null,
  "is_stale": false
}
```

**Field guide:**

- `status = processing` + `is_stale = false` → OCR is still running normally
- `status = processing` + `is_stale = true` → Worker crashed or timed out, safe to retry OCR
- `status = awaiting_confirm` → OCR completed. Fetch session detail for results
- `status = uploading` + `error` present → Last OCR attempt failed, retry will resume from checkpoint
- `pipeline.completedStep` → How far the pipeline got (normalize → ocr → structure → done)
- `pipeline.ocrFailed` → Number of pages that failed OCR (will be retried on next attempt)

**Polling workflow:**

```
1. Call $OCRSTREAM → SSE disconnects mid-way
2. Poll GET /sessions/<id>/status every 5-10 seconds
3. When status = "awaiting_confirm" → fetch full results with GET /sessions/<id>
4. If status = "uploading" (failed) → retry with $OCRSTREAM (auto-resumes)
5. If is_stale = true → retry with $OCRSTREAM (auto-resumes from checkpoint)
```

## Step 3: Review results (MANDATORY)

Parse the `step=done` data and show a formatted summary. **Do NOT auto-confirm.**

Each report has a `reportType` field: `lab`, `genetic`, `imaging`, `pathology`, or `other`.

### Lab reports (reportType = "lab")

Show indicators table:

```
📋 报告 1: [lab] 尿生化 (编号: 114431194)
   日期: 2025-02-05  医生: 俞海瑾
   指标: 12 个 (3 个异常)

   ┌──────────────────────┬────────┬──────────┬────────────┬──────┐
   │ 指标                 │ 结果   │ 单位     │ 参考范围   │ 异常 │
   ├──────────────────────┼────────┼──────────┼────────────┼──────┤
   │ 24H尿钠              │ 130.0  │ mmol/24h │ 137-257    │ ⬇    │
   └──────────────────────┴────────┴──────────┴────────────┴──────┘
```

### Non-lab reports (reportType = "genetic" / "imaging" / etc.)

Show summary + sections:

```
📋 报告 1: [genetic] 基因检测报告
   日期: 2025-09-12  检测机构: 南京申友医学检验所
   摘要: 心血管18项基因检测...高血压、冠心病风险一般...
   段落: 18 sections

   [gene_variant] 高血压 — 风险: 正常
   [gene_variant] 冠心病 — 风险: 一般
   [medication] ACEI类降压药 — 正常代谢型
   ...
```

### Supported file types

- **Images** (JPG/PNG/HEIC): PaddleOCR → structure
- **PDF** (any size): Azure Mistral Document AI page-split → structure
  - Large PDFs (e.g. 23-page gene report, 9.6MB) are fully supported

## Step 4: Confirm and save

After the user confirms:

```bash
$APICALL POST "/api/skill/sessions/<session_id>/confirm" '{"reports":[<reports from step 2>]}'
```

Returns: `{"success":true,"message":"2 report(s) saved","recordIds":[...]}`

## Resuming incomplete sessions

When the user asks to continue/resume a previous upload, or when checking for unfinished work:

### Step A: Find pending sessions

```bash
# List sessions that need OCR (uploaded but not processed)
$APICALL GET "/api/skill/sessions?status=uploading"

# List sessions stuck in processing (user exited mid-OCR)
$APICALL GET "/api/skill/sessions?status=processing"

# List sessions with OCR done but not yet confirmed
$APICALL GET "/api/skill/sessions?status=awaiting_confirm"
```

Show a summary of pending sessions to the user (file names, dates, status).

### Step B: Resume based on status

- **`uploading`**: Start OCR directly → go to Step 2 (`$OCRSTREAM <session_id>`)
  - If there's a saved checkpoint (previous failed attempt), OCR auto-resumes from it
- **`processing`**: Check with the status endpoint first:
  ```bash
  $APICALL GET "/api/skill/sessions/<session_id>/status"
  ```
  - `is_stale = false` → still running, wait or poll
  - `is_stale = true` → worker died, safe to retry: `$OCRSTREAM <session_id>` (auto-resumes from checkpoint)
- **`awaiting_confirm`**: Get session detail → show results → go to Step 3 (review), then Step 4 (confirm)
  ```bash
  # Get full detail of a pending session (includes OCR results if awaiting_confirm)
  $APICALL GET "/api/skill/sessions/<session_id>"
  ```

If the session is `awaiting_confirm`, the response includes `ocr_result` with the previously parsed reports. Display them for review (Step 3), and proceed to Step 4 (confirm) only after the user approves.
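The status rules from Step 2b and Step B reduce to a small decision table. The sketch below encodes them in a hypothetical `next_action` function; the action labels it prints (`run_ocr`, `poll_status`, `review`) are illustrative names, not part of the CareMax API, and extracting `status` and `is_stale` from the JSON response is left to the caller.

```bash
# Hypothetical helper: map a session's status (and is_stale flag) to the
# next action. The printed labels are illustrative only.
next_action() {
  local status="$1" is_stale="${2:-false}"
  case "$status" in
    uploading)
      echo "run_ocr" ;;            # fresh run, or auto-resume from checkpoint
    processing)
      if [ "$is_stale" = "true" ]; then
        echo "run_ocr"             # worker died: safe to retry
      else
        echo "poll_status"         # still running: wait or poll
      fi ;;
    awaiting_confirm)
      echo "review" ;;             # fetch session detail, go to Step 3
    completed)
      echo "none" ;;
    *)
      echo "unknown" ;;
  esac
}
```

For example, `next_action processing true` prints `run_ocr`, which corresponds to retrying with `$OCRSTREAM <session_id>`.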
### Resume-aware response handling

When `$OCRSTREAM` outputs `step=done`:

- `resumed = true` in the data → tell the user: "已从上次的进度恢复，OCR 结果已就绪" ("Resumed from the previous progress; OCR results are ready")
- `resumed = false` (or absent) → normal fresh run

When `$OCRSTREAM` outputs `step=error`:

- `code = processing_in_progress` → tell the user OCR is still running, and poll `/status` instead
- `code = ocr_limit_exceeded` → tell the user to upgrade
- No code → LLM/network error, safe to retry (will auto-resume from checkpoint)

### Step C: Delete individual reports or stale sessions

Delete a single report (does NOT affect other reports in the same session):

```bash
$APICALL DELETE "/api/skill/sessions/<session_id>/records/<record_id>"
```

Delete an entire session (cascade deletes ALL files + reports):

```bash
$APICALL DELETE "/api/skill/sessions/<session_id>"
```

## Other session operations

```bash
# List all sessions (all statuses)
$APICALL GET /api/skill/sessions

# List sessions filtered by status: uploading | processing | awaiting_confirm | completed
$APICALL GET "/api/skill/sessions?status=<status>"

# Get session detail (includes OCR results if awaiting_confirm, saved reports if completed)
$APICALL GET "/api/skill/sessions/<session_id>"

# Poll OCR progress (lightweight, use when SSE disconnects)
$APICALL GET "/api/skill/sessions/<session_id>/status"

# Delete single report (keeps session and other reports intact)
$APICALL DELETE "/api/skill/sessions/<session_id>/records/<record_id>"

# Delete entire session (undo everything: files + reports)
$APICALL DELETE "/api/skill/sessions/<session_id>"
```
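The rules under "Resume-aware response handling" can be sketched as a per-line dispatcher over the `$OCRSTREAM` output. `json_field` and `handle_event` are hypothetical names, and the printed labels are illustrative. The sketch assumes `step` and `code` are top-level string fields on each line, and it uses a deliberately naive `sed` extraction that does not handle escaped quotes; a real JSON parser would be more robust.

```bash
# Hypothetical helper: extract a string field from a one-line JSON event.
# Naive sed-based sketch: no escaped quotes, last match wins.
json_field() {
  printf '%s\n' "$1" | sed -n "s/.*\"$2\":\"\([^\"]*\)\".*/\1/p"
}

# Hypothetical dispatcher: print what to do for one event line.
handle_event() {
  local line="$1" step code
  step="$(json_field "$line" step)"
  code="$(json_field "$line" code)"
  case "$step" in
    "done")
      case "$line" in
        *'"resumed":true'*) echo "review_resumed" ;;  # resumed from checkpoint
        *)                  echo "review" ;;          # normal fresh run
      esac ;;
    error)
      case "$code" in
        processing_in_progress) echo "poll_status" ;;
        ocr_limit_exceeded)     echo "upgrade" ;;
        *)                      echo "retry" ;;       # no code: safe to retry
      esac ;;
    *)
      echo "progress" ;;                              # show the line to the user
  esac
}
```

One way to use it is `$OCRSTREAM "$session_id" | while IFS= read -r line; do handle_event "$line"; done`, printing each raw line to the user alongside the chosen action.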
