Super OCR

Overview

Super OCR is a production-grade optical character recognition tool that intelligently selects the best engine for your needs:

- Tesseract Engine: Lightweight and fast (~200-500ms), well suited to simple text extraction
- PaddleOCR Engine: High accuracy (98%+), optimized for Chinese, well suited to complex documents

Engine Selection Strategy

Auto Mode (Default)

The skill automatically selects the optimal engine:

Scenario	Selected Engine	Why
Simple text, English only	Tesseract	Faster, lighter dependency
Chinese content, high accuracy needed	PaddleOCR	High accuracy, optimized for Chinese

Force Mode

Users can explicitly choose an engine:

- --engine tesseract - Use Tesseract only
- --engine paddle - Use PaddleOCR only
- --engine auto - Auto-select (default)
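
As a rough illustration of how an engine flag with a default could be wired up, here is a minimal argparse sketch. The function names (build_parser, parse_cli) are hypothetical and the accepted values are limited to the three Force Mode options; the real scripts/main.py may be structured differently and may accept more values.

```python
import argparse

# Hypothetical parser mirroring the Force Mode flags; not the actual scripts/main.py.
ENGINE_CHOICES = ("auto", "tesseract", "paddle")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Super OCR (illustrative CLI sketch)")
    parser.add_argument("--image", required=True, help="path to the input image")
    parser.add_argument(
        "--engine",
        choices=ENGINE_CHOICES,
        default="auto",  # auto-select is the documented default
        help="force a specific engine or let the tool choose",
    )
    return parser


def parse_cli(argv):
    """Parse a list of CLI arguments and return the resulting namespace."""
    return build_parser().parse_args(argv)
```

Restricting the flag with choices means an unsupported engine name is rejected at parse time rather than deep inside the OCR pipeline.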

Quick Start

Installation

This skill requires the following dependencies:

- PaddleOCR (for Chinese text recognition, 98%+ accuracy)
- Tesseract (for fast English text recognition)
- OpenCV (for image preprocessing)

Option 1: Install with pip (all-in-one)

    pip install paddleocr paddlepaddle pytesseract pillow opencv-python numpy

Option 2: Install dependencies manually

macOS:
    # Tesseract
    brew install tesseract

    # PaddleOCR
    pip install paddleocr paddlepaddle

Ubuntu/Debian:
    # Tesseract
    sudo apt update && sudo apt install tesseract-ocr

    # PaddleOCR
    pip install paddleocr paddlepaddle

Windows:
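    # Download Tesseract from https://github.com/UB-Mannheim/tesseract/wiki
    pip install paddleocr paddlepaddle pytesseract pillow opencv-python numpy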

Usage

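    # Auto mode (recommended) - runs all available engines
    cd path/to/super-ocr
    python scripts/main.py --image path/to/image.png

    # Force Tesseract only
    python scripts/main.py --image document.jpg --engine tesseract

    # Force PaddleOCR (high-accuracy Chinese)
    python scripts/main.py --image chinese_menu.png --engine paddle

    # Run all engines (macOS only: Tesseract + PaddleOCR + MacVision)
    python scripts/main.py --image complex_doc.png --engine all

    # Batch processing with an output directory
    python scripts/main.py --images ./images/*.png --output ./results --verbose

    # Check dependencies and auto-install
    python scripts/dependencies.py --check --install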

Structuring This Skill

This skill uses a capabilities-based structure with multiple execution modes:

1. Engine Selection Logic - Intelligent decision making
2. OCR Execution - Unified interface for different engines
3. Post-processing - Standardized output formatting
4. Validation & Fallback - Quality assurance

Core Capabilities

1. Intelligent Engine Selection

The skill includes a decision tree that analyzes:

- Image characteristics (contrast, text size)
- Language patterns (Chinese character detection)
- User requirements (speed vs accuracy)

See scripts/engine_selector.py for implementation details.
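
The three inputs above can be combined in a small decision function. The following is only a sketch under stated assumptions: ImageHints, select_engine, the prefer values, and the numeric thresholds are all hypothetical and are not taken from scripts/engine_selector.py.

```python
from dataclasses import dataclass


def contains_chinese(text: str) -> bool:
    """Return True if any character is in the CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


@dataclass
class ImageHints:
    contrast: float        # assumed scale: 0.0 (flat) to 1.0 (crisp)
    text_height_px: int    # assumed estimate of glyph height in pixels
    sample_text: str = ""  # assumed output of a quick first pass; may be empty


def select_engine(hints: ImageHints, prefer: str = "balanced") -> str:
    """Pick "tesseract" or "paddle" from language, user preference, then image quality."""
    if contains_chinese(hints.sample_text):
        return "paddle"      # Chinese content goes to the Chinese-optimized engine
    if prefer == "speed":
        return "tesseract"   # the user explicitly trades accuracy for speed
    if prefer == "accuracy":
        return "paddle"
    # Illustrative thresholds: low contrast or tiny text goes to the heavier engine.
    if hints.contrast < 0.4 or hints.text_height_px < 12:
        return "paddle"
    return "tesseract"
```

For example, select_engine(ImageHints(0.9, 24, "Hello")) picks Tesseract, while the same image with sample text "你好" picks PaddleOCR.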

2. Dual Engine Support

Tesseract Engine (scripts/tesseract_ocr.py):

- Fast preprocessing pipeline
- PSM mode 6 for uniform text blocks
- Confidence scoring per word
- Language detection
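
To make per-word confidence scoring concrete, the sketch below filters a result dictionary shaped like the one pytesseract.image_to_data returns when called with config="--psm 6" and output_type=pytesseract.Output.DICT (parallel "text" and "conf" lists, with a confidence of -1 on non-word entries). The helper name words_with_confidence is hypothetical, and this is not the code in scripts/tesseract_ocr.py.

```python
from typing import Dict, List, Tuple

# Page segmentation mode 6 treats the image as a single uniform block of text.
TESSERACT_CONFIG = "--psm 6"


def words_with_confidence(data: Dict[str, list]) -> List[Tuple[str, float]]:
    """Pair each recognized word with its confidence, skipping non-word boxes."""
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        score = float(conf)  # confidences may arrive as strings or numbers
        if score < 0 or not text.strip():
            continue         # -1 marks layout boxes (page, block, line), not words
        words.append((text, score))
    return words
```

The function takes a plain dictionary, so it can be exercised without Tesseract installed.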

PaddleOCR Engine (scripts/paddle_ocr.py):

- State-of-the-art EAST text detection
- CRNN recognition with LSTM
- Confidence scores per character
- Table detection support

3. Output Formats

Supports multiple output formats:

Format	Content	Use Case
Text only	Clean extracted text	Simple search/grep
Structured

4. Quality Guarantees

- Confidence thresholds (configurable, default 80%)
- Low-confidence alerts for manual review
- Fallback processing for failed OCRs
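
One way these guarantees could fit together is sketched below. It assumes each engine is a callable that returns (word, confidence) pairs with confidence on a 0 to 1 scale; run_with_fallback and mean_confidence are hypothetical names, not part of the skill's scripts.

```python
from typing import Callable, List, Tuple

Words = List[Tuple[str, float]]
Engine = Callable[[str], Words]


def mean_confidence(words: Words) -> float:
    """Average confidence; an empty result counts as 0.0."""
    if not words:
        return 0.0
    return sum(conf for _, conf in words) / len(words)


def safe_run(engine: Engine, image_path: str) -> Words:
    """Run an engine and treat any exception as an empty result."""
    try:
        return engine(image_path)
    except Exception:
        return []


def run_with_fallback(image_path: str, primary: Engine, fallback: Engine,
                      threshold: float = 0.80) -> Tuple[Words, bool]:
    """Return (words, needs_review); try the fallback when the primary is weak or fails."""
    words = safe_run(primary, image_path)
    if mean_confidence(words) < threshold:
        retry = safe_run(fallback, image_path)
        if mean_confidence(retry) > mean_confidence(words):
            words = retry
    # Anything still below the threshold is flagged for manual review.
    return words, mean_confidence(words) < threshold
```

The second element of the returned tuple is the low-confidence alert: it stays True when neither engine reaches the threshold.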

Resources

scripts/

- main.py - Main entry point, CLI interface (supports multi-engine)
- dependencies.py - Auto-install and validation
- output_formatter.py - Multiple output format support
- engine/ - OCR engine implementations
  - selector.py - Intelligent engine selection logic
  - tesseract.py - Tesseract engine wrapper
  - paddle.py - PaddleOCR engine wrapper
  - macvision.py - macOS Vision OCR (macOS only)
- preprocessing/ - Image preprocessing utilities
  - preprocessor.py - Denoising, enhancement, binarization

dependencies.py (Key Feature)

The dependencies.py module handles:

- Dependency detection (paddleocr, paddlepaddle, pytesseract, cv2)
- Auto-install on missing dependencies
- Version checking
- OS-specific installation commands
- Clear error messages with troubleshooting steps

Use this when setting up a new environment: python scripts/dependencies.py --check --install
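
Dependency detection of this kind can be done without importing the heavy packages. The sketch below uses importlib.util.find_spec from the standard library; find_missing, install_hint, and the PIP_NAMES mapping are hypothetical and do not reproduce the actual dependencies.py. Note that the paddlepaddle package is imported as paddle and opencv-python as cv2.

```python
import importlib.util
from typing import Iterable, List

# Import names to probe (assumed list; paddlepaddle installs the "paddle" module).
REQUIRED_MODULES = ("paddleocr", "paddle", "pytesseract", "cv2")

# Import names whose pip package name differs.
PIP_NAMES = {"paddle": "paddlepaddle", "cv2": "opencv-python"}


def find_missing(modules: Iterable[str]) -> List[str]:
    """Return the top-level modules that cannot be located, without importing them."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def install_hint(missing: List[str]) -> str:
    """Build a short message with a pip command for the missing modules."""
    if not missing:
        return "All dependencies found."
    packages = " ".join(PIP_NAMES.get(name, name) for name in missing)
    return "Missing: " + ", ".join(missing) + ". Try: pip install " + packages
```

Calling install_hint(find_missing(REQUIRED_MODULES)) gives a one-line report. The Tesseract binary itself is a system package and would need a separate check.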

Advanced Features

Custom Configuration

Create config.yaml for persistent settings:

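    default_engine: auto
    confidence_threshold: 0.8
    output_format: json
    preprocess:
      denoise: true
      enhance_contrast: true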
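
Persistent settings are usually layered over built-in defaults. The sketch below shows one way to do that for the documented keys (default_engine, confidence_threshold, output_format, preprocess). It assumes the YAML file has already been parsed into a dictionary, for example with yaml.safe_load from PyYAML; merge_config and the DEFAULTS values are hypothetical.

```python
import copy
from typing import Any, Dict

# Assumed defaults, used only for illustration.
DEFAULTS: Dict[str, Any] = {
    "default_engine": "auto",
    "confidence_threshold": 0.8,
    "output_format": "json",
    "preprocess": {"denoise": True, "enhance_contrast": True},
}


def merge_config(user: Dict[str, Any], defaults: Dict[str, Any] = DEFAULTS) -> Dict[str, Any]:
    """Overlay user settings on the defaults, merging nested sections key by key."""
    merged = copy.deepcopy(defaults)  # never mutate the shared defaults
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(value, merged[key])
        else:
            merged[key] = value
    return merged
```

With this approach a config.yaml that sets only preprocess.denoise keeps the default for enhance_contrast instead of dropping it.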

Batch Processing

Process multiple images:

    python scripts/ocr.py --images ./images/*.png --output ./results

API Mode

Use as a Python library:

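    from super_ocr import OCRProcessor

    # Create the processor once; "auto" lets the skill pick the engine
    processor = OCRProcessor(engine="auto")
    result = processor.extract("image.png")
    print(result.text)
    print(result.confidence)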

Anti-Patterns

- ❌ Using PaddleOCR for every image (unnecessary overhead for simple cases)
- ❌ Ignoring confidence scores (quality matters)
- ❌ Engine bias (always preferring one engine)
- ❌ Skipping preprocessing (hurts output quality)

Performance Notes

Engine	Init Time	Per-Image	Memory	Best For
Tesseract	~200ms	~50ms	~100MB	Quick extraction
PaddleOCR	~3s	~500ms	~500MB	High accuracy

Initialize the processor once and reuse it for batch processing, since initialization (about 3s for PaddleOCR) costs far more than a single image.
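
The reuse pattern can be sketched with a stand-in class that counts how often it is constructed. StubProcessor and process_batch are hypothetical; StubProcessor only imitates the engine and extract names of the documented OCRProcessor and performs no OCR.

```python
from typing import Dict, Iterable


class StubProcessor:
    """Stand-in for an OCR processor; counts how many times it is constructed."""

    init_count = 0

    def __init__(self, engine: str = "auto") -> None:
        StubProcessor.init_count += 1  # a real engine would load its models here
        self.engine = engine

    def extract(self, path: str) -> str:
        return "text from " + path


def process_batch(paths: Iterable[str], engine: str = "auto") -> Dict[str, str]:
    """Build the processor once, then reuse it for every image in the batch."""
    processor = StubProcessor(engine=engine)
    return {path: processor.extract(path) for path in paths}
```

Using the figures in the table, a 100-image PaddleOCR batch pays the ~3s initialization once instead of 100 times.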

Super OCR

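Overview

Super OCR is a production-grade optical character recognition tool that intelligently selects the best engine for your needs:

- Tesseract Engine: Lightweight and fast (~200-500ms), well suited to simple text extraction
- PaddleOCR Engine: High accuracy (98%+), optimized for Chinese, well suited to complex documents

Engine Selection Strategy

Auto Mode (Default)

The skill automatically selects the optimal engine:

Scenario	Selected Engine	Why
Simple text, English only	Tesseract	Faster, lighter dependency
Chinese content, high accuracy needed	PaddleOCR	High accuracy, optimized for Chinese

Force Mode

Users can explicitly choose an engine:

- --engine tesseract - Use Tesseract only
- --engine paddle - Use PaddleOCR only
- --engine auto - Auto-select (default)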

Quick Start

Installation

This skill requires the following dependencies:

- PaddleOCR (for Chinese text recognition, 98%+ accuracy)
- Tesseract (for fast English text recognition)
- OpenCV (for image preprocessing)

Option 1: Install with pip (all-in-one)

    pip install paddleocr paddlepaddle pytesseract pillow opencv-python numpy

Option 2: Install dependencies manually

macOS:

    # Tesseract
    brew install tesseract

    # PaddleOCR
    pip install paddleocr paddlepaddle

Ubuntu/Debian:

    # Tesseract
    sudo apt update && sudo apt install tesseract-ocr

    # PaddleOCR
    pip install paddleocr paddlepaddle

Windows:

    # Download Tesseract from https://github.com/UB-Mannheim/tesseract/wiki
    pip install paddleocr paddlepaddle pytesseract pillow opencv-python numpy

Usage

    # Auto mode (recommended) - runs all available engines
    cd path/to/super-ocr
    python scripts/main.py --image path/to/image.png

    # Force Tesseract only
    python scripts/main.py --image document.jpg --engine tesseract

    # Force PaddleOCR (high-accuracy Chinese)
    python scripts/main.py --image chinese_menu.png --engine paddle

    # Run all engines (macOS only: Tesseract + PaddleOCR + MacVision)
    python scripts/main.py --image complex_doc.png --engine all

    # Batch processing with an output directory
    python scripts/main.py --images ./images/*.png --output ./results --verbose

    # Check dependencies and auto-install
    python scripts/dependencies.py --check --install

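Structuring This Skill

This skill uses a capabilities-based structure with multiple execution modes:

1. Engine Selection Logic - Intelligent decision making
2. OCR Execution - Unified interface for different engines
3. Post-processing - Standardized output formatting
4. Validation & Fallback - Quality assurance

Core Capabilities

1. Intelligent Engine Selection

The skill includes a decision tree that analyzes:

- Image characteristics (contrast, text size)
- Language patterns (Chinese character detection)
- User requirements (speed vs accuracy)

See scripts/engine_selector.py for implementation details.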

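2. Dual Engine Support

Tesseract Engine (scripts/tesseract_ocr.py):

- Fast preprocessing pipeline
- PSM mode 6 for uniform text blocks
- Confidence scoring per word
- Language detection

PaddleOCR Engine (scripts/paddle_ocr.py):

- State-of-the-art EAST text detection
- CRNN recognition with LSTM
- Confidence scores per character
- Table detection support

3. Output Formats

Supports multiple output formats:

Format	Content	Use Case
Text only	Clean extracted text	Simple search/text processing
Structured

4. Quality Guarantees

- Confidence thresholds (configurable, default 80%)
- Low-confidence alerts for manual review
- Fallback processing for failed OCRs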

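Resources

scripts/

- main.py - Main entry point, CLI interface (supports multi-engine)
- dependencies.py - Auto-install and validation
- output_formatter.py - Multiple output format support
- engine/ - OCR engine implementations
  - selector.py - Intelligent engine selection logic
  - tesseract.py - Tesseract engine wrapper
  - paddle.py - PaddleOCR engine wrapper
  - macvision.py - macOS Vision OCR (macOS only)
- preprocessing/ - Image preprocessing utilities
  - preprocessor.py - Denoising, enhancement, binarization

dependencies.py (Key Feature)

The dependencies.py module handles:

- Dependency detection (paddleocr, paddlepaddle, pytesseract, cv2)
- Auto-install on missing dependencies
- Version checking
- OS-specific installation commands
- Clear error messages with troubleshooting steps

Use this when setting up a new environment: python scripts/dependencies.py --check --install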

Advanced Features

Custom Configuration

Create config.yaml for persistent settings:

    default_engine: auto
    confidence_threshold: 0.8
    output_format: json
    preprocess:
      denoise: true
      enhance_contrast: true

Batch Processing

Process multiple images:

    python scripts/ocr.py --images ./images/*.png --output ./results

API Mode

Use as a Python library:

    from super_ocr import OCRProcessor

    # Create the processor once; "auto" lets the skill pick the engine
    processor = OCRProcessor(engine="auto")
    result = processor.extract("image.png")
    print(result.text)
    print(result.confidence)

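Anti-Patterns

- ❌ Using PaddleOCR for every image (unnecessary overhead for simple cases)
- ❌ Ignoring confidence scores (quality matters)
- ❌ Engine bias (always preferring one engine)
- ❌ Skipping preprocessing (hurts output quality)

Performance Notes

Engine	Init Time	Per-Image	Memory	Best For
Tesseract	~200ms	~50ms	~100MB	Quick extraction
PaddleOCR	~3s	~500ms	~500MB	High accuracy

Initialize the processor once and reuse it for batch processing.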
