PDF to OFD High-Fidelity Converter

🎯 Purpose

A specialized skill for converting PDF documents into OFD, the Chinese national standard format for fixed-layout documents (GB/T 33190-2016). It is optimized for electronic invoices (OFD版式发票) and adds rendering capabilities, such as character-level text positioning and transparency handling, that go beyond what standard conversion libraries provide.

✨ Key Features

- High-Fidelity Text Placement: Uses character-level positioning (DeltaX arrays) and baseline origin data extracted via PyMuPDF's rawdict output so that the text layout matches the source PDF.
- Advanced Vector Graphics: Extracts the original stroke colors, fill colors, and line widths directly from the PDF. Supports complex path types and fill instructions.
- Transparency Preservation: Supports Alpha and FillOpacity for vector paths and SMask transparency for images (e.g., electronic seals and signatures).
- Cross-Platform Font Mapping: Maps macOS-specific (STSong, STKaiti) and Windows-specific font names to standardized OFD font names (宋体, 楷体, 黑体).
- In-Memory Packaging: Builds the final OFD zip structure entirely in memory, which avoids temporary file clutter and keeps intermediate data off disk.
- Color Snapping: Applies a heuristic "Invoice Red" correction (128 0 0) for financial documents while preserving non-standard colors.
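
The character-level positioning described in the first feature can be illustrated with a small sketch. It assumes character records shaped like the "chars" entries of PyMuPDF's rawdict output (each with an "origin" baseline point in PDF points) and converts them to millimetres, the unit OFD uses by default. The function names `delta_x_mm` and `format_delta_x` are hypothetical and are not taken from the skill's source.

```python
PT_TO_MM = 25.4 / 72  # PDF points to millimetres


def delta_x_mm(chars):
    """Return (origin_mm, deltas_mm) for one span of rawdict-style characters.

    `chars` is a list of dicts, each carrying an "origin" (x, y) baseline
    point in PDF points, as found in PyMuPDF's rawdict "chars" entries.
    """
    if not chars:
        return None, []
    xs = [ch["origin"][0] for ch in chars]
    x0, y0 = chars[0]["origin"]
    origin = (round(x0 * PT_TO_MM, 4), round(y0 * PT_TO_MM, 4))
    # One advance per gap between consecutive characters: n chars -> n - 1 values.
    deltas = [round((b - a) * PT_TO_MM, 4) for a, b in zip(xs, xs[1:])]
    return origin, deltas


def format_delta_x(deltas):
    """Join the advances into a space-separated string for a DeltaX attribute."""
    return " ".join(f"{d:g}" for d in deltas)


# Synthetic sample: three characters spaced 12 pt apart on one baseline.
sample = [
    {"c": "发", "origin": (72.0, 144.0)},
    {"c": "票", "origin": (84.0, 144.0)},
    {"c": "号", "origin": (96.0, 144.0)},
]
origin, deltas = delta_x_mm(sample)
print(origin, format_delta_x(deltas))
```

In a real conversion the character records would come from `page.get_text("rawdict")`, and the origin would typically be expressed relative to the enclosing text object's boundary rather than the page; the sketch leaves that offset out.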

🛠️ Usage Instructions

When a user asks to convert a PDF or a "High-Fidelity" invoice to OFD:

1. Direct Execution:

    python3 pdf2ofd.py <input_path.pdf> [output_path.ofd]

2. Plugin Integration:

The script implements a PDF2OFDConverter class that can be easily imported and used in other Python workflows.

Example Output

    Success: /path/to/invoice.ofd
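
For workflows that prefer to call the script rather than import the class, the direct-execution form can be wrapped from Python. This is a minimal sketch: the helper names `build_command` and `convert` are hypothetical, the script is assumed to sit in the working directory as `pdf2ofd.py`, and it is an assumption that the script exits with a non-zero status on failure. The methods of PDF2OFDConverter are not documented here, so the sketch does not call them.

```python
import subprocess
import sys


def build_command(pdf_path, ofd_path=None, script="pdf2ofd.py"):
    """Build the argv list: <python> pdf2ofd.py <input.pdf> [output.ofd]."""
    cmd = [sys.executable, script, str(pdf_path)]
    if ofd_path is not None:
        cmd.append(str(ofd_path))
    return cmd


def convert(pdf_path, ofd_path=None, script="pdf2ofd.py"):
    """Run the converter script and return whatever it printed to stdout."""
    result = subprocess.run(
        build_command(pdf_path, ofd_path, script),
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit status
    )
    return result.stdout.strip()


# Show the command that would be executed, without running it.
print(build_command("invoice.pdf", "invoice.ofd"))
```

Calling `convert("invoice.pdf")` runs the script with the current interpreter and returns its status line.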

📦 Requirements

Dependencies required in the environment:

- PyMuPDF (fitz): For advanced PDF parsing and raw character data extraction.
- Pillow: For image processing and transparency handling.
- easyofd: The base library for OFD structure (extended via internal monkey patches).
- xmltodict: For XML manipulation.
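
Before running a conversion, it can help to confirm that these packages are importable. The sketch below is one way to do that with the standard library; the import names (`fitz` for PyMuPDF, `PIL` for Pillow) are the usual ones for these packages, and the helper `missing_packages` is a hypothetical name, not part of the skill.

```python
from importlib.util import find_spec

# Package name (as installed) -> top-level module name (as imported).
REQUIRED = {
    "PyMuPDF": "fitz",
    "Pillow": "PIL",
    "easyofd": "easyofd",
    "xmltodict": "xmltodict",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies are importable.")
```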

💡 Notes

- This skill monkey-patches easyofd extensively to work around known library limitations in character positioning and resource ID tracking.
- The conversion assumes that standard Chinese fonts (SimSun, KaiTi, SimHei) are available on the viewing system.
- Zero-copy resource handling: Images are extracted and re-compressed as PNG/JPG only when necessary, to preserve quality.
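
The font assumption in the notes ties back to the cross-platform font mapping. A minimal sketch of such a mapping follows. The table contents beyond the names mentioned in this document (the `stheiti` entry), the fallback to 宋体, and the function name `normalize_font_name` are illustrative assumptions; the skill's actual table may differ.

```python
# Lower-cased substring of the PDF font name -> standardized OFD font name.
FONT_MAP = {
    "stsong": "宋体",
    "simsun": "宋体",
    "stkaiti": "楷体",
    "kaiti": "楷体",
    "stheiti": "黑体",  # assumed macOS counterpart of SimHei
    "simhei": "黑体",
}
DEFAULT_FONT = "宋体"  # assumed fallback for unrecognized fonts


def normalize_font_name(pdf_font_name):
    """Map a PDF font name to a standardized OFD font name."""
    # Embedded subsets are named like "ABCDEF+STSong-Light"; drop the prefix.
    base = pdf_font_name.split("+", 1)[-1].lower()
    for needle, target in FONT_MAP.items():
        if needle in base:
            return target
    return DEFAULT_FONT


for name in ("ABCDEF+STSong-Light", "KaiTi_GB2312", "SimHei", "Helvetica"):
    print(name, "->", normalize_font_name(name))
```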

