ComPDF Conversion CLI Skill
Purpose
- Wraps the ComPDFKitConversion Python SDK into a reusable local conversion workflow that converts PDF or image input to Word, PPT, Excel, HTML, RTF, Image, TXT, JSON, Markdown, and CSV (10 output formats in total).
Agent Skills Standard Compatibility
- This Skill uses an Anthropic Agent Skills-compatible directory structure: compdf-conversion-cli/.
- The entry point is SKILL.md; helper scripts are placed in scripts/.
- The document uses the $ARGUMENTS and ${CLAUDE_SKILL_DIR} conventions for distribution and execution in Claude Code / Agent Skills-compatible environments.
Input / Output
- Input: The target format (word/excel/ppt/html/rtf/image/txt/json/markdown/csv), the PDF or image path, and the output path are passed via Skill arguments or the command line. An optional PDF password and conversion parameters may also be provided.
- Supported input file types:
  - PDF files (.pdf)
  - Image files (.jpg/.jpeg/.png/.bmp/.tif/.tiff/.webp/.jp2/.gif/.tga)
- Output: A file in the corresponding format (.docx, .pptx, .xlsx, .html, .rtf, image, .txt, .json, .md, .csv), or a clear error message.
Prerequisites
- Supports Windows and macOS.
- The conversion SDK must be installed first:
pip install ComPDFKitConversion
- On first run, the script automatically downloads license.xml from the ComPDF server and caches it in the scripts/ directory:
https://download.compdf.com/skills/license/license.xml
- The script reads the <key>...</key> field from license.xml and passes that key to LibraryManager.license_verify(...) for authentication; it does not pass the XML file path directly to the SDK.
- To use a custom license, place your own license.xml in the scripts/ directory; the script will use it directly without downloading.
- During SDK initialization, the resource directory is always set to the directory containing compdf_conversion_cli.py, i.e., the scripts/ directory itself.
- When --enable-ocr or --enable-ai-layout (enabled by default) is used, the Skill also requires scripts/documentai.model. If the file does not exist, the script will automatically download it from:
https://download.compdf.com/skills/model/documentai.model
- To reuse an existing model file, you can override the default model path via an environment variable:
export COMPDF_DOCUMENT_AI_MODEL=/path/to/documentai.model
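The key lookup and the model-path override described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper names read_license_key and resolve_model_path are hypothetical, and the sketch assumes <key> may sit at any depth inside license.xml, since the file's full layout is not shown here.

```python
import os
import xml.etree.ElementTree as ET
from pathlib import Path


def read_license_key(xml_text: str) -> str:
    """Return the text of the first <key> element, or raise ValueError."""
    root = ET.fromstring(xml_text)
    # Assumption: <key> may be the root element itself or nested at any depth.
    node = root if root.tag == "key" else root.find(".//key")
    key = (node.text or "").strip() if node is not None else ""
    if not key:
        raise ValueError("license.xml has no non-empty <key> field")
    return key


def resolve_model_path(scripts_dir, env=None) -> Path:
    """Prefer COMPDF_DOCUMENT_AI_MODEL, else fall back to scripts/documentai.model."""
    env = os.environ if env is None else env
    override = env.get("COMPDF_DOCUMENT_AI_MODEL")
    return Path(override) if override else Path(scripts_dir) / "documentai.model"
```

Only the extracted key string would then be handed to LibraryManager.license_verify(...); the call itself is omitted because its exact signature is not shown in this document.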
Workflow
1. Confirm the Python package is installed:
python -m pip show ComPDFKitConversion
2. The script automatically downloads license.xml on first run; the scripts/ directory is used directly as the SDK resource path.
3. In Agent Skills / Claude Code environments, prefer the Skill's built-in script path variable:
python "${CLAUDE_SKILL_DIR}/scripts/compdf_conversion_cli.py" word input.pdf output.docx
python "${CLAUDE_SKILL_DIR}/scripts/compdf_conversion_cli.py" ppt input.pdf output.pptx
python "${CLAUDE_SKILL_DIR}/scripts/compdf_conversion_cli.py" excel input.pdf output.xlsx
4. For more control, append common parameters:
python "${CLAUDE_SKILL_DIR}/scripts/compdf_conversion_cli.py" excel input.pdf output.xlsx --page-ranges "1-3,5" --excel-all-content --excel-worksheet-option for-page
python "${CLAUDE_SKILL_DIR}/scripts/compdf_conversion_cli.py" word input.pdf output.docx --enable-ocr --page-layout-mode flow
5. On startup, the script ensures scripts/license.xml exists (downloading it automatically from the ComPDF server if missing), reads the <key> field for SDK authentication, and uses the scripts/ directory as the resource path.
6. If --enable-ocr or --enable-ai-layout (enabled by default) is active, the script checks whether scripts/documentai.model exists; if not, it downloads the file automatically before initializing the Document AI model.
7. Check the return code; if it is not SUCCESS, handle license, password, resource, model, or input file issues according to the error name.
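As a rough illustration of the last step, a caller could turn the error name into an actionable message. Only SUCCESS and PDF_PASSWORD_ERROR are names that appear in this document; the function name describe_result, the HINTS table, and the fallback wording are assumptions, and how the SDK actually exposes the error name (enum member, string, or integer) is not shown here.

```python
# Hints keyed by error name. Only PDF_PASSWORD_ERROR is taken from this
# document; other names would have to be added after checking the SDK.
HINTS = {
    "PDF_PASSWORD_ERROR": "The PDF is password-protected; pass --password.",
}

# Generic fallback covering the problem areas the workflow lists.
FALLBACK = "Check the license, resource directory, model file, and input file."


def describe_result(error_name: str) -> str:
    """Return a one-line, human-readable summary for a conversion result."""
    if error_name == "SUCCESS":
        return "Conversion succeeded."
    return f"Conversion failed ({error_name}): {HINTS.get(error_name, FALLBACK)}"
```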
documentai.model Download Optimization
- The script preferentially uses the model file pointed to by COMPDF_DOCUMENT_AI_MODEL.
- The default model path is scripts/documentai.model.
- During automatic download, the file is first written to documentai.model.part and then atomically renamed to the final file upon success, preventing partial file corruption.
- On download failure, the script retries automatically with back-off intervals of 2s / 5s / 10s.
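The write-to-.part, rename, and retry behaviour can be sketched like this. It is a sketch under stated assumptions rather than the script's real implementation: download_atomically and its fetch callback are hypothetical names, the transfer itself is injected so the logic can be exercised without network access, and the sketch assumes one initial attempt plus one retry per back-off interval.

```python
import os
import time
from pathlib import Path
from typing import Callable, Sequence


def download_atomically(
    fetch: Callable[[Path], None],
    dest: Path,
    delays: Sequence[float] = (2, 5, 10),
    sleep: Callable[[float], None] = time.sleep,
) -> Path:
    """Call fetch(part_path) until it succeeds, then rename the .part file to dest."""
    part = dest.with_name(dest.name + ".part")
    # Assumption: one initial attempt plus one retry per back-off interval.
    for attempt in range(len(delays) + 1):
        try:
            fetch(part)  # writes the payload to documentai.model.part
            os.replace(part, dest)  # atomic when both paths share a filesystem
            return dest
        except OSError:  # urllib.error.URLError is a subclass of OSError
            part.unlink(missing_ok=True)  # never leave a partial file behind
            if attempt == len(delays):
                raise
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

Because the final file only appears through os.replace, a later run that finds documentai.model can treat it as complete; an interrupted transfer leaves at most a .part file.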
Invoking Directly as a Skill
- In environments that support Agent Skills, the Skill can be called directly:
/compdf-conversion-cli word input.pdf output.docx
/compdf-conversion-cli excel input.pdf output.xlsx --excel-worksheet-option for-page
- When the Skill receives arguments, it passes them through to the script as-is:
python "${CLAUDE_SKILL_DIR}/scripts/compdf_conversion_cli.py" $ARGUMENTS
- If the environment does not support direct Skill invocation, fall back to a regular command-line call.
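For the command-line fallback, a wrapper written in Python might assemble the same call as shown below. The function name build_command is hypothetical, and the sketch assumes the argument string follows POSIX shell quoting rules.

```python
import os
import shlex
import sys


def build_command(skill_dir: str, arguments: str) -> list:
    """Return the argv list for running the CLI with pass-through arguments."""
    script = os.path.join(skill_dir, "scripts", "compdf_conversion_cli.py")
    # shlex.split honours quotes, so a path containing spaces stays one argument.
    return [sys.executable, script, *shlex.split(arguments)]
```

The resulting list can be passed to subprocess.run without shell=True, which avoids a second round of shell quoting.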
Supported Output Formats
- word → calls CPDFConversion.start_pdf_to_word
- excel → calls CPDFConversion.start_pdf_to_excel
- ppt → calls CPDFConversion.start_pdf_to_ppt
- html → calls CPDFConversion.start_pdf_to_html
- rtf → calls CPDFConversion.start_pdf_to_rtf
- image → calls CPDFConversion.start_pdf_to_image
- txt → calls CPDFConversion.start_pdf_to_txt
- json → calls CPDFConversion.start_pdf_to_json
- markdown → calls CPDFConversion.start_pdf_to_markdown
- csv → reuses CPDFConversion.start_pdf_to_excel with table/Excel parameters to produce CSV-friendly output
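One way to express this mapping in code is a plain lookup table. The sketch below only resolves method names and does not call the SDK; the helper name method_name_for is hypothetical, and method names other than start_pdf_to_word, start_pdf_to_ppt, and start_pdf_to_excel are taken to follow the same start_pdf_to_* pattern.

```python
# Target format -> name of the CPDFConversion method that handles it.
FORMAT_TO_METHOD = {
    "word": "start_pdf_to_word",
    "excel": "start_pdf_to_excel",
    "ppt": "start_pdf_to_ppt",
    "html": "start_pdf_to_html",
    "rtf": "start_pdf_to_rtf",
    "image": "start_pdf_to_image",
    "txt": "start_pdf_to_txt",
    "json": "start_pdf_to_json",
    "markdown": "start_pdf_to_markdown",
    "csv": "start_pdf_to_excel",  # CSV reuses the Excel converter
}


def method_name_for(fmt: str) -> str:
    """Look up the conversion method name for a target format."""
    try:
        return FORMAT_TO_METHOD[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported target format: {fmt!r}") from None
```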
Input Source Types
- The script supports PDF and image as input sources. The SDK's start_pdf_to_* interfaces natively accept image files with no pre-processing required.
- By default, the script auto-detects the input type from the file extension:
  - .pdf → pdf
  - .png/.jpg/.jpeg/.bmp/.tif/.tiff/.gif/.webp/.tga → image
- You can also specify the source type explicitly:
python "${CLAUDE_SKILL_DIR}/scripts/compdf_conversion_cli.py" word input.png output.docx --source-type image
- image -> * and pdf -> * share the same set of CPDFConversion.start_pdf_to_* interfaces; only the input file type differs.
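Extension-based detection along these lines could look like the following sketch. The name detect_source_type is hypothetical, the extension set is the one given in the detection rule above, and raising ValueError for an unrecognized extension is an assumption, since the script's actual behaviour in that case is not described.

```python
from pathlib import Path

# Extensions the auto-detection rule maps to the image source type.
IMAGE_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".gif", ".webp", ".tga",
}


def detect_source_type(path: str, explicit: str = "auto") -> str:
    """Return 'pdf' or 'image'; an explicit --source-type value wins over detection."""
    if explicit != "auto":
        return explicit
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    # Assumption: unknown extensions are rejected rather than guessed.
    raise ValueError(f"cannot detect source type from extension {suffix!r}")
```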
Smart Defaults
The script automatically adjusts certain parameters based on the input source and output format to reduce manual configuration:
| Trigger | Automatic Behavior | User-Overridable | Description |
|---|---|---|---|
| Input source is an image (auto-detected or explicit --source-type image) | Automatically enables --enable-ocr | No (--enable-ocr uses store_true; there is no --no-enable-ocr) | Text in images must be extracted via OCR; without OCR, output will contain only images and no text |
| Output format is HTML (format = html) | Automatically sets --page-layout-mode to box (box layout) | Yes; passing --page-layout-mode flow explicitly overrides this | Box layout better preserves the original formatting in HTML; specify flow explicitly if flow layout is needed |
When triggered, the script prints a notice to stderr, for example:
Auto-enabled OCR for image input.
Auto-set page layout mode to BOX for HTML output.
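The two rules can be sketched as a small function. This is an illustration only: apply_smart_defaults is a hypothetical name, and the sketch assumes the script represents "--page-layout-mode was not passed" as None so that an explicit flow can be told apart from the default.

```python
import sys


def apply_smart_defaults(source_type, fmt, enable_ocr=False,
                         page_layout_mode=None, log=sys.stderr):
    """Return (enable_ocr, page_layout_mode) after applying the automatic rules."""
    if source_type == "image" and not enable_ocr:
        enable_ocr = True
        print("Auto-enabled OCR for image input.", file=log)
    # Assumption: None means the user did not pass --page-layout-mode.
    if fmt == "html" and page_layout_mode is None:
        page_layout_mode = "box"
        print("Auto-set page layout mode to BOX for HTML output.", file=log)
    return enable_ocr, page_layout_mode
```

Writing the notices to stderr keeps stdout free for any machine-readable output.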
All Parameters
Positional Parameters
| Parameter | Description |
|---|---|
| format | Target format: word/excel/ppt/html/rtf/image/txt/json/markdown/csv |
| INLINECODE114 | Input file path (PDF or image) |
| output_path | Output file path |
General Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --source-type | Option | auto | Input source type: auto/pdf/image |
| --password | String | "" | PDF open password |
| --page-ranges | String | None | Page range, e.g. 1-3,5 |
| --font-name | String | "" | Output font name |
Layout Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --enable-ai-layout | Boolean | True | AI layout analysis (disable with --no-enable-ai-layout) |
| --page-layout-mode | Option | SDK default flow (auto-switched to box for HTML output) | Page layout: box (box layout) / flow (flow layout) |
Content Retention Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --contain-image | Boolean | True | Retain images (disable with --no-contain-image) |
| --contain-annotation | Boolean | True | Retain annotations (disable with --no-contain-annotation) |
| --contain-page-background-image | Boolean | True | Retain page background images (disable with --no-contain-page-background-image) |
| --formula-to-image | Boolean | False | Convert formulas to image output |
| --transparent-text | Boolean | False | Preserve transparent text |
Output Control Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --output-document-per-page | Boolean | False | Split output into one document per page |
| --auto-create-folder | Boolean | True | Automatically create output directory (disable with --no-auto-create-folder) |
OCR Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --enable-ocr | Boolean | False (auto-enabled for image input) | Enable OCR |
| INLINECODE146 | Option | SDK default all | OCR scope: invalid-character/scan-page/invalid-character-and-scan-page/all |
| --ocr-language | Multi-select | auto | OCR language(s); multiple languages can be specified simultaneously. Options: auto/chinese/chinese-tra/english/korean/japanese/latin/devanagari/cyrillic/arabic/tamil/telugu/kannada/thai/greek/eslav |
Excel-Specific Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --excel-all-content | Boolean | False | Include all content in Excel output |
| --excel-csv-format | Boolean | False | Output Excel result in CSV format |
| --excel-worksheet-option | Option | SDK default for-table | Worksheet split strategy: for-table/for-page/for-document |
JSON-Specific Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --json-contain-table | Boolean | True | Include table data in JSON output (disable with --no-json-contain-table) |
TXT-Specific Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --txt-table-format | Boolean | True | Enable table formatting in TXT output (disable with --no-txt-table-format) |
HTML-Specific Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| INLINECODE181 | Option | SDK default INLINECODE182 | HTML output mode: single-page/single-page-with-bookmark/multiple-page/INLINECODE186 |
Image-Specific Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| INLINECODE187 | Option | SDK default INLINECODE188 | Image output format: jpg/jpeg/jpeg2000/png/bmp/tiff/tga/gif/INLINECODE197 |
| INLINECODE198 | Option | SDK default color | Image color mode: color/gray/binary |
| --image-scaling | Float | 1.0 | Image scaling factor |
| --image-path-enhance | Boolean | False | Enable image path enhancement |
Parameter Default Value Rules
- Parameters that default to True (--enable-ai-layout/--contain-image/--contain-annotation/--contain-page-background-image/--auto-create-folder/--json-contain-table/--txt-table-format) use BooleanOptionalAction; pass --no-xxx to disable.
- Parameters that default to False (--enable-ocr/--formula-to-image/--transparent-text/--output-document-per-page/--excel-all-content/--excel-csv-format/--image-path-enhance) use store_true; passing the flag enables them.
- All CLI parameter defaults are fully consistent with the SDK's ConvertOptions() defaults; omitting a parameter is equivalent to using the SDK's original default value.
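The two flag styles correspond to standard argparse actions. The sketch below registers only a handful of the flags to show the pattern; build_parser is a hypothetical name, nargs="+" with a default of ["auto"] for --ocr-language is an assumption about how the multi-select is declared, and argparse.BooleanOptionalAction requires Python 3.9 or later.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that shows the two boolean flag styles and a multi-select."""
    parser = argparse.ArgumentParser(prog="compdf_conversion_cli.py")
    # Default-True switches: BooleanOptionalAction also generates --no-<name>.
    for flag in ("--enable-ai-layout", "--contain-image", "--json-contain-table"):
        parser.add_argument(flag, action=argparse.BooleanOptionalAction, default=True)
    # Default-False switches: plain store_true, so no --no-<name> form exists.
    for flag in ("--enable-ocr", "--formula-to-image"):
        parser.add_argument(flag, action="store_true")
    # Multi-select: one or more language names after a single flag (assumed nargs).
    parser.add_argument("--ocr-language", nargs="+", default=["auto"])
    return parser
```

With this layout, --no-contain-image flips a default-True switch off, while passing --no-enable-ocr is rejected by argparse because store_true defines no negative form.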
Recommended Command Examples
PDF to Word (default parameters, AI layout analysis enabled)
python "${CLAUDE_SKILL_DIR}/scripts/compdf_conversion_cli.py" word input.pdf output.docx
PDF to Word, box layout, no images, no AI layout analysis
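python "${CLAUDE_SKILL_DIR}/scripts/compdf_conversion_cli.py" word input.pdf output.docx --page-layout-mode box --no-contain-image --no-enable-ai-layout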
PDF to Word, retain annotations and background images, one document per page
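python "${CLAUDE_SKILL_DIR}/scripts/compdf_conversion_cli.py" word input.pdf output.docx --contain-annotation --contain-page-background-image --output-document-per-page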
PDF to Excel, include all content and split worksheets by page
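python "${CLAUDE_SKILL_DIR}/scripts/compdf_conversion_cli.py" excel input.pdf output.xlsx --excel-all-content --excel-worksheet-option for-page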
PDF to TXT, with table formatting enabled
python "${CLAUDE_SKILL_DIR}/scripts/compdf_conversion_cli.py" txt input.pdf output.txt --txt-table-format
PDF to HTML, multi-page with bookmarks mode
CODEBLOCK16
PDF to Image, PNG format, grayscale, 2x scaling
CODEBLOCK17
Image to Word (OCR auto-enabled, specify Chinese language)
python "${CLAUDE_SKILL_DIR}/scripts/compdf_conversion_cli.py" word input.png output.docx --ocr-language chinese
Note: For image input, the script automatically enables OCR — there is no need to pass --enable-ocr manually. To specify an OCR language, --ocr-language can still be used.
PDF with OCR enabled (multiple languages)
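python "${CLAUDE_SKILL_DIR}/scripts/compdf_conversion_cli.py" word input.pdf output.docx --enable-ocr --ocr-language chinese english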
Trial License and Usage Limits
- The scripts/license.xml auto-downloaded from the ComPDF server is a Trial License, allowing a maximum of 200 conversions.
- The script uses a SHA-256 fingerprint to detect whether the current License is the default trial key; no usage limit applies when using any other License.
- After each successful conversion using the trial License, the script prints the current used/remaining count to stderr, for example:
Trial license: 5/200 conversions used, 195 remaining.
- When the trial limit is reached (200 conversions), the script refuses to convert and prompts the user to purchase a full License:
Error: Trial license usage limit reached (200 conversions). Please purchase a license at: https://www.compdf.com/contact-sales
- When the trial License has expired (SDK authentication fails), the error message also includes a purchase link.
- After purchasing a full License, place a custom license.xml containing the new <key> in scripts/ (overwriting the auto-downloaded trial file); no script modifications or counter file cleanup are required.
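The trial-detection and quota logic might be structured as in the sketch below. The function names are hypothetical, hashing the UTF-8 key string is an assumption about what the fingerprint covers, and where the script persists its counter is not described here, so the count is simply passed in.

```python
import hashlib

TRIAL_LIMIT = 200
PURCHASE_URL = "https://www.compdf.com/contact-sales"


def fingerprint(key: str) -> str:
    """SHA-256 hex digest of the license key (assumed to be the fingerprint input)."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def is_trial_key(key: str, trial_fingerprint: str) -> bool:
    """True when the key matches the stored fingerprint of the default trial key."""
    return fingerprint(key) == trial_fingerprint


def check_quota(used: int, limit: int = TRIAL_LIMIT) -> None:
    """Refuse to convert once the trial quota is exhausted."""
    if used >= limit:
        raise RuntimeError(
            f"Error: Trial license usage limit reached ({limit} conversions). "
            f"Please purchase a license at: {PURCHASE_URL}"
        )


def usage_notice(used: int, limit: int = TRIAL_LIMIT) -> str:
    """Message printed to stderr after a successful trial conversion."""
    return f"Trial license: {used}/{limit} conversions used, {limit - used} remaining."
```

Comparing fingerprints rather than raw keys means the trial key itself does not need to be embedded in the script.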
Confirmed Facts
- ComPDFKitConversion 3.9.0 has been successfully installed on the local machine.
- The installed package provides 10 conversion methods including CPDFConversion.start_pdf_to_word/start_pdf_to_ppt/start_pdf_to_excel.
- LibraryManager provides initialize, license_verify, release, set_document_ai_model, and set_ocr_language.
- Official documentation confirms support for PDF to Word / Excel / PPT / HTML / RTF / Image / TXT / JSON / Markdown.
- The SDK's start_pdf_to_* interfaces natively accept image file input (PNG → Word has been verified successfully).
- INLINECODE240 defaults to True in the SDK; set_document_ai_model() must be called first to load the model before use, otherwise a 0xC0000005 crash will occur.
- --ocr-language supports specifying multiple languages simultaneously (e.g. --ocr-language chinese english).
Risks / Notes
- The official requirements page states Python >=3.6 and the demo page states <3.11, yet PyPI currently provides a cp314 wheel; treat the locally installable wheel as the source of truth, and always verify installation in a fresh environment first.
- If the script cannot download license.xml from the server (network issue) and no manual file exists in scripts/, or the <key> field is empty, the script cannot complete SDK authentication and cannot perform any real conversions.
- documentai.model is a large file (approximately 525 MB); there will be a noticeable download delay the first time OCR / AI layout is enabled. Because --enable-ai-layout defaults to True, the model download will be triggered on the very first run.
- If the runtime environment cannot access https://download.compdf.com/skills/model/documentai.model, place documentai.model in the scripts/ directory in advance.
- Do not directly apply the initialization patterns from ComPDF SDKs for other languages to the Python package; this Skill is based on the locally verified LibraryManager / CPDFConversion API.
Resource Navigation
- License file: License.txt
- Script: scripts/compdf_conversion_cli.py
- SDK authentication file: scripts/license.xml (auto-downloaded from https://download.compdf.com/skills/license/license.xml if missing)
- SDK authentication source: the <key> field in scripts/license.xml
- SDK resource path: scripts/
- OCR / AI layout model: scripts/documentai.model (auto-downloaded if missing)
- Purchase a full License: https://www.compdf.com/contact-sales
- Official documentation:
  - https://www.compdf.com/guides/conversion-sdk/python/overview
  - https://www.compdf.com/guides/conversion-sdk/python/pdf-to-word
  - https://www.compdf.com/guides/conversion-sdk/python/pdf-to-excel
  - https://www.compdf.com/guides/conversion-sdk/python/pdf-to-ppt
  - INLINECODE271
Acceptance Checklist
- [ ] python -m pip show ComPDFKitConversion shows the installed package
- [ ] Running python "${CLAUDE_SKILL_DIR}/scripts/compdf_conversion_cli.py" --help or an equivalent local command produces normal output
- [ ] The script auto-downloads scripts/license.xml if missing, then extracts the license key from the <key> field for authentication
- [ ] The script uses the scripts/ directory as the SDK resource path
- [ ] The script recognizes all 10 target formats: word/excel/ppt/html/rtf/image/txt/json/markdown/csv
- [ ] The script accepts both PDF and image files (.png/.jpg/.jpeg/.bmp/.tif/.tiff/.gif/.webp/.tga) as input
- [ ] When --enable-ocr or --enable-ai-layout (enabled by default) is active and documentai.model is missing, the script auto-downloads the model
- [ ] When license.xml cannot be obtained (download fails and no manual file exists) or authentication fails, a clear error is output rather than a silent failure
- [ ] The 7 parameters that default to True can be disabled with --no-xxx
- [ ] --ocr-language supports specifying multiple languages simultaneously
- [ ] After a conversion using the trial License, the usage count increments
- [ ] When the trial License reaches 200 conversions, the script refuses to convert and outputs a purchase link
- [ ] When using a non-trial License, no usage limit applies
- [ ] For image input, even if --enable-ocr is not passed, the script automatically enables OCR and prints a notice to stderr
- [ ] For HTML output, even if --page-layout-mode is not passed, the script automatically uses box (box layout) and prints a notice to stderr
- [ ] For HTML output, explicitly passing --page-layout-mode flow overrides the automatic box layout behavior
Distribution Notes
- This Skill does not depend on any machine-specific absolute paths.
- When distributing to other users, the following directory structure is sufficient:
compdf-conversion-cli/
├── SKILL.md
├── License.txt
└── scripts/
└── compdf_conversion_cli.py
- Users place this directory under their own skills root directory and the Skill is ready to use.
- INLINECODE308 is auto-downloaded at runtime; no need to include it in the distribution package.
Common Pitfalls
- scripts/license.xml is missing and cannot be auto-downloaded (network unavailable or server error): the script will error out before authentication. If you are in an offline environment, place license.xml manually in the scripts/ directory.
- license.xml is missing the <key> field or its value is empty: the script will error out before authentication.
- Resource files required by the SDK are absent from the scripts/ directory: conversion may fail after LibraryManager.initialize().
- A password-protected PDF is provided without --password: this will trigger PDF_PASSWORD_ERROR.
- OCR / AI layout is enabled but documentai.model is not present locally and the network is unavailable: the model download will fail; place the file in the scripts/ directory manually in advance.
- When the Excel output strategy is unclear, prefer passing --excel-worksheet-option explicitly to avoid unexpected result structures.
- When converting images to other formats, the script already enables OCR automatically; if the output still contains no text, check whether documentai.model is complete and whether the OCR language matches.
- Once the trial License usage limit is exhausted, a full License must be purchased to continue; purchase link: https://www.compdf.com/contact-sales.
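Several of these pitfalls come down to a missing file, so an offline environment could run a quick pre-flight check before invoking the CLI. The sketch below is a hypothetical helper and not part of the script; it only tests for the files named above and cannot tell whether license.xml or documentai.model is valid.

```python
from pathlib import Path


def preflight(scripts_dir, input_path, needs_model: bool) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    scripts = Path(scripts_dir)
    if not (scripts / "license.xml").is_file():
        problems.append("scripts/license.xml is missing; it must be downloaded or placed manually")
    # The model is only needed when OCR or AI layout is active.
    if needs_model and not (scripts / "documentai.model").is_file():
        problems.append("scripts/documentai.model is missing; it must be downloaded or placed manually")
    if not Path(input_path).is_file():
        problems.append(f"input file not found: {input_path}")
    return problems
```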
Copyright
This Skill is built on top of the ComPDFKit Conversion SDK.
CODEBLOCK23
- SDK Name: ComPDFKitConversion
- SDK Author: PDF Technologies, Inc.
- License Type: Commercial License (Commercial / Proprietary) — non-exclusive, non-transferable, non-sublicensable, revocable
- Official Website: https://www.compdf.com
- Contact: support@compdf.com
- Terms of Service: https://www.compdf.com/terms-of-service
- Privacy Policy: https://www.compdf.com/privacy-policy
Important: Under the ComPDFKit Terms of Service, distributing the documentation, sample code, or source code of the ComPDFKit Conversion SDK to third parties is prohibited. Please ensure you have obtained a valid ComPDFKit License before using this Skill.