Pandoc Document Converter

Convert documents between any formats pandoc supports, with full control over styling, templates,
table of contents, metadata, and PDF engine selection.

Quick Start

For most conversions, use the helper script at scripts/convert.sh:

bash /scripts/convert.sh <input-file> <output-file> [options...]

The script auto-detects formats from file extensions and applies sensible defaults (standalone
output, appropriate PDF engine, default LaTeX margins for LaTeX-based PDF engines). It also checks
that pandoc, the input file, the output directory, and any requested PDF engine are available.
Any extra arguments are passed through to pandoc.
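
The checks described above can be sketched roughly as follows. This is a hypothetical illustration, not the contents of scripts/convert.sh; the function name `preflight`, its argument order, and its return codes are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical preflight check; the real helper script may differ.
# Usage: preflight <input-file> <output-file> [pdf-engine]
preflight() {
  input=$1
  output=$2
  engine=${3:-}   # optional PDF engine name, e.g. xelatex

  # pandoc itself must be on PATH.
  command -v pandoc >/dev/null 2>&1 || { echo "pandoc not found" >&2; return 1; }
  # The input must be an existing regular file.
  [ -f "$input" ] || { echo "input file not found: $input" >&2; return 2; }
  # The directory that will hold the output must already exist.
  [ -d "$(dirname "$output")" ] || { echo "output directory missing" >&2; return 3; }
  # If a PDF engine was requested, it must be installed too.
  if [ -n "$engine" ]; then
    command -v "$engine" >/dev/null 2>&1 || { echo "PDF engine not found: $engine" >&2; return 4; }
  fi
  return 0
}
```

Failing early with a specific message is usually easier to act on than a pandoc error that surfaces halfway through a conversion.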

How Conversions Work

Pandoc reads a source format into an internal AST, then writes it out in the target format. This
means you can go from nearly any supported input to any supported output. The key decision points are:

1. Input format: usually auto-detected from the file extension
2. Output format: auto-detected from the output file extension
3. PDF engine: for PDF output, choose between xelatex (best Unicode/font support), lualatex (strong Unicode/fonts), tectonic (self-contained TeX), pdflatex (fastest, good for ASCII-heavy docs), or HTML/CSS engines like weasyprint, wkhtmltopdf, or prince
4. Styling: CSS for HTML-based outputs, LaTeX templates for PDF, reference docs for DOCX/ODT
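
For the PDF engine decision, one possible approach is to take the first installed engine from a preference list. The sketch below assumes a helper named `first_available` (hypothetical) and a preference order chosen for illustration; neither is pandoc's nor the helper script's documented behaviour.

```shell
#!/bin/sh
# Print the first command from the argument list that is installed.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# The preference order below is an assumption made for this example.
engine=$(first_available xelatex lualatex tectonic pdflatex weasyprint wkhtmltopdf prince) \
  || echo "no PDF engine found" >&2

# With an engine selected, it could be passed on as:
#   pandoc input.md -o output.pdf --pdf-engine="$engine"
```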

Common Conversion Patterns

HTML → PDF

pandoc input.html -o output.pdf --pdf-engine=weasyprint -s

If the HTML uses external CSS, include it:

pandoc input.html -o output.pdf --pdf-engine=weasyprint -s --css=style.css

Markdown → PDF

pandoc input.md -o output.pdf --pdf-engine=xelatex -s --toc --toc-depth=3

Markdown → DOCX

pandoc input.md -o output.docx -s

To use a reference (template) document for styling:

pandoc input.md -o output.docx --reference-doc=template.docx

Markdown → HTML

pandoc input.md -o output.html -s --css=style.css --toc

DOCX → Markdown

pandoc input.docx -o output.md --extract-media=./media

Markdown → EPUB

pandoc input.md -o output.epub -s --toc --epub-cover-image=cover.jpg

LaTeX → PDF

pandoc input.tex -o output.pdf --pdf-engine=xelatex

CSV → HTML table

pandoc input.csv -o output.html -s

Styling and Appearance

CSS for HTML-based outputs

Create or use a CSS file and pass it with --css=path/to/style.css. For PDF output via weasyprint, wkhtmltopdf, or prince, CSS is respected directly. For PDF via LaTeX engines, CSS is usually ignored — use LaTeX variables or templates instead.

A sensible default stylesheet is provided at assets/default.css. Use it when the user wants
a clean, readable output without specifying their own styles:

pandoc input.md -o output.html -s --css=/assets/default.css

LaTeX variables for PDF styling

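Control margins, fonts, and paper size without a full template:

pandoc input.md -o output.pdf --pdf-engine=xelatex \
  -V geometry:margin=1in \
  -V fontsize=12pt \
  -V mainfont="DejaVu Serif" \
  -V documentclass=article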

Reference documents for DOCX/ODT

To match a corporate style, provide a reference document:

pandoc input.md -o output.docx --reference-doc=brand-template.docx

Advanced Features

Table of Contents

Add --toc and optionally --toc-depth=N (default 3) to generate a table of contents from the document's headings.
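
As a minimal sketch of how the two flags combine, the hypothetical helper `toc_flags` below builds them for a given depth. The 1 to 6 range check is an assumption based on the number of heading levels, and the file names in the usage comment are placeholders.

```shell
#!/bin/sh
# Build TOC flags for a given depth; falls back to pandoc's default depth of 3.
toc_flags() {
  depth=${1:-3}
  case "$depth" in
    [1-6]) printf '%s\n' "--toc --toc-depth=$depth" ;;
    *) echo "toc depth must be between 1 and 6" >&2; return 1 ;;
  esac
}

# Example usage (input.md and output.html are placeholder names):
#   pandoc input.md -o output.html -s $(toc_flags 2)
```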

Metadata

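Set title, author, and date via YAML frontmatter in the source file or via -M:

pandoc input.md -o output.pdf --pdf-engine=xelatex -s \
  -M title="My Report" -M author="Zhang San" -M date=2026-03-15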
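
The frontmatter route might look like the sketch below, which writes a small Markdown file whose leading YAML block carries the same three fields. The file name report.md and the temporary directory are placeholders for illustration; the pandoc call is skipped if pandoc is not installed.

```shell
#!/bin/sh
# Write a Markdown file with YAML frontmatter into a scratch directory.
meta_dir=$(mktemp -d)
cat > "$meta_dir/report.md" <<'EOF'
---
title: My Report
author: Zhang San
date: 2026-03-15
---

# Introduction

Body text goes here.
EOF

# Convert to standalone HTML so the metadata is rendered in the output.
if command -v pandoc >/dev/null 2>&1; then
  pandoc "$meta_dir/report.md" -s -o "$meta_dir/report.html"
fi
```

Keeping metadata in the source file means it travels with the document, whereas -M is handy for one-off overrides on the command line.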

Filters and Lua filters

Pandoc supports filters that transform the AST. Lua filters are self-contained:

pandoc input.md -o output.pdf --lua-filter=my-filter.lua
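
To give a feel for what a Lua filter contains, the sketch below writes a small one to a scratch directory and applies it. The filter (bold text becomes small caps) and the file name smallcaps.lua are illustrative choices, not part of this skill; the pandoc call is skipped if pandoc is not installed.

```shell
#!/bin/sh
# Create a tiny Lua filter in a scratch directory.
filter_dir=$(mktemp -d)
cat > "$filter_dir/smallcaps.lua" <<'EOF'
-- Replace every Strong (bold) element with SmallCaps.
function Strong(elem)
  return pandoc.SmallCaps(elem.content)
end
EOF

# Apply the filter to a one-line Markdown snippet read from stdin.
if command -v pandoc >/dev/null 2>&1; then
  printf '%s\n' 'Some **bold** text.' \
    | pandoc -f markdown -t html --lua-filter="$filter_dir/smallcaps.lua"
fi
```

A filter defines functions named after AST element types; pandoc calls each one for every matching element and substitutes whatever it returns.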

Multiple input files

Pandoc concatenates multiple inputs:

pandoc chapter1.md chapter2.md chapter3.md -o book.pdf --pdf-engine=xelatex -s --toc

Extracting media from DOCX/EPUB

pandoc input.docx -o output.md --extract-media=./media

Troubleshooting

| Problem | Likely cause | Fix |
|---|---|---|
| PDF has missing characters | Font doesn't support the glyphs | Use `--pdf-engine=xelatex` with `-V mainfont="DejaVu Serif"` |
| PDF conversion fails | No compatible PDF engine installed | Run `which xelatex lualatex tectonic pdflatex weasyprint wkhtmltopdf prince` and install one that matches your output needs |
| DOCX looks unstyled | No reference doc | Create a styled DOCX template and pass `--reference-doc` |
| HTML images missing | Relative paths broken | Use `--self-contained` to embed images as base64 (in pandoc 2.19 and later, `--embed-resources --standalone` replaces the deprecated `--self-contained`) |
| CSS has no effect on PDF | LaTeX PDF engine selected | Use `--pdf-engine=weasyprint`, `--pdf-engine=wkhtmltopdf`, or `--pdf-engine=prince` |
| Table of contents empty | No headings in source | Ensure source uses # headings (Markdown) or <h1>–<h6> (HTML) |

Format Reference

For a full list of supported input and output formats, see references/formats.md.

Choosing the Right Approach

When a user asks to convert a document, think about:

1. What's the source format? Check the file extension or ask. If it's ambiguous (e.g., a .txt file that's actually Markdown), specify -f markdown explicitly.
2. What's the target format? Map the user's intent to a file extension.
3. Does it need styling? If the user wants it to "look nice" or "be professional," add CSS (for HTML), LaTeX variables (for PDF), or a reference doc (for DOCX).
4. Does it need structure? Add a TOC, numbered sections, and metadata when the document is long or formal.
5. Are there images or media? Use --self-contained for HTML, and --extract-media when converting from DOCX/EPUB to text formats.
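
The styling question in particular lends itself to a small lookup. The sketch below maps an output file name to the styling flag suggested in this document; the function name `style_flags` and the file names style.css and template.docx are placeholders, not fixed conventions.

```shell
#!/bin/sh
# Suggest a styling flag based on the output file's extension.
style_flags() {
  case "$1" in
    *.html) echo "--css=style.css" ;;                # CSS for HTML output
    *.pdf)  echo "-V geometry:margin=1in" ;;          # LaTeX variable for PDF
    *.docx) echo "--reference-doc=template.docx" ;;   # reference doc for DOCX
    *)      echo "" ;;                                # no styling suggestion
  esac
}

# Example usage (placeholder file names):
#   pandoc input.md -o report.docx $(style_flags report.docx)
```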

Always use the helper script scripts/convert.sh as the starting point — it handles the most
common gotchas automatically, picks a reasonable PDF engine, and prints recovery hints when PDF
conversion fails. Add extra pandoc flags as needed for the specific use case.

