Run a local script to work with PDF files, DOCX documents, OCR, and text-to-speech. Use the read tool to load this SKILL.md, then exec the uv run command inside it. Do NOT use sessions_spawn. Triggers: read pdf, extract text from pdf, merge pdfs, split pdf, rotate pdf, ocr pdf, read docx, create docx, text to speech, convert to mp3, pdf info, pdf pages.
- uv must already be installed because this skill is executed with uv run, and uv installs the Python dependencies declared in src/main.py.
- ffmpeg is needed for tts because the speech output is normalized and written as an .mp3 file through it.
- tesseract is needed for ocr because it performs the actual optical character recognition on scanned page images.
- pdfimages is also needed for ocr because it extracts page images from PDFs before those images are passed to tesseract; pdfimages comes from poppler.
- pandoc is optional for convert because it can convert between many document formats when text-based conversion is possible.
- libreoffice is an optional alternative to pandoc for convert because it can handle document conversions that pandoc may not support well.
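The requirements above can be checked programmatically before a feature is attempted. The following is a minimal sketch, not the skill's own doctor implementation: the names `REQUIRED_BINARIES`, `ALTERNATIVE_BINARIES`, and `feature_available` are hypothetical, and the binary names are assumed to be exactly those listed above.

```python
import shutil

# Host binaries that a feature needs, taken from the list above.
REQUIRED_BINARIES = {
    "tts": ["ffmpeg"],
    "ocr": ["tesseract", "pdfimages"],
}

# Features that are satisfied by any one of several binaries.
ALTERNATIVE_BINARIES = {
    "convert": ["pandoc", "libreoffice"],
}


def feature_available(feature, which=shutil.which):
    """Return True if the host has the binaries the feature depends on."""
    if feature in REQUIRED_BINARIES:
        return all(which(name) is not None for name in REQUIRED_BINARIES[feature])
    if feature in ALTERNATIVE_BINARIES:
        return any(which(name) is not None for name in ALTERNATIVE_BINARIES[feature])
    # Features not listed here need only uv and the Python dependencies.
    return True
```

The `which` parameter exists only so the lookup can be replaced in tests; by default it uses `shutil.which`, which searches the current PATH.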
File Access And Network Behavior
- This skill operates on the file paths provided by the caller. It can read from and write to any host path the caller supplies; it is not limited to the OpenClaw workspace.
- The /root/.openclaw/workspace/... paths in the command examples show where the skill entrypoint lives. They do not restrict which files the skill can access.
- The tts command uses edge-tts, which sends the input text to an external text-to-speech service over the network to generate audio.
- Do not use tts with sensitive or private text unless you are comfortable sending that text off-host.
- All other commands run locally on the host, subject to the optional local binaries documented below.
Skill: PDF Toolkit
When to use
- User wants to extract text, tables, or images from a PDF.
- User wants to get metadata or page count from a PDF.
- User wants to merge, split, or rotate a PDF.
- User wants to create a new PDF from plain text or Markdown.
- User wants to read or write a DOCX file.
- User wants to OCR a scanned PDF (requires tesseract on host).
- User wants to convert text or a document to an MP3 audio file (requires ffmpeg on host).
- User wants to convert between document formats (requires pandoc or libreoffice on host).
- User wants to check which optional system tools are available.
When NOT to use
- User wants to view or render a PDF visually: use a PDF viewer instead.
- User wants to fill in PDF form fields: this skill does not support AcroForms.
- User wants to edit an existing PDF's text in place: use a dedicated PDF editor.
Chat Delivery
- When this skill is used in a chat interface that supports file attachments, such as Telegram, any generated output file should be sent back to the user as an attachment after successful creation or conversion.
- This applies to commands that create files, including create-pdf, write-docx, extract-images, merge, split, rotate, tts, and convert.
- If a temporary output file is created in the Claw runtime temporary folder for delivery, delete that temporary file immediately after the file has been sent successfully to the user.
- Do not delete files that were written to a user-requested destination outside the Claw temporary folder.
- If the chat environment cannot send file attachments, report the output path clearly instead of claiming the file was delivered.
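The deletion rule above can be expressed as a small predicate. This is a minimal sketch and not part of the skill: the function name is hypothetical, and the temporary folder path is passed in by the caller because the runtime's actual temp location is not stated here.

```python
from pathlib import Path


def should_delete_after_send(output_path, temp_root, sent_ok):
    """Decide whether a delivered output file may be removed."""
    # Never delete a file that was not delivered successfully.
    if not sent_ok:
        return False
    output = Path(output_path).resolve()
    root = Path(temp_root).resolve()
    # Only files located inside the temporary folder are eligible;
    # user-requested destinations elsewhere are left untouched.
    return root in output.parents
```

Resolving both paths first means a relative path or a symlinked temp folder is compared by its real location rather than by its spelling.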
Output
- Plain text with labeled sections separated by blank lines.
- Errors are prefixed with Error:.
- The doctor command shows a table of available and missing tools.
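A caller that wants to post-process this output could split it along the conventions just described. The sketch below assumes only what is stated above (blank-line-separated sections and an `Error:` prefix); the function name `parse_output` and the returned dictionary shape are hypothetical.

```python
import re


def parse_output(text):
    """Split skill output into sections, or report an error message."""
    stripped = text.strip()
    if stripped.startswith("Error:"):
        message = stripped[len("Error:"):].strip()
        return {"ok": False, "error": message, "sections": []}
    # Sections are separated by one or more blank lines.
    blocks = re.split(r"\n\s*\n", stripped)
    sections = [block.strip() for block in blocks if block.strip()]
    return {"ok": True, "error": None, "sections": sections}
```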
Notes
- uv run reads the inline # /// script dependency block in main.py and installs the Python packages in an isolated environment, so no pip install or venv setup is needed.
- Core features (info, extract-text, extract-tables, merge, split, rotate, create-pdf, read-docx, write-docx) work with uv alone; no system binaries are required.
- OCR requires tesseract on the host (brew install tesseract / apt install tesseract-ocr) and pdfimages from poppler (brew install poppler / apt install poppler-utils).
- TTS requires ffmpeg on the host (brew install ffmpeg / apt install ffmpeg).
- Document conversion requires pandoc or libreoffice on the host.
- Run doctor first if you are unsure which features are available.
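The inline dependency block mentioned in the first note follows the inline script metadata format defined in PEP 723. The sketch below shows what such a header looks like and how it can be read back with the regular expression given in that PEP. The header contents are placeholders: `example-package` and the Python version are assumptions, and the real dependency list lives in src/main.py.

```python
import re

# Regular expression for inline script metadata blocks, as given in PEP 723.
BLOCK_RE = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)

# Hypothetical header; not the skill's actual dependency list.
EXAMPLE_SOURCE = '''\
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "example-package",
# ]
# ///
print("hello")
'''


def read_script_metadata(source):
    """Return the TOML text of the `script` block, or None if absent."""
    for match in BLOCK_RE.finditer(source):
        if match.group("type") == "script":
            lines = match.group("content").splitlines(keepends=True)
            # Drop the leading "# " (or bare "#") from every line.
            return "".join(
                line[2:] if line.startswith("# ") else line[1:] for line in lines
            )
    return None
```

Because the block is made of ordinary comments, the script remains valid Python for any interpreter; only tools that understand the format, such as uv run, act on it.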
Commands
Check available tools
```bash
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py doctor
```
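When the skill is driven from another program rather than typed by hand, the same invocation can be built as an argument list. This is a hedged sketch: `build_command` and `run_skill` are hypothetical helper names, and the entrypoint path is simply the one shown in the command above.

```python
import subprocess

# Entrypoint path as shown in the command examples.
SKILL_ENTRYPOINT = "/root/.openclaw/workspace/skills/pdf-toolkit/src/main.py"


def build_command(subcommand, *args, entrypoint=SKILL_ENTRYPOINT):
    """Return the argv list for one skill invocation."""
    return ["uv", "run", entrypoint, subcommand, *[str(a) for a in args]]


def run_skill(subcommand, *args, runner=subprocess.run):
    """Run the skill and return (exit status, captured standard output)."""
    result = runner(build_command(subcommand, *args), capture_output=True, text=True)
    return result.returncode, result.stdout
```

Passing a list instead of a single shell string means values such as `Hello, world!` reach the script as one argument without any shell quoting. For example, `run_skill("doctor")` would execute the command shown above.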
Get PDF metadata and page count
```bash
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py info <file.pdf>
```
Extract text from a PDF
```bash
# All pages
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py extract-text <file.pdf>
# Specific pages (numbered from 1; comma-separated values or ranges)
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py extract-text <file.pdf> --pages 1,3,5-8
```
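The page specification used by --pages (1-based numbers, comma-separated, with optional ranges) can be expanded as follows. This is an illustrative sketch rather than the skill's own parser; the function name `parse_pages` is hypothetical, and sorting and de-duplicating the result is a design choice made here, not documented behavior.

```python
def parse_pages(spec):
    """Expand a spec such as "1,3,5-8" into a sorted list of 1-based pages."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as "5-8" includes both endpoints.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

With this sketch, `parse_pages("1,3,5-8")` yields pages 1, 3, 5, 6, 7, and 8, while a zero page or a reversed range such as `5-2` raises `ValueError`.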
Extract tables from a PDF
```bash
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py extract-tables <file.pdf>
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py extract-tables <file.pdf> --pages 2-4
```
Extract images from a PDF
```bash
# Saves to the current directory by default
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py extract-images <file.pdf>
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py extract-images <file.pdf> --output-dir /path/to/output
```
Merge PDFs
```bash
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py merge <file1.pdf> <file2.pdf> [<file3.pdf> ...] --output merged.pdf
```
Split a PDF
```bash
# Split into individual pages
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py split <file.pdf> --output-dir /path/to/output
# Extract a page range into a new PDF
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py split <file.pdf> --pages 2-5 --output extracted.pdf
```
Rotate pages in a PDF
```bash
# Rotate all pages 90 degrees clockwise
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py rotate <file.pdf> --degrees 90 --output rotated.pdf
# Rotate specific pages
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py rotate <file.pdf> --degrees 180 --pages 1,3 --output rotated.pdf
```
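PDF page rotation is stored as a multiple of 90 degrees, so a caller may want to validate the --degrees value before invoking the command. The helper below is a hypothetical sketch; how the skill itself handles other values is not documented here.

```python
def normalize_rotation(degrees):
    """Return the equivalent clockwise rotation in {0, 90, 180, 270}."""
    if degrees % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    # Python's modulo maps negative angles into the 0..359 range.
    return degrees % 360
```

A counter-clockwise quarter turn, written as -90, is the same as 270 degrees clockwise under this normalization.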
Create a PDF from text
```bash
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py create-pdf --text 'Hello, world!' --output hello.pdf
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py create-pdf --file input.txt --output document.pdf
```
Read a DOCX file
```bash
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py read-docx <file.docx>
```
Write a DOCX file
```bash
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py write-docx --text 'Content here' --output document.docx
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py write-docx --file input.txt --output document.docx
```
OCR a scanned PDF (requires tesseract)
```bash
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py ocr <file.pdf>
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py ocr <file.pdf> --pages 1-3 --lang eng
```
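As described in the requirements, OCR works in two stages: pdfimages extracts page images, then tesseract recognizes the text in each image. The sketch below only builds the two argument lists to make that pipeline concrete. The helper names and file names are hypothetical, and the exact flags the skill passes are an assumption; `-png`, `-f`, and `-l` are standard pdfimages options, and `stdout` as the output base tells tesseract to print the recognized text.

```python
def pdfimages_command(pdf_path, image_prefix, first_page=None, last_page=None):
    """Build the pdfimages call that writes PNG images for a page range."""
    argv = ["pdfimages", "-png"]
    if first_page is not None:
        argv += ["-f", str(first_page)]
    if last_page is not None:
        argv += ["-l", str(last_page)]
    # pdfimages names its output files <image_prefix>-NNN.png.
    return argv + [pdf_path, image_prefix]


def tesseract_command(image_path, lang="eng"):
    """Build the tesseract call that prints recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang]
```

Each list can be handed to `subprocess.run` as is; the language code corresponds to the --lang option shown above.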
Convert text or document to speech (requires ffmpeg)
```bash
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py tts --text 'Hello, world!' --output speech.mp3
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py tts --file input.txt --output speech.mp3
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py tts --file document.pdf --output speech.mp3
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py tts --text 'Hello' --voice en-GB-SoniaNeural --output speech.mp3
```
Convert document formats (requires pandoc or libreoffice)
```bash
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py convert <input-path> --output <output-path>
```
- Added "Chat Delivery" section to SKILL.md with guidelines for sending generated files as chat attachments and cleaning up temporary files.
- Clarified file delivery and deletion behavior for output files in chat interfaces.
- No functional code or command changes; documentation update only.