pdf: All-in-One PDF Tool

Create, read, edit, merge, and split PDF files. Supports text extraction, table extraction, form filling, watermarks, OCR, and HTML-to-PDF conversion.

Author: admin | Source: ClawHub | Version: V 1.0.0

pdf

PDF Skill v2.0

Overview

Process PDFs with pypdf, pdfplumber, weasyprint, and command-line tools. Supports all common PDF operations.

Installation and Dependencies

Required

```bash
pip install pypdf pdfplumber weasyprint
```

Optional

```bash
# For image conversion
brew install poppler
pip install pdf2image

# For OCR
pip install pytesseract
brew install tesseract

# For form filling
pip install pypdf-forms
```
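Because several features depend on optional packages, it can help to check what is importable before starting a workflow. The following is a minimal sketch using only the standard library. The `OPTIONAL_FEATURES` mapping and the `missing_modules` helper are illustrative names, not part of the skill, and the import names are assumed to match the pip packages listed above.

```python
from importlib.util import find_spec

# Hypothetical mapping from a feature to the top-level module it needs.
OPTIONAL_FEATURES = {
    "image conversion": "pdf2image",
    "OCR": "pytesseract",
}


def missing_modules(features):
    """Return {feature: module} for every module that cannot be found."""
    return {
        feature: module
        for feature, module in features.items()
        if find_spec(module) is None
    }


for feature, module in missing_modules(OPTIONAL_FEATURES).items():
    print(f"{feature}: install '{module}' to enable this feature")
```

The check only confirms that the Python package is present; system tools such as poppler or tesseract still need to be installed separately.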

Quick Start

Read a PDF

```python
from pypdf import PdfReader

reader = PdfReader("document.pdf")
print(f"Pages: {len(reader.pages)}")

# Extract text from the first page
page = reader.pages[0]
text = page.extract_text()
print(text)
```

Merge PDFs

```python
from pypdf import PdfWriter, PdfReader

writer = PdfWriter()
for pdf in ["doc1.pdf", "doc2.pdf"]:
    reader = PdfReader(pdf)
    for page in reader.pages:
        writer.add_page(page)

with open("merged.pdf", "wb") as f:
    writer.write(f)
```

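Create a PDF from HTML

```python
from weasyprint import HTML

# The <h1> tag is reconstructed; the source shows only the text "Hello PDF".
HTML(string="<h1>Hello PDF</h1>").write_pdf("output.pdf")
```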

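Full API Reference

Reading PDFs

```python
from pypdf import PdfReader

# Basic read
reader = PdfReader("document.pdf")

# Check for encryption (decrypt before reading pages or metadata)
if reader.is_encrypted:
    reader.decrypt("password")

# Get the page count
num_pages = len(reader.pages)

# Extract text from all pages
full_text = ""
for page in reader.pages:
    full_text += page.extract_text()

# Extract text from a specific page
page = reader.pages[5]  # page 6 (0-based index)
text = page.extract_text()

# Get metadata (may be None if the file has no info dictionary)
meta = reader.metadata
if meta:
    print(f"Title: {meta.title}")
    print(f"Author: {meta.author}")
    print(f"Subject: {meta.subject}")
    print(f"Creator: {meta.creator}")
    print(f"Creation date: {meta.creation_date}")

# Get the outline / bookmarks
for item in reader.outline:
    if not isinstance(item, list):  # nested lists hold child bookmarks
        print(item.title)
```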
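Concatenating page text directly, as the loop above does, makes page boundaries hard to find later, and image-only pages may yield no text at all. The following is a minimal sketch that assumes only that each page object exposes `extract_text()`, as pypdf pages do. The helper name `join_page_text` and the blank-line separator are illustrative choices.

```python
def join_page_text(pages, separator="\n\n"):
    """Join the text of each page, skipping pages that yield no text."""
    chunks = []
    for page in pages:
        # Treat a missing result (None) the same as an empty string.
        text = (page.extract_text() or "").strip()
        if text:
            chunks.append(text)
    return separator.join(chunks)
```

With a reader opened as above, `full_text = join_page_text(reader.pages)` would produce one string with a blank line between pages.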

Extracting Tables

```python
import pdfplumber
import pandas as pd

# Open the PDF
with pdfplumber.open("document.pdf") as pdf:
    # Get the page count
    print(f"Pages: {len(pdf.pages)}")

    # Extract tables from the first page
    page = pdf.pages[0]
    tables = page.extract_tables()

    for i, table in enumerate(tables):
        print(f"Table {i+1}:")
        for row in table:
            print(row)

# Convert all tables to Excel
all_tables = []
with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        tables = page.extract_tables()
        for table in tables:
            if table and len(table) > 1:
                # Treat the first row as the header
                df = pd.DataFrame(table[1:], columns=table[0])
                all_tables.append(df)

# Combine and export (writing .xlsx requires openpyxl)
if all_tables:
    combined = pd.concat(all_tables, ignore_index=True)
    combined.to_excel("extracted_tables.xlsx", index=False)
```
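pdfplumber returns each table as a list of rows, and empty cells come back as `None`. Where pandas is not needed, the rows can be turned into dictionaries with the standard library alone. This is a sketch: `table_to_records` is a hypothetical helper, and it assumes the first row is the header, as the DataFrame code above does.

```python
def table_to_records(table):
    """Convert a list-of-rows table into a list of dicts keyed by header."""
    if not table or len(table) < 2:
        return []
    # Replace missing header cells with positional names such as "col_2".
    header = [
        (cell or "").strip() or f"col_{i + 1}"
        for i, cell in enumerate(table[0])
    ]
    records = []
    for row in table[1:]:
        # Normalize None cells to empty strings and trim whitespace.
        cells = [(cell or "").strip() for cell in row]
        records.append(dict(zip(header, cells)))
    return records


sample = [["Name", None, "Qty"], ["Widget", "blue", "3"], ["Gadget", None, " 7 "]]
print(table_to_records(sample))
```

Each record keeps every value as a string; converting numeric columns is left to the caller.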

Merging PDFs

```python
from pypdf import PdfWriter, PdfReader


# Merge multiple files
def merge_pdfs(input_files, output_file):
    writer = PdfWriter()

    for pdf_file in input_files:
        reader = PdfReader(pdf_file)
        print(f"Adding {pdf_file} ({len(reader.pages)} pages)")
        for page in reader.pages:
            writer.add_page(page)

    with open(output_file, "wb") as f:
        writer.write(f)

    print(f"✓ Merged {len(input_files)} files into {output_file}")


# Usage example
merge_pdfs(["doc1.pdf", "doc2.pdf", "doc3.pdf"], "merged.pdf")
```
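A merge like the one above stops at the first unreadable input, after earlier files have already been processed. A small pre-check can report every problem up front. The helper name `check_inputs` is an assumption made for illustration, and it relies only on `pathlib`.

```python
from pathlib import Path


def check_inputs(input_files):
    """Return a list of problems found in the input paths (empty if none)."""
    problems = []
    for name in input_files:
        path = Path(name)
        if not path.is_file():
            problems.append(f"{name}: file not found")
        elif path.suffix.lower() != ".pdf":
            problems.append(f"{name}: not a .pdf file")
    return problems
```

A caller might run `check_inputs(files)` first and merge only when the returned list is empty. The extension test is a convenience, not proof that a file is a valid PDF.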

Splitting PDFs

```python
from pypdf import PdfReader, PdfWriter


# Split into individual pages
def split_pdf(input_file, output_prefix):
    reader = PdfReader(input_file)

    for i, page in enumerate(reader.pages):
        writer = PdfWriter()
        writer.add_page(page)

        output_file = f"{output_prefix}_page_{i+1}.pdf"
        with open(output_file, "wb") as f:
            writer.write(f)

    print(f"✓ Split into {len(reader.pages)} files")


# Extract specific pages
def extract_pages(input_file, output_file, page_numbers):
    """Extract specific pages (1-based page numbers)."""
    reader = PdfReader(input_file)
    writer = PdfWriter()

    for page_num in page_numbers:
        writer.add_page(reader.pages[page_num - 1])

    with open(output_file, "wb") as f:
        writer.write(f)

    print(f"✓ Extracted pages {page_numbers}")


# Usage example
split_pdf("document.pdf", "page")
extract_pages("document.pdf", "selected.pdf", [1, 3, 5])
```
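Page selections are often given as a string such as "1,3,5-7" instead of a Python list. The sketch below turns such a string into the 1-based page numbers that a page-extraction function like the one above expects. `parse_page_spec` and its range syntax are assumptions made for illustration, not part of the skill.

```python
def parse_page_spec(spec, num_pages):
    """Parse a spec like "1,3,5-7" into a list of 1-based page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Reject ranges that fall outside the document or run backwards.
        if start < 1 or end > num_pages or start > end:
            raise ValueError(f"invalid page range {part!r} for {num_pages} pages")
        pages.extend(range(start, end + 1))
    return pages


print(parse_page_spec("1,3,5-7", num_pages=10))  # [1, 3, 5, 6, 7]
```

Validating against the page count before any file is written avoids an `IndexError` partway through extraction.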

Rotating Pages

```python
from pypdf import PdfReader, PdfWriter


# Rotate all pages
def rotate_pdf(input_file, output_file, rotation=90):
    reader = PdfReader(input_file)
    writer = PdfWriter()

    for page in reader.pages:
        page.rotate(rotation)  # 90, 180, or 270
        writer.add_page(page)

    with open(output_file, "wb") as f:
        writer.write(f)


# Rotate specific pages
reader = PdfReader("input.pdf")
writer = PdfWriter()

for i, page in enumerate(reader.pages):
    if i == 0:  # rotate only the first page
        page.rotate(90)
    writer.add_page(page)

with open("output.pdf", "wb") as f:
    writer.write(f)
```
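pypdf rotates pages clockwise and accepts only multiples of 90 degrees, so it can be useful to validate an angle before it reaches `page.rotate`. The following sketch normalizes negative or oversized angles into the 0 to 270 range. `normalize_rotation` is a hypothetical helper name.

```python
def normalize_rotation(angle):
    """Map an angle to 0, 90, 180 or 270; reject non-multiples of 90."""
    if angle % 90 != 0:
        raise ValueError(f"rotation must be a multiple of 90, got {angle}")
    return angle % 360


print(normalize_rotation(-90))  # 270
print(normalize_rotation(450))  # 90
```

A counter-clockwise quarter turn (-90) therefore becomes a clockwise rotation of 270 degrees.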

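Creating PDFs from HTML

```python
from weasyprint import HTML, CSS

# Basic HTML to PDF
# (HTML tags are reconstructed; the source shows only the text content.)
html_content = """
<h1>Hello PDF</h1>
<p>This is a test document.</p>
"""

HTML(string=html_content).write_pdf("output.pdf")

# Use an external CSS file
HTML(string="<h1>Styled PDF</h1>").write_pdf(
    "styled.pdf",
    stylesheets=[CSS(filename="style.css")],
)

# Custom page size
HTML(string="<h1>Landscape</h1>").write_pdf(
    "landscape.pdf",
    stylesheets=[CSS(string="@page { size: landscape; }")],
)

# Add headers / footers
# Only the heading of this example survives in the source.
html_with_page_num = """
<h1>Document with Page Numbers</h1>
"""
```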
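The page-number example above is incomplete in the source. One common approach, shown here as a sketch and not as the author's original markup, is to use CSS paged-media margin boxes with the `counter(page)` and `counter(pages)` counters, which WeasyPrint supports. The function name `html_with_page_numbers`, the margin size, and the footer wording are all assumptions.

```python
from html import escape


def html_with_page_numbers(title, body_html):
    """Return an HTML document whose pages carry a 'Page N of M' footer."""
    css = (
        "@page {"
        " margin: 2cm;"
        ' @bottom-center { content: "Page " counter(page) " of " counter(pages); }'
        " }"
    )
    safe_title = escape(title)  # the title is plain text, so escape it
    return (
        "<!DOCTYPE html>"
        f"<html><head><meta charset='utf-8'><title>{safe_title}</title>"
        f"<style>{css}</style></head>"
        f"<body><h1>{safe_title}</h1>{body_html}</body></html>"
    )


document = html_with_page_numbers("Document with Page Numbers", "<p>Body text.</p>")
# With WeasyPrint installed, the string can be rendered like the other examples:
#   HTML(string=document).write_pdf("numbered.pdf")
```

Keeping the footer in an `@page` rule means it repeats on every page without any markup in the body.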

Adding Watermarks

```python
from io import BytesIO

from pypdf import PdfReader, PdfWriter
from reportlab.pdfgen import canvas  # requires: pip install reportlab


def create_watermark(text, output_path):
    """Create a watermark PDF in memory (output_path is not used)."""
    packet = BytesIO()
    c = canvas.Canvas(packet)

    # Draw the text
    c.saveState()
    c.translate(300, 400)
    c.rotate(45)
    c.setFont("Helvetica-Bold", 50)
    c.setFillColorRGB(0.5, 0.5, 0.5, 0.3)  # grey with transparency
    c.drawCentredString(0, 0, text)
    c.restoreState()

    c.save()
    packet.seek(0)

    return PdfReader(packet)


# Apply the watermark
def watermark_pdf(input_file, output_file, watermark_text):
    reader = PdfReader(input_file)
    watermark = create_watermark(watermark_text, "temp.pdf")
    watermark_page = watermark.pages[0]
    writer = PdfWriter()

    for page in reader.pages:
        page.merge_page(watermark_page)
        writer.add_page(page)

    # The source listing is cut off after "with open(output_file, wb) as";
    # the write step below follows the pattern used in the earlier sections.
    with open(output_file, "wb") as f:
        writer.write(f)
```

Tags

skill ai

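Install via Chat

This skill can be installed through conversation on the following platforms:

OpenClaw, WorkBuddy, QClaw, Kimi, Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the pdf-skill-1776104478 skill

Option 2: Set SkillHub as the preferred skill source

Set SkillHub as my preferred skill installation source, then help me install the pdf-skill-1776104478 skill

Install via Command Line

skillhub install pdf-skill-1776104478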

Download

pdf v1.0.0 (free)

File size: 5.72 KB | Published: 2026-4-14 14:04

v1.0.0 (latest) 2026-4-14 14:04
Major update: PDF skill upgraded with full-featured PDF processing support.

- Now supports creating, reading, editing, merging, and splitting PDFs.
- Added advanced capabilities: text & table extraction, OCR, HTML-to-PDF, watermarks, and form filling.
- Documentation now covers sample code and usage for all common PDF operations.
- Includes guidance on required and optional dependencies for extended features.
- Reorganized and expanded the API reference for easier use and discovery.
