Chinese Ebook Downloader
Download Chinese ebooks from multiple sources with automatic fallback and format conversion.
Quick Start
CODEBLOCK0
Download Sources (Priority Order)
| Source | Coverage | Limit | Notes |
|---|---|---|---|
| Source A (online book library) | ~100% | None | Primary; high coverage for popular Chinese books |
| Source B (secondary library) | ~8% | None | Fallback for missing titles |
| Source C (Anna's Archive) | Wide | Rate-limited | Last resort; uses libgen.li mirrors |
Note: Z-Library is no longer used because of its limit of 10 downloads per day.
Multi-Source Fallback
The multi_source_download.py script automatically tries sources in order:
CODEBLOCK1
Workflow per book:
1. Try Source A (ZIP → extract PDF/EPUB)
2. If that fails, try Source B (file host download)
3. If that fails, try Source C (Anna's Archive via libgen.li)
4. If only an EPUB is found, auto-convert it to PDF using weasyprint
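
The per-book workflow above can be sketched as a simple ordered fallback. This is a minimal illustration, not the actual contents of multi_source_download.py: the `Source` callables, `download_with_fallback`, and `needs_pdf_conversion` are hypothetical names, and each source is assumed to return a file path on success or `None` when the book is not found.

```python
from typing import Callable, Optional, Sequence, Tuple

# A source takes (title, author) and returns the path of the downloaded
# file, or None when the book was not found. These callables are
# hypothetical stand-ins for the real per-source scripts.
Source = Callable[[str, str], Optional[str]]


def download_with_fallback(
    title: str,
    author: str,
    sources: Sequence[Tuple[str, Source]],
) -> Optional[Tuple[str, str]]:
    """Try each source in order; return (source_name, path) for the first hit."""
    for name, fetch in sources:
        try:
            path = fetch(title, author)
        except Exception as exc:  # a failing source must not stop the chain
            print(f"{name} failed: {exc}")
            continue
        if path:
            return name, path
    return None


def needs_pdf_conversion(path: str) -> bool:
    """True when only an EPUB was found, so the EPUB-to-PDF step applies."""
    return path.lower().endswith(".epub")
```

Passing the sources as an ordered list keeps the A→B→C priority in one place, so adding or dropping a source does not require touching the loop.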
Usage:
CODEBLOCK2
EPUB → PDF Conversion
When only an EPUB is available, it is converted to PDF automatically with weasyprint:
CODEBLOCK3
Requirements: ebooklib, weasyprint, and installed CJK fonts.
Scripts Reference
| Script | Purpose |
|---|---|
| `download_book.py` | Primary download from Source A |
| `search_secondary_source.py` | Source B search & download |
| `search_source_c.py` | Anna's Archive search & download |
| `batch_download.py` | Batch download from JSON list |
| `multi_source_download.py` | Multi-source A→B→C fallback |
| `epub_to_pdf.py` | EPUB/MOBI to PDF conversion |
| `anna_iso_batch.sh` | Anna's Archive isolated batch (one process per book) |
Source A Workflow (Primary)
CODEBLOCK4
Step 1: Search
Search the primary library for the book title, open the book's download page, and extract the file host URL and password.
Step 2: Decrypt
Open the file host URL, enter the password, and click decrypt.
Step 3: Wait for countdown
The file hosting service enforces a countdown before the download becomes available. Do not skip it.
Step 4: Fetch real download URL
Get page variables:
CODEBLOCK5
Call API:
CODEBLOCK6
If the response has code: 200, its downurl field is the real download URL.
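
A short sketch of how the API response could be checked before moving on to the download step. It assumes the response body is a JSON object with a `code` field and a `downurl` field, as described above; the helper name `extract_download_url` is hypothetical, and accepting `code` as either a number or a string is a defensive assumption rather than documented behavior.

```python
import json
from typing import Optional


def extract_download_url(response_text: str) -> Optional[str]:
    """Return downurl when the API reports code 200, otherwise None."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        return None  # e.g. an HTML error page came back instead of JSON
    if not isinstance(payload, dict):
        return None
    # Compare as text so both 200 and "200" are accepted (assumption).
    if str(payload.get("code")) == "200" and payload.get("downurl"):
        return payload["downurl"]
    return None
```

A `None` result corresponds to the "API non-200" case in Troubleshooting: re-navigate and re-decrypt before calling the API again.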
Step 5: Download
CODEBLOCK7
Step 6: Extract ZIP (GBK encoding)
CODEBLOCK8
Book Name Matching Strategy
When a book title is long or contains multiple names (e.g. box sets), shorter search keywords are derived as follows:
- Remove subtitles (everything after ":" or ":")
- Remove parenthetical content ("(...)" and "(...)")
- Remove "套装共X册" bundle descriptions
- Split "+"-connected titles into individual books
- Try each keyword until a match is found
- Fall back to full title + author
Examples:
- "杨定一全部生命系列:真原医+静坐+好睡(套装3册)" → tries "真原医", "静坐", "好睡"
- "超越百岁:长寿的科学与艺术" → tries "超越百岁", then "超越百岁 彼得·阿提亚"
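
The rules and examples above can be approximated with a few regular expressions. This is a sketch under assumptions, not the scripts' actual matcher: `candidate_keywords` is a hypothetical name, "+"-joined titles are looked for on either side of the colon (as the first example requires), and the author fallback uses the title before the subtitle, following the second example.

```python
import re
from typing import List


def candidate_keywords(title: str, author: str = "") -> List[str]:
    """Derive search keywords from a long or bundled book title."""
    # Drop parenthetical content, in full-width or ASCII brackets.
    cleaned = re.sub(r"（[^）]*）|\([^)]*\)", "", title)
    # Drop bundle descriptions such as "套装共3册" or "套装3册".
    cleaned = re.sub(r"套装共?\s*\d+\s*册", "", cleaned).strip()
    # Normalize the full-width colon, then split title from subtitle.
    head, _, tail = cleaned.replace("：", ":").partition(":")
    head = head.strip()
    # Bundled titles joined by "+" may sit after the colon.
    joined = tail if "+" in tail else head
    if "+" in joined:
        keywords = [part.strip() for part in joined.split("+") if part.strip()]
    else:
        keywords = [head]
    if author:
        keywords.append(f"{head} {author}")  # last-resort query
    return keywords
```

The caller would try the returned keywords in order and stop at the first one that produces a match.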
Format Selection
| Flag | Description |
|---|---|
| `--format pdf` | PDF only (default, preferred for NotebookLM) |
| `--format epub` | EPUB only |
| `--format mobi` | MOBI only |
| `--format azw3` | AZW3 only |
| `--format any` | Accept any available format |
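
For illustration, a flag like this could be declared with `argparse` as shown below. The sketch only mirrors the table: `build_parser` and `accepts` are hypothetical helpers, and treating `any` as "any of the four listed ebook formats" is an assumption about how the scripts interpret it.

```python
import argparse

FORMATS = ("pdf", "epub", "mobi", "azw3", "any")


def build_parser() -> argparse.ArgumentParser:
    """Parser with a --format flag restricted to the values in the table."""
    parser = argparse.ArgumentParser(description="Download a Chinese ebook")
    parser.add_argument(
        "--format",
        choices=FORMATS,
        default="pdf",  # PDF is the documented default
        help="ebook format to accept",
    )
    return parser


def accepts(fmt: str, filename: str) -> bool:
    """Check whether a candidate file satisfies the requested format."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if fmt == "any":
        return ext in FORMATS[:-1]  # any of pdf/epub/mobi/azw3
    return ext == fmt
```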
Batch Download
CODEBLOCK9
JSON format:
CODEBLOCK10
Features: resume via _progress.json, skipping of files that already exist, and rate limiting.
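
Resuming from a progress file can be sketched as follows. The actual layout of _progress.json is not documented here, so this example assumes a plain JSON list of completed titles; `load_done`, `mark_done`, and `pending` are hypothetical helper names.

```python
import json
import os
from typing import Iterable, List, Set


def load_done(progress_path: str) -> Set[str]:
    """Read the set of completed titles; empty when no progress file exists."""
    if not os.path.exists(progress_path):
        return set()
    with open(progress_path, encoding="utf-8") as f:
        return set(json.load(f))


def mark_done(progress_path: str, done: Set[str], title: str) -> None:
    """Record one finished title, writing via a temp file and a rename."""
    done.add(title)
    tmp_path = progress_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f, ensure_ascii=False)
    os.replace(tmp_path, progress_path)  # avoids a half-written progress file


def pending(titles: Iterable[str], done: Set[str]) -> List[str]:
    """Titles that still need downloading, in their original order."""
    return [t for t in titles if t not in done]
```

Saving after every book, rather than once at the end, is what lets an interrupted batch pick up where it stopped.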
Troubleshooting
| Problem | Solution |
|---|---|
| IP blocking | Use browser tool, not web_fetch |
| Link 404 | Link expired, re-search |
| API non-200 | Re-navigate and re-decrypt |
| Download is HTML | URL expired, fresh API call needed |
| ZIP filenames garbled | Use Python cp437→gbk, not unzip |
| Timeout on large files | Increase --max-time to 1200 |
| Anna's Archive blocked | Try different mirror, use anna_iso_batch.sh |
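
The "Download is HTML" case can be detected before extraction by inspecting the first bytes of the saved file. This is a minimal sketch, not part of the listed scripts; `sniff_download` is a hypothetical helper. It relies on the standard ZIP (`PK\x03\x04`) and PDF (`%PDF-`) signatures; note that an EPUB is itself a ZIP container and is reported as "zip".

```python
def sniff_download(path: str) -> str:
    """Classify a downloaded file as 'zip', 'pdf', 'html', or 'unknown'."""
    with open(path, "rb") as f:
        head = f.read(512)
    if head.startswith(b"PK\x03\x04"):
        return "zip"
    if head.startswith(b"%PDF-"):
        return "pdf"
    # An expired URL typically returns an HTML page instead of the file.
    stripped = head.lstrip().lower()
    if stripped.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

When the result is "html", the download URL has most likely expired and a fresh API call is needed.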
Chinese Ebook Downloader
Download Chinese ebooks from multiple sources with automatic fallback and format conversion.
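Quick Start
```bash
# Single book download (multi-source fallback)
python scripts/download_book.py --title "超越百岁" --author "彼得·阿提亚"

# Multi-source batch download (A→B→C fallback + EPUB→PDF conversion)
python scripts/multi_source_download.py ~/Books/

# Search Anna's Archive directly
python scripts/search_source_c.py "BOOK_TITLE" "AUTHOR"

# Convert EPUB to PDF
python scripts/epub_to_pdf.py book.epub book.pdf
```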
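Download Sources (Priority Order)
| Source | Coverage | Limit | Notes |
|---|---|---|---|
| Source A (online book library) | ~100% | None | Primary; high coverage for popular Chinese books |
| Source B (secondary library) | ~8% | None | Fallback for missing titles |
| Source C (Anna's Archive) | Wide | Rate-limited | Last resort; uses libgen.li mirrors |

Note: Z-Library is no longer used because of its limit of 10 downloads per day.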
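Multi-Source Fallback
The multi_source_download.py script automatically tries sources in order:
```
Source A → Source B → Source C → EPUB→PDF conversion
```
Workflow per book:
1. Try Source A (ZIP → extract PDF/EPUB)
2. If that fails, try Source B (file host download)
3. If that fails, try Source C (Anna's Archive via libgen.li)
4. If only an EPUB is found, auto-convert it to PDF using weasyprint

Usage:
```bash
# Edit the BOOKS list in the script, then run:
python scripts/multi_source_download.py ~/Books/
```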
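EPUB → PDF Conversion
When only an EPUB is available, it is converted to PDF automatically with weasyprint:
```bash
# Single file
python scripts/epub_to_pdf.py input.epub output.pdf

# Batch-convert a directory
python scripts/epub_to_pdf.py --batch ~/Books/
```
Requirements: ebooklib, weasyprint, and installed CJK fonts.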
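Scripts Reference
| Script | Purpose |
|---|---|
| `download_book.py` | Primary download from Source A |
| `search_secondary_source.py` | Source B search & download |
| `search_source_c.py` | Anna's Archive search & download |
| `batch_download.py` | Batch download from JSON list |
| `multi_source_download.py` | Multi-source A→B→C fallback |
| `epub_to_pdf.py` | EPUB/MOBI to PDF conversion |
| `anna_iso_batch.sh` | Anna's Archive isolated batch (one process per book) |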
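Source A Workflow (Primary)
```
Search → get file host link → decrypt → wait for countdown → API fetch → curl download → extract ZIP
```
Step 1: Search
Search the primary library for the book title, open the book's download page, and extract the file host URL and password.
Step 2: Decrypt
Open the file host URL, enter the password, and click decrypt.
Step 3: Wait for countdown
The file hosting service enforces a countdown before the download becomes available. Do not skip it.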
Step 4: Fetch real download URL
Get page variables:
```javascript
JSON.stringify({apiserver, userid, fileid, shareid, filechk, starttime, waitseconds, verifycode})
```
Call API:
```javascript
(async () => {
  // Build the request URL from the page variables collected above.
  var url = apiserver + '/getfile_url.php?uid=' + userid
    + '&fid=' + fileid + '&folderid=0&shareid=' + shareid
    + '&filechk=' + filechk + '&starttime=' + starttime
    + '&waitseconds=' + waitseconds + '&mb=0&app=0&acheck=0'
    + '&verifycode=' + verifycode + '&rd=' + Math.random();
  // Reuse the page's own AJAX headers when the helper is defined.
  var headers = typeof getAjaxHeaders === 'function' ? getAjaxHeaders() : {};
  var resp = await fetch(url, {headers: headers});
  return JSON.stringify(await resp.json());
})()
```
If the response has code: 200, its downurl field is the real download URL.
Step 5: Download
```bash
curl -L -o book.zip "DOWNURL" \
  -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)" \
  --max-time 1200
```
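Step 6: Extract ZIP (GBK encoding)
```python
import os
import zipfile

with zipfile.ZipFile("book.zip", "r") as z:
    for info in z.infolist():
        # zipfile decodes names without the UTF-8 flag as cp437,
        # so re-encode to the raw bytes and decode them as GBK.
        try:
            name = info.filename.encode("cp437").decode("gbk")
        except (UnicodeEncodeError, UnicodeDecodeError):
            name = info.filename  # keep the name zipfile already decoded
        ext = os.path.splitext(name)[1].lower()
        if ext in (".epub", ".azw3", ".mobi", ".pdf", ".txt"):
            data = z.read(info.filename)
            # Write into the current directory, dropping archive subfolders.
            with open(os.path.basename(name), "wb") as f:
                f.write(data)
```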
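Book Name Matching Strategy
When a book title is long or contains multiple names (e.g. box sets), shorter search keywords are derived as follows:
- Remove subtitles (everything after ":" or ":")
- Remove parenthetical content ("(...)" and "(...)")
- Remove "套装共X册" bundle descriptions
- Split "+"-connected titles into individual books
- Try each keyword until a match is found
- Fall back to full title + author

Examples:
- "杨定一全部生命系列:真原医+静坐+好睡(套装3册)" → tries "真原医", "静坐", "好睡"
- "超越百岁:长寿的科学与艺术" → tries "超越百岁", then "超越百岁 彼得·阿提亚"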
Format Selection
| Flag | Description |
|---|---|
| `--format pdf` | PDF only (default, preferred for NotebookLM) |
| `--format epub` | EPUB only |
| `--format mobi` | MOBI only |
| `--format azw3` | AZW3 only |
| `--format any` | Accept any available format |
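Batch Download
```bash
python scripts/batch_download.py --book-list books.json --output-dir ~/Books/
```
JSON format:
```json
[
  {"title": "超越百岁", "file_url": "<file host URL>", "password": "<password>"}
]
```
Features: resume via _progress.json, skipping of files that already exist, and rate limiting.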
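Troubleshooting
| Problem | Solution |
|---|---|
| IP blocking | Use browser tool, not web_fetch |
| Link 404 | Link expired, re-search |
| API non-200 | Re-navigate and re-decrypt |
| Download is HTML | URL expired, fresh API call needed |
| ZIP filenames garbled | Use Python cp437→gbk, not unzip |
| Timeout on large files | Increase --max-time to 1200 |
| Anna's Archive blocked | Try different mirror, use anna_iso_batch.sh |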