Web to Markdown

Overview

A general-purpose web scraping tool that supports:

- Converting web content to clean Markdown
- Extracting image URLs from any website
- Batch downloading images from web pages

Suitable for content reading, image collection, and organizing reference material.

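Features

1. Web to Markdown

Converts a web page URL into clean Markdown text, removing ads, navigation bars, and other irrelevant content.

URL Prefix Services:

| Service | Prefix | Notes |
| --- | --- | --- |
| markdown.new | https://markdown.new/ | Preferred, fast |
| defuddle | | |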

Usage:

```bash
curl -s https://markdown.new/https://example.com/article
curl -s https://r.jina.ai/https://example.com/article
```

2. Extract Images from Web Pages

Extracts all image URLs from any web page.

General Extraction:

```bash
# Extract all image URLs
```
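As an illustration of general extraction, here is a minimal Python sketch that pulls image URLs out of fetched page text (for example, Markdown returned by a prefix service). The function name `extract_image_urls`, the regular expression, and the inclusion of `.jpeg` are assumptions made for this example; the bundled script may work differently.

```python
import re
from urllib.parse import urlparse

# Any http(s) URL, stopping at whitespace, quotes, and Markdown/HTML delimiters.
URL_RE = re.compile(r"https?://[^\s\"'<>()\[\]]+")
# Formats named in this document, plus .jpeg (an assumption).
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def extract_image_urls(text):
    """Return image URLs found in text, de-duplicated in first-seen order."""
    urls = []
    for url in URL_RE.findall(text):
        # Check the extension on the path only, so query strings are ignored.
        path = urlparse(url).path.lower()
        if path.endswith(IMAGE_EXTS):
            urls.append(url)
    return list(dict.fromkeys(urls))
```

Checking the extension on the parsed path rather than on the raw string keeps URLs such as `a.png?w=200` from being missed.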

Using the Script:

```bash
python scripts/extract_images.py <url> [--output urls.txt]
```

3. Batch Download Images

Extracts images from a web page and downloads them in batch to local storage.

Using the Script:

```bash
python scripts/download_images.py <url> [--output <dir>] [--limit <n>] [--min-size <size>]
```

Parameters:

- `url`: Web page URL
- `--output`: Output directory (default: ~/.openclaw/images)
- `--limit`: Maximum number of downloads (default: 50)
- `--min-size`: Minimum file size, used to filter out small icons (default: 10KB)
- `--ext`: Only download the specified format (jpg/png/gif/webp)
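The parameters above could be declared with `argparse` roughly as follows. This is a sketch and not the script's actual source: the helper name `build_parser` and the choice to express `--min-size` as an integer number of kilobytes are assumptions.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser mirroring the documented options (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Batch download images from a web page")
    parser.add_argument("url", help="Web page URL")
    parser.add_argument("--output", type=Path,
                        default=Path("~/.openclaw/images").expanduser(),
                        help="Output directory")
    parser.add_argument("--limit", type=int, default=50, help="Maximum number of downloads")
    # Assumption: the size is given in KB as a plain integer.
    parser.add_argument("--min-size", type=int, default=10, help="Minimum file size in KB")
    parser.add_argument("--ext", choices=["jpg", "png", "gif", "webp"], default=None,
                        help="Only download this format")
    return parser


# Parse an explicit argument list instead of sys.argv to show the defaults.
args = build_parser().parse_args(["https://example.com", "--ext", "png", "--limit", "20"])
```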

Examples:

```bash
# Download all large images from a page
python scripts/download_images.py https://example.com/gallery --output ~/Downloads/images

# Download only PNGs, max 20
python scripts/download_images.py https://example.com --ext png --limit 20

# Pinterest (auto-converts to original size); quote the URL so the shell does not treat "?" as a glob
python scripts/download_images.py "https://www.pinterest.com/search/pins/?q=architecture"
```

Workflow

Web Content Scraping

1. Prefer markdown.new/
2. If that fails, fall back to defuddle.md/
3. If that also fails, try r.jina.ai/
4. As a last resort, use the local Scrapling script (scripts/scrape.py)
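The fallback order above can be sketched in Python with only the standard library. The names `fetch_markdown` and `http_get` are hypothetical, and the `https://` scheme on the defuddle prefix is an assumption; the source lists it only as `defuddle.md/`. Returning `None` signals that the caller should run the local script.

```python
import urllib.request

# Prefix services in order of preference; the defuddle scheme is assumed.
PREFIXES = ["https://markdown.new/", "https://defuddle.md/", "https://r.jina.ai/"]


def http_get(url, timeout=20):
    """Fetch a URL and return the body as text."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_markdown(target, fetch=http_get):
    """Try each prefix service in turn; return None if all of them fail."""
    for prefix in PREFIXES:
        try:
            text = fetch(prefix + target)
        except OSError:
            # URLError, HTTPError, and socket timeouts are all OSError subclasses.
            continue
        if text.strip():
            return text
    return None
```

Passing the fetch function as a parameter keeps the fallback logic testable without network access.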

Image Extraction & Download

1. Use r.jina.ai to fetch the page content
2. Extract all image URLs with a regular expression
3. Filter out small images (icons, emojis, etc.)
4. Name the files automatically, then download and save them
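Steps 3 and 4 might look like the following sketch. The naming scheme (index, original stem, short hash) is only one plausible reading of automatic naming, and both function names are hypothetical; the 10 KB threshold comes from the documented `--min-size` default.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse

MIN_BYTES = 10 * 1024  # documented default of 10KB


def is_large_enough(data, min_bytes=MIN_BYTES):
    """Keep only payloads at or above the size threshold."""
    return len(data) >= min_bytes


def file_name_for(url, index):
    """Derive a stable local file name from an image URL (assumed scheme)."""
    path = PurePosixPath(urlparse(url).path)
    stem = path.stem or "image"
    suffix = path.suffix.lower() or ".jpg"  # fallback extension is an assumption
    # A short hash of the full URL avoids collisions between same-named files.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]
    return f"{index:03d}_{stem}_{digest}{suffix}"
```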

Special Website Support

Pinterest

Automatically detects Pinterest URLs and converts thumbnails to original size:

- 236x → originals
- 564x → originals
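A minimal sketch of the thumbnail rewrite, assuming Pinterest images are served from `i.pinimg.com` with the size as the first path segment. The host, the pattern, and the function name `to_original` are assumptions for illustration.

```python
import re

# Matches the size segment directly after the (assumed) image host.
PINIMG_SIZE = re.compile(r"(https?://i\.pinimg\.com/)(?:236x|564x)/")


def to_original(url):
    """Rewrite a 236x or 564x thumbnail URL to its originals variant."""
    return PINIMG_SIZE.sub(r"\1originals/", url, count=1)
```

URLs that do not match the pattern are returned unchanged, so the function is safe to apply to every extracted URL.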

Other Common Websites

The scripts automatically handle common image URL formats, including:

- CDN links
- URLs with query parameters
- Lazy-loaded images
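When raw HTML is available, lazy-loaded images can be picked up by preferring lazy-loading attributes over `src`. The sketch below uses the standard library's `html.parser`. The attribute names in `LAZY_ATTRS` are common conventions rather than anything confirmed by the scripts, and `collect_images` is a hypothetical helper.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Commonly used lazy-loading attributes (an assumption, not an exhaustive list).
LAZY_ATTRS = ("data-src", "data-original", "data-lazy-src")


class ImgCollector(HTMLParser):
    """Collect absolute image URLs from <img> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        # Prefer a lazy-loading attribute; fall back to the plain src.
        src = next((attr[name] for name in LAZY_ATTRS if attr.get(name)), None) or attr.get("src")
        # Skip inline data: placeholders, which carry no downloadable URL.
        if src and not src.startswith("data:"):
            self.urls.append(urljoin(self.base_url, src))


def collect_images(html, base_url):
    parser = ImgCollector(base_url)
    parser.feed(html)
    return parser.urls
```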

Script Reference

scripts/scrape.py

Local web scraping script, used as a fallback when the online services fail.

```bash
python scripts/scrape.py <url>
```

scripts/extract_images.py

Extracts image URLs from a web page and outputs them as a list.

```bash
python scripts/extract_images.py <url> [--output urls.txt]
```

scripts/download_images.py

Batch downloads images from a web page.

```bash
python scripts/download_images.py <url> [options]
```

Dependencies

extract_images.py and download_images.py use only the Python standard library, so no extra installation is needed.

scrape.py requires scrapling (local scraping fallback):

```bash
pip install scrapling
```

Notes

- Respect the website's robots.txt and terms of use
- Consider the load on the site's server before mass downloading
- Some sites have hotlink protection and may block direct downloads
- Dynamically loaded images may require r.jina.ai
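Two of these points can be handled in code. The sketch below checks a URL against robots.txt rules with `urllib.robotparser`, and builds a request that sends the source page as the `Referer`, which some hotlink checks look for. Both helper names are hypothetical, and sending a `Referer` is not guaranteed to satisfy any particular site.

```python
import urllib.request
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def build_image_request(image_url, page_url):
    """Build a request that names the source page as the Referer."""
    return urllib.request.Request(
        image_url,
        headers={"Referer": page_url, "User-Agent": "Mozilla/5.0"},
    )
```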
