browser-scraper

Scrape websites using a real Chrome browser with the user's Chrome profile — shares cookies, auth, and fingerprint to bypass bot detection (Cloudflare, Reddit, etc.). Use when scraping sites that block headless browsers or require login, or when asked to "open a browser and scrape", "take a screenshot of a page", "get data from a site that blocks bots", or "scrape with a specific Chrome profile".

Author: admin | Source: ClawHub
Version: 1.0.0 | Security check: passed | Downloads: 73 | Favorites: 0

# Browser Scraper

Scrapes web pages using Playwright with a real Chrome/Chromium binary and an existing user profile. Bypasses bot detection by sharing existing cookies, fingerprint, and session.

## Profiles

The scraper supports multiple Chrome profiles:

- **Default (no `--profile` flag):** Uses the system's default Chrome profile
  - macOS: `~/Library/Application Support/Google/Chrome/Default`
  - Linux: `~/.config/google-chrome/Default`
  - Windows: `%LOCALAPPDATA%\Google\Chrome\User Data\Default`
- **Named profile (`--profile <name>`):** Uses `profiles/<name>/` under the skill directory
  - Create a profile by launching Chrome with `--profile-directory="Profile 1"` or similar, then point the scraper at that folder
  - Useful for: isolating logins, avoiding conflicts with your main Chrome session, scraping without auth

## Script

```bash
# Default profile (system Chrome)
node scripts/scrape.mjs <url> [css_selector]

# Named profile (profiles/<name>/)
node scripts/scrape.mjs <url> [css_selector] --profile <name>

# Headless mode (faster, higher block risk)
node scripts/scrape.mjs <url> --headless --profile <name>

# Keep browser open after scraping (for interactive use)
node scripts/scrape.mjs <url> --profile <name> --keep-open

# Extra wait for lazy-loaded content (default: 3000ms)
node scripts/scrape.mjs <url> --profile <name> --wait 6000
```

Run from the skill directory:

```bash
cd ~/.openclaw-yekeen/workspace/skills/browser-scraper/
node scripts/scrape.mjs https://www.reddit.com/
```

## Output

- JSON to stdout: matched elements or page preview
- Screenshot saved to `/tmp/browser-scraper-last.png`

## Key Design

- `channel: 'chrome'`: launches real Chrome when available, falls back to system Chromium
- `launchPersistentContext` with the profile directory
- `--disable-blink-features=AutomationControlled` + `navigator.webdriver` patch
- `headless: false` by default to avoid SingletonLock conflicts

## Requirements

- [Playwright](https://playwright.dev) installed: `npm install playwright`
- Chrome or Chromium installed on the system
- On macOS/Linux: the `channel: 'chrome'` option requires Chrome (not Chromium) to be installed

## Tips

- Chrome must not already be open with the target profile (SingletonLock error). Close Chrome first, or use a named profile to avoid conflicts.
- If you get a `SingletonLock` error with a named profile, delete the `SingletonLock` file in that profile directory and try again.
- Use `--keep-open` to leave the browser open for interactive use after scraping; press Ctrl+C to close.
- For sites with lazy-loaded content: use the `--wait <ms>` flag or modify the script to increase `waitForTimeout`
- For Reddit: use selector `shreddit-post` and read attributes (`post-title`, `author`, `score`, `permalink`)
- To create a fresh isolated profile: run Chrome from the terminal with `--profile-directory="Profile X"` and log in, then point the scraper at that directory
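
## Sketch: profile resolution and launch

The Profiles and Key Design sections can be illustrated with a short sketch. This is not the contents of `scripts/scrape.mjs`: the function names (`defaultProfileDir`, `resolveProfileDir`, `buildLaunchOptions`, `scrape`), the `LOCALAPPDATA` fallback, and the returned object are assumptions made for illustration. It uses `require` so that it runs as a standalone file, and Playwright is loaded only inside `scrape`. Playwright documents the first argument of `launchPersistentContext` as a user data directory; passing the profile folder itself, as done here, follows the README and may not match what the real script does.

```javascript
// Illustrative sketch only; names and defaults are hypothetical,
// not taken from scripts/scrape.mjs.
const os = require('node:os');
const path = require('node:path');

// Default Chrome profile locations as listed in the Profiles section.
function defaultProfileDir(platform = process.platform, home = os.homedir(), env = process.env) {
  switch (platform) {
    case 'darwin':
      return path.posix.join(home, 'Library/Application Support/Google/Chrome/Default');
    case 'linux':
      return path.posix.join(home, '.config/google-chrome/Default');
    case 'win32': {
      // Assumption: fall back to <home>\AppData\Local if LOCALAPPDATA is unset.
      const localAppData = env.LOCALAPPDATA || path.win32.join(home, 'AppData', 'Local');
      return path.win32.join(localAppData, 'Google', 'Chrome', 'User Data', 'Default');
    }
    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}

// Named profiles live under <skillDir>/profiles/<name>/; otherwise use the default.
function resolveProfileDir(profileName, skillDir = process.cwd()) {
  return profileName
    ? path.join(skillDir, 'profiles', profileName)
    : defaultProfileDir();
}

// Launch options mirroring the Key Design bullets.
function buildLaunchOptions({ headless = false } = {}) {
  return {
    channel: 'chrome',
    headless,
    args: ['--disable-blink-features=AutomationControlled'],
  };
}

async function scrape(url, { profile, headless = false, wait = 3000 } = {}) {
  // Loaded lazily so the helpers above work without Playwright installed.
  const { chromium } = require('playwright');
  const context = await chromium.launchPersistentContext(
    resolveProfileDir(profile),
    buildLaunchOptions({ headless }),
  );
  try {
    // Patch navigator.webdriver before any page script runs.
    await context.addInitScript(() => {
      Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
    });
    const page = context.pages()[0] ?? (await context.newPage());
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    await page.waitForTimeout(wait); // give lazy-loaded content time to appear
    await page.screenshot({ path: '/tmp/browser-scraper-last.png' });
    return { title: await page.title(), url: page.url() };
  } finally {
    await context.close();
  }
}
```

With Playwright installed and Chrome closed, a call such as `scrape('https://www.reddit.com/', { profile: 'work', wait: 6000 }).then(console.log)` would exercise the full path; `work` is a placeholder profile name. Keeping path resolution and option building in separate pure functions makes them easy to check without launching a browser.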

Tags

skill, ai

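Install via chat

This skill can be installed via chat on the following platforms:

OpenClaw, WorkBuddy, QClaw, Kimi, Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the browser-scraper-1775936265 skill

Option 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the browser-scraper-1775936265 skill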

Install via command line

skillhub install browser-scraper-1775936265

Download Zip package

Download browser-scraper v1.0.0

File size: 6.37 KB | Released: 2026-4-12 09:14

v1.0.0 (latest), 2026-4-12 09:14
Initial release of browser-scraper.

- Enables scraping of websites using a real Chrome browser and user Chrome profile to bypass bot detection and access authenticated content.
- Supports both default system Chrome profiles and custom named profiles for isolated sessions.
- Offers optional features: headless mode, adjustable wait times for dynamic content, and interactive mode keeping the browser open.
- Outputs extracted data as JSON and saves page screenshots.
- Requires Playwright and a local Chrome/Chromium installation.
- Includes troubleshooting and usage tips for avoiding profile/lock conflicts and improving scrape results.
