Browser-Based Flight Search

USE BROWSER ONLY WHEN:

- User specifically asks to check Google Flights / Kayak / Skyscanner
- API search returned no results and user wants broader coverage
- Price comparison requested against external sources

Browser Commands (OpenClaw Playwright-on-CDP with ARIA Snapshots)

- browser snapshot: get ARIA tree with [ref=eN] element references
- browser type [ref=eN] "value": type into an input field
- browser click [ref=eN]: click an element
- browser screenshot: capture current page state

Google Flights

1. Navigate to https://www.google.com/travel/flights
2. browser snapshot → ARIA tree
3. Fill origin, destination, dates using [ref] locators
4. Click search, wait 5s for results
5. browser snapshot → extract airlines, prices, durations, stops
6. Enrich with jetlag scores via POST /api/v1/flights/score before presenting

Kayak

1. Navigate to https://www.kayak.com
2. Same snapshot → fill → search → extract pattern

Skyscanner

1. Navigate to https://www.skyscanner.com
2. Same snapshot → fill → search → extract pattern

Always

- Present browser results as "comparison data" and recommend booking through our API
- Random delays (3-8s) between browser actions
- Max 10 browser searches per day per user
- Enrich all results with jetlag scores before presenting to user

Rate Limits

- Max 10 browser flight searches per day per user.
- Back off 24 hours if any site blocks the browser.

Rate Limit Tracking

Track all browser searches in workspace file ~/browser-searches.json:

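{
  "date": "2026-02-22",
  "count": 3,
  "searches": [
    { "site": "google-flights", "query": "JFK-NRT 2026-03-15", "timestamp": "2026-02-22T10:30:00Z" }
  ],
  "blockedUntil": null
}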

Before each browser search:

1. Read ~/browser-searches.json (create if missing)
2. If date differs from today, reset count to 0 and clear searches
3. If blockedUntil is set and in the future, refuse and tell the user the site has blocked the browser
4. If count >= 10, refuse and tell the user the daily browser search limit has been reached
5. After each search, increment count and append to searches
6. If a site blocks the browser, set blockedUntil to 24 hours from now
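
The checks above can be sketched as a few pure functions over the tracking record. This is only an illustration: the function names are hypothetical, "today" is assumed to mean the UTC date, and blockedUntil is assumed to be stored as an ISO 8601 string with an explicit offset (on Python versions before 3.11, datetime.fromisoformat does not accept a trailing "Z").

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional, Tuple

DAILY_LIMIT = 10   # max browser searches per day per user
BLOCK_HOURS = 24   # back-off after a site blocks the browser


def utc_now() -> datetime:
    """Timezone-aware 'now'; all functions below expect aware datetimes."""
    return datetime.now(timezone.utc)


def fresh_state(today: str) -> dict:
    """Empty tracking record for the given ISO date."""
    return {"date": today, "count": 0, "searches": [], "blockedUntil": None}


def load_state(path: Path, now: datetime) -> dict:
    """Read the tracking file, creating it if it is missing."""
    if not path.exists():
        path.write_text(json.dumps(fresh_state(now.date().isoformat()), indent=2))
    return json.loads(path.read_text())


def check_allowed(state: dict, now: datetime) -> Tuple[dict, Optional[str]]:
    """Return (state, refusal); refusal is None when a search may proceed."""
    today = now.date().isoformat()
    if state["date"] != today:
        # New day: reset count and searches, but keep any active block.
        state = {**fresh_state(today), "blockedUntil": state.get("blockedUntil")}
    blocked_until = state.get("blockedUntil")
    if blocked_until and datetime.fromisoformat(blocked_until) > now:
        return state, "blocked by site"
    if state["count"] >= DAILY_LIMIT:
        return state, "daily browser search limit reached"
    return state, None


def record_search(state: dict, site: str, query: str, now: datetime) -> dict:
    """Increment the counter and append the search to the history."""
    entry = {"site": site, "query": query, "timestamp": now.isoformat()}
    return {**state, "count": state["count"] + 1, "searches": state["searches"] + [entry]}


def mark_blocked(state: dict, now: datetime) -> dict:
    """Set blockedUntil to 24 hours from now."""
    return {**state, "blockedUntil": (now + timedelta(hours=BLOCK_HOURS)).isoformat()}
```

A caller would load the state, run check_allowed, perform the search only when the refusal is None, then write the result of record_search (or mark_blocked) back to the file.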

Browser Best Practices

Context Selection

DIRECT (no proxy): Google Flights, Kayak, Booking.com, Google Hotels, Lufthansa

SCRAPLING (stealth service, no proxy needed): Delta, British Airways, SecretFlying, seats.aero, Southwest, Hilton, Hyatt, TripAdvisor, TheFlightDeal, Going, SeatGuru, Google Travel (flights + hotels)

PROXY (residential): United, American Airlines, Air Canada, KLM, TravelPirates

SKIP BROWSER (use API):

- Hotel search → LiteAPI first, browser for enrichment only
- Deal discovery → Aerobase Deals API first, browser for verification only
- Tours/activities → Aerobase Tours API first, browser rarely needed
- Flight pricing → Amadeus/Kiwi API, browser for visual comparison
- Award search → seats.aero API, browser for airline-specific lookups
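
One way to encode the site lists above is a simple lookup table. The sketch below keys on the site names as written here; the function name and the choice of "direct" as the default for unlisted sites are assumptions, not something the source specifies.

```python
# Site names copied from the lists above; matching is case-insensitive.
CONTEXTS = {
    "direct": ["Google Flights", "Kayak", "Booking.com", "Google Hotels", "Lufthansa"],
    "scrapling": [
        "Delta", "British Airways", "SecretFlying", "seats.aero", "Southwest",
        "Hilton", "Hyatt", "TripAdvisor", "TheFlightDeal", "Going", "SeatGuru",
        "Google Travel",
    ],
    "proxy": ["United", "American Airlines", "Air Canada", "KLM", "TravelPirates"],
}

# Flatten to {lowercased site name: context} for quick lookups.
SITE_TO_CONTEXT = {
    site.lower(): context for context, sites in CONTEXTS.items() for site in sites
}


def choose_context(site: str, default: str = "direct") -> str:
    """Return 'direct', 'scrapling' or 'proxy' for a site name.

    The default for unlisted sites is an assumption made for this sketch.
    """
    return SITE_TO_CONTEXT.get(site.strip().lower(), default)
```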

Scrapling Service (Anti-Bot Bypass)

When browser automation is blocked by anti-bot systems (Akamai, Cloudflare, Datadome, etc.),
use the stealth scrapling service configured via SCRAPLING_URL environment variable.
This service bypasses detection WITHOUT needing residential proxies.

Reference: Scrapling Documentation

When to use Scrapling:

- Site shows reCAPTCHA, "Access denied", or challenge page
- Normal browser is blocked or redirected
- Need to extract data from JS-heavy sites

How to invoke:

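Fetch a page (returns JSON with status, title, HTML, challenge detection):

web_fetch {SCRAPLING_URL}/fetch?url=https://www.delta.com&json=1

Run JavaScript on a page:

POST {SCRAPLING_URL}/evaluate
Body: {"url": "https://seats.aero", "script": "document.title"}

Check service health:

web_fetch {SCRAPLING_URL}/health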

Response fields:

- status: HTTP status code (200 = success)
- title: Page title
- challenge: "pass" | "captcha" | "blocked" | "challenge"
- cached: true if served from 5-min cache
- html: Page HTML (truncated to 50KB in JSON mode)
- html_length: Full HTML length

Fallback chain:

1. Try Scrapling service first for listed domains
2. If challenge != "pass": fall back to native browser + residential proxy
3. If proxy also fails: screenshot and tell user
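
The fallback chain can be expressed as a small function that takes the three steps as callables. All three callables (scrapling_fetch, proxy_fetch, take_screenshot) are hypothetical stand-ins for whatever the agent actually uses; only the ordering and the challenge == "pass" check come from the text above.

```python
from typing import Callable, Optional


def fetch_with_fallback(
    url: str,
    scrapling_fetch: Callable[[str], dict],
    proxy_fetch: Callable[[str], Optional[dict]],
    take_screenshot: Callable[[str], str],
) -> dict:
    """Try Scrapling, then browser + residential proxy, then give up with a screenshot."""
    # Step 1: stealth service first (no proxy cost).
    response = scrapling_fetch(url)
    if response.get("challenge") == "pass":
        return {"via": "scrapling", "response": response}

    # Step 2: native browser with a residential proxy; None means it failed too.
    page = proxy_fetch(url)
    if page is not None:
        return {"via": "proxy", "response": page}

    # Step 3: capture evidence so the user can be told what happened.
    return {"via": "failed", "screenshot": take_screenshot(url)}
```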

Important: Scrapling responses are cached for 5 minutes. For time-sensitive
data (live prices, seat maps), append &nocache=1 or wait for cache expiry.
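
As a sketch of how the fetch URL might be assembled, the helper below adds nocache=1 only for time-sensitive requests. The function name is hypothetical, and percent-encoding the target URL is a choice made here for safety; the examples in this document pass the URL unencoded.

```python
from urllib.parse import urlencode


def build_fetch_url(base_url: str, target_url: str, nocache: bool = False) -> str:
    """Build a Scrapling /fetch URL in JSON mode, optionally bypassing the 5-minute cache."""
    params = {"url": target_url, "json": "1"}
    if nocache:
        # Live prices and seat maps should not come from the cache.
        params["nocache"] = "1"
    return f"{base_url.rstrip('/')}/fetch?{urlencode(params)}"
```

In practice base_url would come from the SCRAPLING_URL environment variable, for example os.environ["SCRAPLING_URL"].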

Aggregator Search (Scrapling /search)

Pre-built search + Python-side parsing. Returns structured JSON — no browser
snapshot/type/click needed. Results are parsed server-side via Scrapling's
Adaptor engine (CSS selectors, find_similar for self-healing).

Google Flights:

POST {SCRAPLING_URL}/search
{"site":"google-flights","origin":"LAX","destination":"NRT","departure":"2026-03-15","return":"2026-03-22"}

Returns: {"results": [{"airline": "...", "price": "...", "duration": "...", "stops": "..."}], "count": N}

Kayak:
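
POST {SCRAPLING_URL}/search
{"site":"kayak","origin":"LAX","destination":"NRT","departure":"2026-03-15","return":"2026-03-22"}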

Booking.com hotels:

POST {SCRAPLING_URL}/search
{"site":"booking","destination":"Tokyo","checkin":"2026-03-15","checkout":"2026-03-22","guests":2}

Returns: {"results": [{"name": "...", "price": "...", "rating": "...", "location": "..."}], "count": N}

Deal sites:

POST {SCRAPLING_URL}/search
{"site":"secretflying"}
POST {SCRAPLING_URL}/search
{"site":"theflightdeal"}

Returns: {"results": [{"title": "...", "url": "..."}], "count": N}

Check challenge field — if not "pass", results may be incomplete (consent wall, bot block).
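
A minimal sketch of building a flight /search body and checking the challenge field on the way back. The helper names are hypothetical; the payload keys mirror the Google Flights example above, and the response is assumed to carry results and challenge at the top level.

```python
from typing import List, Tuple


def flight_search_payload(
    site: str, origin: str, destination: str, departure: str, return_date: str
) -> dict:
    """JSON body for POST {SCRAPLING_URL}/search on a flight aggregator."""
    return {
        "site": site,
        "origin": origin,
        "destination": destination,
        "departure": departure,
        "return": return_date,
    }


def split_results(response: dict) -> Tuple[List[dict], bool]:
    """Return (results, complete); complete is False when challenge is not 'pass'."""
    results = response.get("results", [])
    complete = response.get("challenge") == "pass"
    return results, complete
```

When complete is False, the results can still be shown, but they should be flagged as possibly partial.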

Multi-Step Interaction (Scrapling /interact)

For flows needing form fill, click, screenshot (check-in, login, registration):

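POST {SCRAPLING_URL}/interact
{
  "url": "https://www.southwest.com/air/check-in/",
  "steps": [
    {"action": "consent"},
    {"action": "fill", "selector": "#confirmationNumber", "value": "ABC123"},
    {"action": "fill", "selector": "#firstName", "value": "John"},
    {"action": "fill", "selector": "#lastName", "value": "Doe"},
    {"action": "click", "selector": "button#form-mixin--submit-button"},
    {"action": "wait", "ms": 5000},
    {"action": "screenshot"},
    {"action": "extract", "css": "h1::text"}
  ]
}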

Available actions:

- consent: auto-dismiss cookie consent walls
- fill: fill input by CSS selector (instant, like paste)
- type: type with per-key delay (more human-like, use for sensitive fields)
- click: click element by CSS selector
- wait: wait N milliseconds
- INLINECODE36: wait for selector to appear (with timeout)
- screenshot: capture current page (returned as base64 in screenshots array)
- extract: parse page with CSS selector (results in extracted array)
- INLINECODE41: select dropdown option
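
Before posting to /interact it can help to validate the step list locally. The sketch below assumes steps are objects with an action key plus selector, value, ms or css fields, and it covers only consent, fill, click, wait, screenshot and extract; actions whose parameter names are not shown in this document are deliberately left out. The function names are hypothetical.

```python
from typing import Dict, List, Set

# Required keys per action (besides "action" itself); an assumption of this sketch.
REQUIRED_KEYS: Dict[str, Set[str]] = {
    "consent": set(),
    "fill": {"selector", "value"},
    "click": {"selector"},
    "wait": {"ms"},
    "screenshot": set(),
    "extract": {"css"},
}


def validate_steps(steps: List[dict]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the steps look fine."""
    errors = []
    for index, step in enumerate(steps):
        action = step.get("action")
        if action not in REQUIRED_KEYS:
            errors.append(f"step {index}: unknown action {action!r}")
            continue
        missing = REQUIRED_KEYS[action] - set(step)
        if missing:
            errors.append(f"step {index}: {action} is missing {sorted(missing)}")
    return errors


def build_interact_request(url: str, steps: List[dict]) -> dict:
    """Build the JSON body for POST {SCRAPLING_URL}/interact, refusing invalid steps."""
    errors = validate_steps(steps)
    if errors:
        raise ValueError("; ".join(errors))
    return {"url": url, "steps": steps}
```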

Fetch with Screenshot or CSS Extraction

CODEBLOCK9

Search + Book Pattern

1. Fire API search (Kiwi/Duffel) immediately; don't wait for browser
2. Fire Scrapling /search in parallel for comparison data
3. Show API results first (faster, <2s)
4. Merge Scrapling results: "Google Flights also shows..." / "Kayak prices..."
5. For booking: use API (Duffel hold → user confirms → API completes)
6. For airline-direct booking: navigate user to airline site via VNC
7. NEVER automate payment card entry via browser

Booking Flow

- API booking (Duffel/Kiwi): Agent can search, hold, and complete with user approval
- Browser booking: Navigate to site, user completes payment via VNC
- NEVER automate payment card entry via browser (PCI compliance, 3D Secure blocks)
- For held bookings: confirm with user before paying (Duffel supports 24-72h holds)

API-First + Browser-Concurrent Pattern

For any task where we have an API:

1. Fire API request immediately (don't wait for browser)
2. Show API results to user as they arrive
3. Launch browser concurrently if enrichment would help
4. Merge browser findings: "I also found..." / "For comparison..."
5. Highlight discrepancies between API and browser data

This gives the user instant results + richer context seconds later.
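
The pattern can be sketched with asyncio: both searches start at once, API results are shown as soon as they arrive, and browser results are merged afterwards. The callables and the result shape (dicts with airline and price) are assumptions made for this example; matching results by airline alone is a simplification.

```python
import asyncio
from typing import Awaitable, Callable, List


def find_discrepancies(api_results: List[dict], browser_results: List[dict]) -> List[dict]:
    """Pair results by airline and report prices that differ between the two sources."""
    browser_prices = {r["airline"]: r["price"] for r in browser_results}
    diffs = []
    for result in api_results:
        other = browser_prices.get(result["airline"])
        if other is not None and other != result["price"]:
            diffs.append(
                {"airline": result["airline"], "api_price": result["price"], "browser_price": other}
            )
    return diffs


async def search_api_first(
    api_search: Callable[[], Awaitable[List[dict]]],
    browser_search: Callable[[], Awaitable[List[dict]]],
    show: Callable[[str, List[dict]], None],
) -> List[dict]:
    """Start both searches, show API results first, then merge browser findings."""
    api_task = asyncio.create_task(api_search())
    browser_task = asyncio.create_task(browser_search())  # runs concurrently

    api_results = await api_task
    show("api", api_results)  # the user sees these without waiting for the browser

    try:
        browser_results = await browser_task
    except Exception:
        browser_results = []  # browser enrichment is optional; API results still stand
    show("browser", browser_results)

    return find_discrepancies(api_results, browser_results)
```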

Launch Checklist

1. Stealth plugin is auto-loaded — no action needed
2. Choose direct or proxy context based on target domain
3. Set viewport 1440x900, locale en-US, timezone America/New_York
4. Set 30s default timeout for navigation
5. ALWAYS register error handler: page.on('pageerror', ...)

Memory Management (CRITICAL)

- Chrome watchdog kills process at 1800MB RSS
- Max 2 concurrent tabs safely (tested: 3 tabs = 1795MB = danger zone)
- ALWAYS close context after task: await context.close()
- Prefer sequential tabs over concurrent
- If opening multiple tabs: close each before opening next
- Monitor with: process.memoryUsage().rss

Cookie Consent (EU server — Helsinki)

Scrapling service handles consent dismissal automatically via page_action. For native browser, patterns to try in order:

1. button:has-text("Reject all")
2. button:has-text("Decline")
3. button:has-text("Alle ablehnen")
4. button:has-text("I decline")
5. [data-testid="reject-button"]
6. button:has-text("Manage") → then "Reject all" in second dialog

Timeout: 5s for consent dialog, then proceed (some sites don't show it)
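
For the native browser, the ordered patterns could be tried in a polling loop with a 5-second budget. This sketch is written against the Playwright for Python sync API (page.locator, Locator.first, Locator.is_visible, Locator.click, page.wait_for_timeout); the function name and the 250 ms poll interval are assumptions.

```python
import time

# Patterns from the list above, in the order they should be tried.
CONSENT_SELECTORS = [
    'button:has-text("Reject all")',
    'button:has-text("Decline")',
    'button:has-text("Alle ablehnen")',
    'button:has-text("I decline")',
    '[data-testid="reject-button"]',
]
MANAGE_SELECTOR = 'button:has-text("Manage")'


def dismiss_consent(page, timeout_ms: int = 5000, poll_ms: int = 250) -> bool:
    """Click the first visible reject button; return False if none shows up in time."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        for selector in CONSENT_SELECTORS:
            button = page.locator(selector).first
            if button.is_visible():
                button.click()
                return True
        manage = page.locator(MANAGE_SELECTOR).first
        if manage.is_visible():
            # Two-step dialog: open the settings, then reject in the second dialog.
            manage.click()
            page.locator(CONSENT_SELECTORS[0]).first.click(timeout=timeout_ms)
            return True
        if time.monotonic() >= deadline:
            return False  # some sites never show a consent dialog; just proceed
        page.wait_for_timeout(poll_ms)
```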

Bot Detection Response

If you see any of these, you're being blocked:

- reCAPTCHA iframe or badge
- "Please verify you are a human"
- "Access denied" / "403 Forbidden"
- Datadome challenge page
- Blank page with Cloudflare "checking your browser"
- "Pardon our interruption" (Akamai)

Response:

1. If domain is in Scrapling list: try Scrapling service first (no proxy cost)
2. If Scrapling returns challenge != "pass": fall back to native browser + PROXY
3. If on DIRECT: retry with PROXY context
4. If already on PROXY: screenshot and fallback to alternative site
5. Tell user: "I'm seeing a verification on [site]. Let me try [alternative]."
6. NEVER attempt to solve CAPTCHAs
7. Max 2 retries per site per session
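
A rough sketch of the detection and escalation logic follows. The substring markers are a heuristic derived from the block signals listed above (they will not catch a reCAPTCHA that only appears as an iframe without the word in the page text), and the function names and return labels are hypothetical.

```python
from typing import Optional

# Lowercased substrings that suggest a block page; heuristic only.
BLOCK_MARKERS = [
    "recaptcha",
    "please verify you are a human",
    "access denied",
    "403 forbidden",
    "datadome",
    "checking your browser",
    "pardon our interruption",
]

MAX_RETRIES = 2  # per site per session


def detect_block(page_text: str) -> Optional[str]:
    """Return the first block marker found in the page text, or None."""
    lowered = page_text.lower()
    for marker in BLOCK_MARKERS:
        if marker in lowered:
            return marker
    return None


def next_action(context: str, retries: int) -> str:
    """Decide what to do after a block, given the current context and retries so far."""
    if retries >= MAX_RETRIES:
        return "give_up"
    if context in ("scrapling", "direct"):
        return "retry_with_proxy"
    # Already on the residential proxy: never try to solve the CAPTCHA.
    return "screenshot_and_try_alternative"
```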

Screenshot Best Practices

- Full page: page.screenshot({ fullPage: true }), use for results
- Viewport only: page.screenshot(), use for errors/blocks
- Element: element.screenshot(), use for specific data extraction
- Always save to /tmp/ with descriptive name
- Offer to show screenshots to user when relevant

Geo-Awareness

Server is in Helsinki, Finland (EU). This means:

- Airline sites redirect to EU versions (/eu/en, .de, etc.)
- Prices show in EUR by default on many sites
- Cookie consent walls appear on almost every site
- Some US-only features/deals may not be accessible
- With US residential proxy: sites see US IP, show USD, US content

Performance Targets

- Page load: <10s acceptable, <5s ideal
- Search results: <15s acceptable
- Check-in form: <10s
- If exceeding 30s: abort, screenshot, try alternative
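
These targets can be turned into a small classifier for timing measurements. The thresholds are the ones listed above; the function name, the kind keys, and the "slow" label for anything between the acceptable limit and the 30s abort point are assumptions of this sketch.

```python
# Acceptable limits in seconds, per the targets above.
ACCEPTABLE_SECONDS = {"page_load": 10, "search_results": 15, "check_in_form": 10}
IDEAL_PAGE_LOAD_SECONDS = 5
ABORT_SECONDS = 30


def classify_timing(kind: str, seconds: float) -> str:
    """Return 'ideal', 'acceptable', 'slow' or 'abort' for a measured duration."""
    if seconds > ABORT_SECONDS:
        return "abort"  # abort, screenshot, try alternative
    if kind == "page_load" and seconds < IDEAL_PAGE_LOAD_SECONDS:
        return "ideal"
    if seconds < ACCEPTABLE_SECONDS[kind]:
        return "acceptable"
    return "slow"
```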

