Browser-Based Flight Search
USE BROWSER ONLY WHEN:
- User specifically asks to check Google Flights / Kayak / Skyscanner
- API search returned no results and user wants broader coverage
- Price comparison requested against external sources
Browser Commands (OpenClaw Playwright-on-CDP with ARIA Snapshots)
- browser snapshot: get ARIA tree with [ref=eN] element references
- browser type [ref=eN] value: type into an input field
- browser click [ref=eN]: click an element
- browser screenshot: capture current page state
Google Flights
1. Navigate to https://www.google.com/travel/flights
2. browser snapshot → ARIA tree
3. Fill origin, destination, and dates using [ref] locators
4. Click search, wait 5s for results
5. browser snapshot → extract airlines, prices, durations, stops
6. Enrich with jetlag scores via POST /api/v1/flights/score before presenting
Kayak
1. Navigate to https://www.kayak.com
2. Same snapshot → fill → search → extract pattern
Skyscanner
1. Navigate to https://www.skyscanner.com
2. Same snapshot → fill → search → extract pattern
Always
- Present browser results as "comparison data" and recommend booking through our API
- Random delays (3-8s) between browser actions
- Max 10 browser searches per day per user
- Enrich all results with jetlag scores before presenting to user
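The random-delay rule can be expressed as a small helper. This is a minimal sketch: `randomDelayMs` and `pause` are hypothetical names, and the injectable `rng` parameter is an assumption added only so the helper can be tested deterministically.

```javascript
// Pick a delay in [minMs, maxMs]; the defaults match the 3-8s rule above.
function randomDelayMs(minMs = 3000, maxMs = 8000, rng = Math.random) {
  return Math.round(minMs + rng() * (maxMs - minMs));
}

// Sleep for a random interval between two browser actions.
function pause(minMs, maxMs) {
  return new Promise((resolve) => setTimeout(resolve, randomDelayMs(minMs, maxMs)));
}
```

Calling `await pause()` between a `browser type` and a `browser click` would space the actions out without a fixed, detectable rhythm.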
Rate Limits
- Max 10 browser flight searches per day per user.
- Back off 24 hours if any site blocks the browser.
Rate Limit Tracking
Track all browser searches in workspace file ~/browser-searches.json:
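{
  "date": "2026-02-22",
  "count": 3,
  "searches": [
    { "site": "google-flights", "query": "JFK-NRT 2026-03-15", "timestamp": "2026-02-22T10:30:00Z" }
  ],
  "blockedUntil": null
}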
Before each browser search:
1. Read ~/browser-searches.json (create if missing)
2. If date differs from today, reset count to 0 and clear searches
3. If blockedUntil is set and in the future, refuse and tell the user the site has blocked the browser
4. If count >= 10, refuse and tell the user the daily browser search limit has been reached
5. After each search, increment count and append to searches
6. If a site blocks the browser, set blockedUntil to 24 hours from now
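The bookkeeping steps above can be sketched as pure functions over the tracking state. This is an illustration under assumptions, not the skill's actual implementation: the function names are hypothetical, dates are compared in UTC, and reading and writing ~/browser-searches.json is left to the caller.

```javascript
// Hypothetical helpers; only the field names (date, count, searches,
// blockedUntil) and the limits (10 per day, 24h block) come from the text.
const DAILY_LIMIT = 10;
const BLOCK_MS = 24 * 60 * 60 * 1000;

// Reset count and searches when the stored date is missing or stale.
// blockedUntil is kept, because a 24h block can span two calendar days.
function normalizeState(state, now) {
  const today = now.toISOString().slice(0, 10); // UTC date, e.g. "2026-02-22"
  if (!state || state.date !== today) {
    return { date: today, count: 0, searches: [], blockedUntil: state ? state.blockedUntil : null };
  }
  return state;
}

// Decide whether a browser search may run right now.
function checkSearchAllowed(state, now) {
  const s = normalizeState(state, now);
  if (s.blockedUntil && new Date(s.blockedUntil) > now) {
    return { allowed: false, reason: "blocked by site", state: s };
  }
  if (s.count >= DAILY_LIMIT) {
    return { allowed: false, reason: "daily browser search limit reached", state: s };
  }
  return { allowed: true, reason: null, state: s };
}

// Return a new state with the search counted and logged.
function recordSearch(state, now, site, query) {
  const s = normalizeState(state, now);
  return {
    ...s,
    count: s.count + 1,
    searches: [...s.searches, { site, query, timestamp: now.toISOString() }],
  };
}

// Return a new state that refuses searches for the next 24 hours.
function recordBlock(state, now) {
  const s = normalizeState(state, now);
  return { ...s, blockedUntil: new Date(now.getTime() + BLOCK_MS).toISOString() };
}
```

Keeping the logic free of file I/O means the same functions work whether the state lives in the workspace file or in memory during a test.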
Browser Best Practices
Context Selection
DIRECT (no proxy): Google Flights, Kayak, Booking.com, Google Hotels, Lufthansa
SCRAPLING (stealth service, no proxy needed): Delta, British Airways, SecretFlying,
seats.aero, Southwest, Hilton, Hyatt, TripAdvisor, TheFlightDeal, Going,
SeatGuru, Google Travel (flights + hotels)
PROXY (residential): United, American Airlines, Air Canada, KLM, TravelPirates
SKIP BROWSER (use API):
- Hotel search → LiteAPI first, browser for enrichment only
- Deal discovery → Aerobase Deals API first, browser for verification only
- Tours/activities → Aerobase Tours API first, browser rarely needed
- Flight pricing → Amadeus/Kiwi API, browser for visual comparison
- Award search → seats.aero API, browser for airline-specific lookups
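The context lists above amount to a lookup from target domain to browsing context. A minimal sketch follows; the hostnames are assumptions (the text names sites, not domains), the table is abbreviated, and `contextFor` with its "direct" fallback is a hypothetical helper.

```javascript
// Abbreviated table; hostnames are assumed from the site names listed above.
const CONTEXT_BY_DOMAIN = {
  "kayak.com": "direct",
  "booking.com": "direct",
  "lufthansa.com": "direct",
  "delta.com": "scrapling",
  "britishairways.com": "scrapling",
  "seats.aero": "scrapling",
  "southwest.com": "scrapling",
  "united.com": "proxy",
  "aa.com": "proxy",
  "aircanada.com": "proxy",
  "klm.com": "proxy",
};

// Resolve a URL to "direct", "scrapling", or "proxy".
function contextFor(url, fallback = "direct") {
  const host = new URL(url).hostname.replace(/^www\./, "");
  // Match the host itself or any parent domain (e.g. "book.klm.com").
  for (const [domain, ctx] of Object.entries(CONTEXT_BY_DOMAIN)) {
    if (host === domain || host.endsWith("." + domain)) return ctx;
  }
  return fallback;
}
```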
Scrapling Service (Anti-Bot Bypass)
When browser automation is blocked by anti-bot systems (Akamai, Cloudflare, Datadome, etc.),
use the stealth scrapling service configured via SCRAPLING_URL environment variable.
This service bypasses detection WITHOUT needing residential proxies.
Reference: Scrapling Documentation
When to use Scrapling:
- Site shows reCAPTCHA, "Access denied", or challenge page
- Normal browser is blocked or redirected
- Need to extract data from JS-heavy sites
How to invoke:
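Fetch a page (returns JSON with status, title, HTML, challenge detection):
web_fetch {SCRAPLING_URL}/fetch?url=https://www.delta.com&json=1
Run JavaScript on a page:
POST {SCRAPLING_URL}/evaluate
Body: {"url": "https://seats.aero", "script": "document.title"}
Check service health:
web_fetch {SCRAPLING_URL}/health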
Response fields:
- status: HTTP status code (200 = success)
- title: Page title
- challenge: "pass" | "captcha" | "blocked" | "challenge"
- cached: true if served from 5-min cache
- html: Page HTML (truncated to 50KB in JSON mode)
- html_length: Full HTML length
Fallback chain:
1. Try Scrapling service first for listed domains
2. If challenge != "pass": fall back to native browser + residential proxy
3. If proxy also fails: screenshot and tell user
Important: Scrapling responses are cached for 5 minutes. For time-sensitive
data (live prices, seat maps), append &nocache=1 or wait for cache expiry.
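The fetch call and the fallback chain can be sketched as two small helpers. This is a hedged illustration: `buildFetchUrl`, `passed`, and `nextAfterFailure` are hypothetical names, and percent-encoding the target URL is an assumption about what the service accepts (the documented example passes it unencoded).

```javascript
// Build a /fetch URL that requests JSON, optionally bypassing the 5-minute cache.
function buildFetchUrl(baseUrl, targetUrl, { nocache = false } = {}) {
  const params = new URLSearchParams({ url: targetUrl, json: "1" });
  if (nocache) params.set("nocache", "1");
  return `${baseUrl.replace(/\/$/, "")}/fetch?${params}`;
}

// A response is usable only when the challenge field reports "pass".
function passed(response) {
  return response.challenge === "pass";
}

// Stages of the fallback chain, in the order given above.
const FALLBACK_CHAIN = ["scrapling", "native-browser-with-proxy", "screenshot-and-tell-user"];

// Return the stage to try after `stage` fails; the last stage is terminal.
function nextAfterFailure(stage) {
  const i = FALLBACK_CHAIN.indexOf(stage);
  return FALLBACK_CHAIN[Math.min(i + 1, FALLBACK_CHAIN.length - 1)];
}
```

For live prices or seat maps, passing `{ nocache: true }` avoids reading a response that is up to five minutes old.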
Aggregator Search (Scrapling /search)
Pre-built search + Python-side parsing. Returns structured JSON — no browser
snapshot/type/click needed. Results are parsed server-side via Scrapling's
Adaptor engine (CSS selectors, find_similar for self-healing).
Google Flights:
POST {SCRAPLING_URL}/search
{"site":"google-flights","origin":"LAX","destination":"NRT","departure":"2026-03-15","return":"2026-03-22"}
Returns: {"results": [{"airline":...,"price":...,"duration":...,"stops":...}], "count": N}
Kayak:
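POST {SCRAPLING_URL}/search
{"site":"kayak","origin":"LAX","destination":"NRT","departure":"2026-03-15","return":"2026-03-22"}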
Booking.com hotels:
POST {SCRAPLING_URL}/search
{"site":"booking","destination":"Tokyo","checkin":"2026-03-15","checkout":"2026-03-22","guests":2}
Returns: {"results": [{"name":...,"price":...,"rating":...,"location":...}], "count": N}
Deal sites:
POST {SCRAPLING_URL}/search
{"site":"secretflying"}
POST {SCRAPLING_URL}/search
{"site":"theflightdeal"}
Returns: {"results": [{"title":...,"url":...}], "count": N}
Check challenge field — if not "pass", results may be incomplete (consent wall, bot block).
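Building the flight-search body and applying the challenge check might look like the sketch below. The helper names and the input validation are assumptions added for illustration; only the payload keys and the challenge rule come from the text, and sending the POST is left to whatever HTTP tool is in use.

```javascript
const IATA_CODE = /^[A-Z]{3}$/;
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Build the JSON body for POST {SCRAPLING_URL}/search (flight sites).
function flightSearchPayload(site, origin, destination, departure, returnDate) {
  if (!IATA_CODE.test(origin) || !IATA_CODE.test(destination)) {
    throw new Error("origin and destination must be 3-letter airport codes");
  }
  if (!ISO_DATE.test(departure) || !ISO_DATE.test(returnDate)) {
    throw new Error("dates must be YYYY-MM-DD");
  }
  // ISO dates sort lexicographically, so string comparison is enough here.
  if (returnDate < departure) throw new Error("return date precedes departure");
  return { site, origin, destination, departure, return: returnDate };
}

// Flag results as possibly incomplete unless the challenge field is "pass".
function readSearchResponse(data) {
  const results = Array.isArray(data.results) ? data.results : [];
  return { results, complete: data.challenge === "pass" };
}
```

When `complete` is false, the results can still be shown, but should be described to the user as partial.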
Multi-Step Interaction (Scrapling /interact)
For flows needing form fill, click, screenshot (check-in, login, registration):
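POST {SCRAPLING_URL}/interact
{
  "url": "https://www.southwest.com/air/check-in/",
  "steps": [
    {"action": "consent"},
    {"action": "fill", "selector": "#confirmationNumber", "value": "ABC123"},
    {"action": "fill", "selector": "#firstName", "value": "John"},
    {"action": "fill", "selector": "#lastName", "value": "Doe"},
    {"action": "click", "selector": "button#form-mixin--submit-button"},
    {"action": "wait", "ms": 5000},
    {"action": "screenshot"},
    {"action": "extract", "css": "h1::text"}
  ]
}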
Available actions:
- consent: auto-dismiss cookie consent walls
- fill: fill input by CSS selector (instant, like paste)
- type: type with per-key delay (more human-like, use for sensitive fields)
- click: click element by CSS selector
- wait: wait N milliseconds
- INLINECODE36: wait for selector to appear (with timeout)
- screenshot: capture current page (returned as base64 in screenshots array)
- extract: parse page with CSS selector (results in extracted array)
- INLINECODE41: select dropdown option
Fetch with Screenshot or CSS Extraction
CODEBLOCK9
Search + Book Pattern
1. Fire API search (Kiwi/Duffel) immediately; don't wait for browser
2. Fire Scrapling /search in parallel for comparison data
3. Show API results first (faster, <2s)
4. Merge Scrapling results: "Google Flights also shows..." / "Kayak prices..."
5. For booking: use API (Duffel hold → user confirms → API completes)
6. For airline-direct booking: navigate user to airline site via VNC
7. NEVER automate payment card entry via browser
Booking Flow
- API booking (Duffel/Kiwi): Agent can search, hold, and complete with user approval
- Browser booking: Navigate to site, user completes payment via VNC
- NEVER automate payment card entry via browser (PCI compliance, 3D Secure blocks)
- For held bookings: confirm with user before paying (Duffel supports 24-72h holds)
API-First + Browser-Concurrent Pattern
For any task where we have an API:
1. Fire API request immediately (don't wait for browser)
2. Show API results to user as they arrive
3. Launch browser concurrently if enrichment would help
4. Merge browser findings: "I also found..." / "For comparison..."
5. Highlight discrepancies between API and browser data
This gives the user instant results + richer context seconds later.
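One way to express this pattern is to start both requests up front and await the API first. The sketch below is illustrative only: `searchApiFirst` and its three callback parameters are hypothetical, and a failed browser search is treated as "no comparison data" rather than as an error.

```javascript
// apiSearch and browserSearch are caller-supplied functions returning results;
// onResults(kind, results) is called as each source becomes available.
async function searchApiFirst(apiSearch, browserSearch, onResults) {
  // Fire both immediately; neither waits for the other.
  const apiPromise = apiSearch();
  const browserPromise = Promise.resolve()
    .then(() => browserSearch())
    .catch(() => null); // enrichment is optional, so swallow browser failures

  // Show API results as soon as they arrive, even if the browser finished first.
  const apiResults = await apiPromise;
  onResults("api", apiResults);

  // Merge browser findings afterwards, if there are any.
  const browserResults = await browserPromise;
  if (browserResults) onResults("comparison", browserResults);
  return { apiResults, browserResults };
}
```

Because the API promise is awaited first, the user always sees API results before comparison data, regardless of which request completes sooner.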
Launch Checklist
1. Stealth plugin is auto-loaded; no action needed
2. Choose direct or proxy context based on target domain
3. Set viewport 1440x900, locale en-US, timezone America/New_York
4. Set 30s default timeout for navigation
5. ALWAYS register error handler: page.on('pageerror', ...)
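In Playwright terms, checklist items 2 to 5 could be applied as in the sketch below. The helper names `contextOptions` and `openPage` and the `proxyServer` parameter are hypothetical; the option values are the ones from the checklist, and `browser` is assumed to be an already launched Playwright browser.

```javascript
// Options for browser.newContext(); values are taken from the checklist.
function contextOptions(proxyServer) {
  const options = {
    viewport: { width: 1440, height: 900 },
    locale: "en-US",
    timezoneId: "America/New_York",
  };
  // Only the PROXY context sets a proxy; DIRECT leaves it out.
  if (proxyServer) options.proxy = { server: proxyServer };
  return options;
}

// Open a page with the 30s navigation timeout and an error handler attached.
async function openPage(browser, proxyServer) {
  const context = await browser.newContext(contextOptions(proxyServer));
  const page = await context.newPage();
  page.setDefaultNavigationTimeout(30000);
  page.on("pageerror", (err) => console.error("pageerror:", err.message));
  return { context, page };
}
```

The caller remains responsible for `await context.close()` once the task is done.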
Memory Management (CRITICAL)
- Chrome watchdog kills process at 1800MB RSS
- Max 2 concurrent tabs safely (tested: 3 tabs = 1795MB = danger zone)
- ALWAYS close context after task: await context.close()
- Prefer sequential tabs over concurrent
- If opening multiple tabs: close each before opening next
- Monitor with: process.memoryUsage().rss
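The "sequential tabs" rule can be enforced with a helper that never holds more than one page open. A minimal sketch, assuming a Playwright `context`; `visitSequentially` and its `handler` callback are hypothetical names.

```javascript
// Visit URLs one at a time, closing each tab before the next one opens.
async function visitSequentially(context, urls, handler) {
  const results = [];
  for (const url of urls) {
    const page = await context.newPage();
    try {
      await page.goto(url);
      results.push(await handler(page));
    } finally {
      await page.close(); // free the tab even if the handler throws
    }
  }
  return results;
}
```

The `finally` block matters here: a handler that throws would otherwise leave a tab open and push the process closer to the watchdog limit.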
Cookie Consent (EU server — Helsinki)
Scrapling service handles consent dismissal automatically via page_action.
For native browser, patterns to try in order:
1. button:has-text("Reject all")
2. button:has-text("Decline")
3. button:has-text("Alle ablehnen")
4. button:has-text("I decline")
5. [data-testid="reject-button"]
6. button:has-text("Manage") → then "Reject all" in second dialog
Timeout: 5s for consent dialog, then proceed (some sites don't show it)
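For the native browser, the ordered patterns and the 5s budget could be combined as sketched below. This assumes a Playwright `page`; `dismissConsent` and the 250 ms polling interval are hypothetical choices, and the two-step "Manage" pattern is left out for brevity.

```javascript
// Single-click reject patterns, in the order listed above.
const REJECT_SELECTORS = [
  'button:has-text("Reject all")',
  'button:has-text("Decline")',
  'button:has-text("Alle ablehnen")',
  'button:has-text("I decline")',
  '[data-testid="reject-button"]',
];

// Poll for a reject button until the budget runs out, then give up quietly.
async function dismissConsent(page, totalTimeoutMs = 5000) {
  const deadline = Date.now() + totalTimeoutMs;
  while (Date.now() < deadline) {
    for (const selector of REJECT_SELECTORS) {
      const button = page.locator(selector).first();
      if (await button.isVisible()) {
        await button.click();
        return selector; // report which pattern matched
      }
    }
    await page.waitForTimeout(250);
  }
  return null; // no dialog appeared; proceed with the task
}
```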
Bot Detection Response
If you see any of these, you're being blocked:
- reCAPTCHA iframe or badge
- "Please verify you are a human"
- "Access denied" / "403 Forbidden"
- Datadome challenge page
- Blank page with Cloudflare "checking your browser"
- "Pardon our interruption" (Akamai)
Response:
1. If domain is in Scrapling list: try Scrapling service first (no proxy cost)
2. If Scrapling returns challenge != "pass": fall back to native browser + PROXY
3. If on DIRECT: retry with PROXY context
4. If already on PROXY: screenshot and fall back to an alternative site
5. Tell user: "I'm seeing a verification on [site]. Let me try [alternative]."
6. NEVER attempt to solve CAPTCHAs
7. Max 2 retries per site per session
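The detection signals and the response ladder can be sketched as a text check plus a decision function. Both are heuristics under assumptions: `looksBlocked` only covers the text-based signals (not the reCAPTCHA iframe or Datadome page), and `nextAction` with its option names and return labels is hypothetical.

```javascript
// Text markers from the list above; matching is case-insensitive.
const BLOCK_PATTERNS = [
  /please verify you are a human/i,
  /access denied/i,
  /403 forbidden/i,
  /pardon our interruption/i, // Akamai
  /checking your browser/i, // Cloudflare
];

function looksBlocked(pageText) {
  return BLOCK_PATTERNS.some((re) => re.test(pageText));
}

const MAX_RETRIES = 2;

// Choose the next step after a block, following the response order above.
function nextAction({ context, retries = 0, inScraplingList = false, scraplingTried = false }) {
  if (retries >= MAX_RETRIES) return "use-alternative-site";
  if (inScraplingList && !scraplingTried) return "try-scrapling";
  if (context === "direct") return "retry-with-proxy";
  return "screenshot-and-use-alternative-site";
}
```

No branch attempts to solve a CAPTCHA; every path ends in a different context or a different site.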
Screenshot Best Practices
- Full page: page.screenshot({ fullPage: true }) (use for results)
- Viewport only: page.screenshot() (use for errors/blocks)
- Element: element.screenshot() (use for specific data extraction)
- Always save to /tmp/ with descriptive name
- Offer to show screenshots to user when relevant
Geo-Awareness
Server is in Helsinki, Finland (EU). This means:
- Airline sites redirect to EU versions (/eu/en, .de, etc.)
- Prices show in EUR by default on many sites
- Cookie consent walls appear on almost every site
- Some US-only features/deals may not be accessible
- With US residential proxy: sites see US IP, show USD, US content
Performance Targets
- Page load: <10s acceptable, <5s ideal
- Search results: <15s acceptable
- Check-in form: <10s
- If exceeding 30s: abort, screenshot, try alternative
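The 30s abort rule and the page-load targets can be sketched with a deadline wrapper and a classifier. Both helper names are hypothetical. Note that `withDeadline` only stops waiting; it does not cancel the underlying navigation, so the caller still has to screenshot and move on.

```javascript
// Reject if `work` does not settle within `ms` (30s is the abort threshold above).
function withDeadline(work, ms = 30000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`exceeded ${ms}ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Classify a measured page load against the targets above.
function rateLoad(ms) {
  if (ms < 5000) return "ideal";
  if (ms < 10000) return "acceptable";
  return "slow";
}
```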