Safari Browser Control
Operate the user's real Safari browser on macOS via AppleScript (osascript) and screencapture. This provides full access to the user's actual browser session — including login state, cookies, and open tabs — without any extensions or additional software.
Unlike Playwright or other headless-browser tooling, nothing needs to be installed: every command below uses tools that ship with macOS.
Prerequisites
Before first use, verify two settings are enabled. Run this check at the start of every session:
CODEBLOCK0
If this fails, instruct the user to enable:
1. System Settings > Privacy & Security > Automation: grant the terminal app permission to control Safari
2. Safari > Settings > Advanced: enable "Show features for web developers", then Develop menu > Allow JavaScript from Apple Events
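When the check fails, the error text usually hints at which of the two settings is missing. The following is a minimal sketch, not part of the original skill: `diagnose_safari_error` is a hypothetical helper, and the matched substrings (error `-1743` for a denied Automation permission, and a message mentioning "Allow JavaScript from Apple Events") are assumptions that may differ between macOS and Safari versions.

```bash
# Hypothetical helper: map osascript error text to the setting that likely needs enabling.
# The matched substrings are assumptions based on commonly seen messages.
diagnose_safari_error() {
    case "$1" in
        *"-1743"*|*"Not authorized to send Apple events"*)
            echo "automation" ;;   # System Settings > Privacy & Security > Automation
        *"Allow JavaScript from Apple Events"*)
            echo "javascript" ;;   # Safari Develop menu option
        *)
            echo "unknown" ;;
    esac
}

# Usage (macOS only), calling the helper only when the check itself fails:
#   out=$(osascript -e 'tell application "Safari" to get name of front window' 2>&1) \
#       || diagnose_safari_error "$out"
```

The helper only classifies text, so it can be tried out without touching Safari.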
Core Capabilities
1. List All Open Tabs
CODEBLOCK1
2. Read Page Content
Read the full text content of the current tab:
CODEBLOCK2
Read structured content (title, URL, meta description, headings):
CODEBLOCK3
Read a simplified DOM (similar to Chrome ACP's browser_read):
CODEBLOCK4
3. Execute JavaScript
Run arbitrary JavaScript in the page context and get the return value:
CODEBLOCK5
For multi-line scripts, use a heredoc:
CODEBLOCK6
4. Screenshot
Two approaches are available. Auto-detect which to use at session start:
CODEBLOCK7
Background Screenshot (requires Screen Recording permission)
If the user has granted Screen Recording permission to the terminal app, use screencapture -l to capture Safari without activating it:
CODEBLOCK8
To enable this, instruct the user: System Settings > Privacy & Security > Screen Recording — grant permission to the terminal app (Terminal / iTerm / Warp).
Foreground Screenshot (no extra permissions needed)
If Screen Recording is not granted, fall back to region-based capture. This briefly activates Safari (~0.5s), then switches back:
CODEBLOCK9
After capturing with either method, read the screenshot to see what's on screen:
CODEBLOCK10
5. Navigate
Open a URL in the current tab:
CODEBLOCK11
Open a URL in a new tab:
CODEBLOCK12
Open a URL in a new window:
CODEBLOCK13
6. Click Elements
Click using JavaScript (preferred — works with SPAs and reactive frameworks):
CODEBLOCK14
Important: Use dispatchEvent(new MouseEvent(..., {bubbles: true})) instead of .click() for React/Vue/Angular compatibility. Native .click() may bypass synthetic event handlers.
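Embedding a selector in JavaScript that is itself embedded in an AppleScript string needs two rounds of escaping. The sketch below shows one way to build such a command; `applescript_escape` and `build_click_applescript` are hypothetical helper names, and the sketch assumes the selector contains no newlines.

```bash
# Escape backslashes and double quotes so the text can sit inside a
# double-quoted literal (the rules are the same for JS and AppleScript here).
applescript_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print an AppleScript one-liner that dispatches a bubbling click on the
# first element matching the CSS selector given as $1.
build_click_applescript() {
    local selector_js js
    selector_js=$(applescript_escape "$1")   # first pass: JS string literal
    js="(function(){var el=document.querySelector(\"$selector_js\");if(el===null)return \"not found\";el.dispatchEvent(new MouseEvent(\"click\",{bubbles:true,cancelable:true,view:window}));return \"clicked\";})()"
    # second pass: the whole JS expression becomes an AppleScript string literal
    printf 'tell application "Safari" to do JavaScript "%s" in current tab of front window' \
        "$(applescript_escape "$js")"
}

# Usage (macOS only):
#   osascript -e "$(build_click_applescript 'button[type="submit"]')"
```

Returning `"not found"` or `"clicked"` gives the caller something to check before taking a screenshot.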
7. Type and Fill Forms
Set input values via JavaScript:
CODEBLOCK15
Important: For React-controlled inputs, use the native setter + dispatchEvent pattern shown above. Directly setting .value will not trigger React's state update.
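As a hedged illustration of the native setter plus dispatchEvent pattern, the sketch below generates the JavaScript for one `<input>` element. `js_string` and `build_fill_js` are hypothetical helper names; the sketch assumes the target is an `<input>` (a `<textarea>` would need `HTMLTextAreaElement.prototype`) and that the value contains no newlines.

```bash
# Render $1 as a double-quoted JavaScript string literal.
js_string() {
    printf '"%s"' "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
}

# Print JS that sets the value of the <input> matching selector $1 to $2
# through the native value setter, then fires bubbling input/change events.
build_fill_js() {
    local sel val
    sel=$(js_string "$1")
    val=$(js_string "$2")
    cat <<EOF
(function () {
  var el = document.querySelector($sel);
  if (el === null) return "not found";
  var setter = Object.getOwnPropertyDescriptor(HTMLInputElement.prototype, "value").set;
  setter.call(el, $val);
  el.dispatchEvent(new Event("input", { bubbles: true }));
  el.dispatchEvent(new Event("change", { bubbles: true }));
  return "filled";
})()
EOF
}

# Usage (macOS only). Passing the JS as an argument to "on run argv"
# avoids a second round of AppleScript escaping:
#   osascript -e 'on run argv' \
#             -e 'tell application "Safari" to do JavaScript (item 1 of argv) in current tab of front window' \
#             -e 'end run' "$(build_fill_js '#email' 'user@example.com')"
```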
Type via System Events (simulates real keyboard — useful when JS injection is blocked):
CODEBLOCK16
Press special keys:
CODEBLOCK17
8. Scroll
CODEBLOCK18
9. Switch Tabs
CODEBLOCK19
10. Wait for Page Load
CODEBLOCK20
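One common way to wait is to poll `document.readyState` until it reports `complete`. The sketch below assumes that approach; `wait_for_ready` and `safari_ready_state` are hypothetical names, and the one-second polling interval is an arbitrary choice. Pages that load content after the `load` event (many SPAs) may need an extra element check.

```bash
# Run the probe command ($2...) once per second until it prints "complete"
# or $1 seconds have elapsed. Returns 0 on success, 1 on timeout.
wait_for_ready() {
    local timeout="${1:-10}"
    shift
    local waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if [ "$("$@" 2>/dev/null)" = "complete" ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Probe for the current Safari tab (macOS only).
safari_ready_state() {
    osascript -e 'tell application "Safari" to do JavaScript "document.readyState" in current tab of front window'
}

# Usage (macOS only):
#   wait_for_ready 15 safari_ready_state || echo "page did not finish loading"
```

Keeping the probe separate from the loop makes it possible to swap in a different condition, such as a check that a specific element exists.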
Workflow: Browsing with Screenshot Feedback Loop
For tasks that require visual confirmation, use the screenshot loop:
1. Perform action (navigate, click, scroll, etc.)
2. Wait for page load if needed
3. Take screenshot (background or foreground) → Read the image to see result
4. Decide next action based on what is visible
Operating on Specific Tabs
To operate on a tab other than the current one, use tab N of window M syntax:
CODEBLOCK21
Note: Background screenshots capture the entire Safari window (whichever tab is active). To screenshot a specific tab, first switch to it via AppleScript.
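To find the N and M for a given page, the output of the tab listing can be searched by URL. The sketch below assumes each line of that listing has the form `W<window>T<tab> | <title> | <url>`; `find_tab` is a hypothetical helper, and `list_tabs` in the usage comment stands for whatever command produces the listing.

```bash
# Read tab-listing lines on stdin and print "<window> <tab>" for the first
# tab whose URL contains the substring given as $1.
find_tab() {
    awk -v needle="$1" '
        {
            # Titles may contain " | ", so take the first field as the id
            # and the last field as the URL.
            n = split($0, parts, " [|] ")
            if (n >= 3 && parts[1] ~ /^W[0-9]+T[0-9]+$/ && index(parts[n], needle) > 0) {
                split(substr(parts[1], 2), wt, "T")
                print wt[1], wt[2]
                exit
            }
        }'
}

# Usage (macOS only, bash; list_tabs is a placeholder for the tab-listing command):
#   read -r win tab < <(list_tabs | find_tab "example.com/docs")
#   osascript -e "tell application \"Safari\" to set current tab of window $win to tab $tab of window $win"
```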
Limitations
- macOS only: AppleScript and screencapture are macOS-specific
- Cannot intercept network requests: only page content and JS execution are available
- Cannot access cross-origin iframes: browser security applies
- Private browsing windows: AppleScript cannot control private windows
- System Events keystroke is "blind": it types into whatever is focused, so ensure Safari is frontmost before using it
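Safari Browser Control
Operate the user's real Safari browser on macOS via AppleScript (osascript) and screencapture. This provides full access to the user's actual browser session, including login state, cookies, and open tabs, without any extensions or additional software.
Unlike Playwright or headless browsers, this skill controls your real Safari: same cookies, same logins, same tabs. Zero install, pure macOS native.
Prerequisites
Before first use, verify that two settings are enabled. Run this check at the start of every session: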
```bash
osascript -e 'tell application "Safari" to get name of front window' 2>&1
```
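If this fails, instruct the user to enable:
1. System Settings > Privacy & Security > Automation: grant the terminal app permission to control Safari
2. Safari > Settings > Advanced: enable "Show features for web developers", then Develop menu > Allow JavaScript from Apple Events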
Core Capabilities
1. List All Open Tabs
```bash
osascript -e '
tell application "Safari"
    set output to ""
    repeat with w from 1 to (count of windows)
        repeat with t from 1 to (count of tabs of window w)
            set tabName to name of tab t of window w
            set tabURL to URL of tab t of window w
            -- one line per tab: W<window>T<tab> | <title> | <url>
            set output to output & "W" & w & "T" & t & " | " & tabName & " | " & tabURL & linefeed
        end repeat
    end repeat
    return output
end tell'
```
2. Read Page Content
Read the full text content of the current tab:
```bash
osascript -e '
tell application "Safari"
    do JavaScript "document.body.innerText" in current tab of front window
end tell'
```
Read structured content (title, URL, meta description, headings):
```bash
osascript -e '
tell application "Safari"
    do JavaScript "JSON.stringify({
        title: document.title,
        url: location.href,
        description: document.querySelector(\"meta[name=description]\")?.content || \"\",
        h1: [...document.querySelectorAll(\"h1\")].map(e => e.textContent).join(\" | \"),
        h2: [...document.querySelectorAll(\"h2\")].map(e => e.textContent).join(\" | \")
    })" in current tab of front window
end tell'
```
Read a simplified DOM (similar to Chrome ACP's browser_read):
```bash
osascript -e '
tell application "Safari"
    do JavaScript "
        (function() {
            const walk = (node, depth) => {
                let result = \"\";
                for (const child of node.childNodes) {
                    if (child.nodeType === 3) {
                        const text = child.textContent.trim();
                        if (text) result += text + \"\\n\";
                    } else if (child.nodeType === 1) {
                        const tag = child.tagName.toLowerCase();
                        if ([\"script\",\"style\",\"noscript\",\"svg\"].includes(tag)) continue;
                        const style = getComputedStyle(child);
                        if (style.display === \"none\" || style.visibility === \"hidden\") continue;
                        if ([\"h1\",\"h2\",\"h3\",\"h4\",\"h5\",\"h6\"].includes(tag))
                            result += \"#\".repeat(parseInt(tag[1])) + \" \";
                        if (tag === \"a\") result += \"[\";
                        if (tag === \"img\") result += \"[Image: \" + (child.alt || \"\") + \"]\\n\";
                        else if (tag === \"input\") result += \"[Input \" + child.type + \": \" + (child.value || child.placeholder || \"\") + \"]\\n\";
                        else if (tag === \"button\") result += \"[Button: \" + child.textContent.trim() + \"]\\n\";
                        else result += walk(child, depth + 1);
                        if (tag === \"a\") result += \"](\" + child.href + \")\\n\";
                        if ([\"p\",\"div\",\"li\",\"tr\",\"br\",\"h1\",\"h2\",\"h3\",\"h4\",\"h5\",\"h6\"].includes(tag))
                            result += \"\\n\";
                    }
                }
                return result;
            };
            return walk(document.body, 0).substring(0, 50000);
        })()
    " in current tab of front window
end tell'
```
3. Execute JavaScript
Run arbitrary JavaScript in the page context and get the return value:
```bash
osascript -e '
tell application "Safari"
    do JavaScript "YOUR_JS_CODE_HERE" in current tab of front window
end tell'
```
For multi-line scripts, use a heredoc:
```bash
osascript << 'APPLESCRIPT'
tell application "Safari"
    do JavaScript "
        (function() {
            // multi-line JS goes here
            return result;
        })()
    " in current tab of front window
end tell
APPLESCRIPT
```
4. Screenshot
Two approaches are available. Auto-detect which to use at session start:
```bash
# Test whether Screen Recording permission has been granted (background screenshots available)
/tmp/safari_wid 2>/dev/null && echo "BACKGROUND_SCREENSHOT=true" || echo "BACKGROUND_SCREENSHOT=false"
```
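Background Screenshot (requires Screen Recording permission)
If the user has granted Screen Recording permission to the terminal app, use screencapture -l to capture Safari without activating it: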
```bash
# Compile the helper once per session (if not already compiled)
if [ ! -f /tmp/safari_wid ]; then
    cat > /tmp/safari_wid.swift << 'SWIFT'
import CoreGraphics
import Foundation
let options: CGWindowListOption = [.optionOnScreenOnly, .excludeDesktopElements]
guard let windowList = CGWindowListCopyWindowInfo(options, kCGNullWindowID) as? [[String: Any]] else { exit(1) }
for window in windowList {
    guard let owner = window[kCGWindowOwnerName as String] as? String,
          owner == "Safari",
          let layer = window[kCGWindowLayer as String] as? Int,
          layer == 0,
          let wid = window[kCGWindowNumber as String] as? Int else { continue }
    print(wid)
    exit(0)
}
exit(1)
SWIFT
    swiftc /tmp/safari_wid.swift -o /tmp/safari_wid
fi
# Capture the Safari window in the background (no activation needed)
WID=$(/tmp/safari_wid)
screencapture -l "$WID" -o -x /tmp/safari_screenshot.png
```
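To enable this, instruct the user to open System Settings > Privacy & Security > Screen Recording and grant permission to the terminal app (Terminal / iTerm / Warp).
Foreground Screenshot (no extra permissions needed)
If Screen Recording is not granted, fall back to region-based capture. This briefly activates Safari (about 0.5 s), then switches back: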
```bash
# Remember the current frontmost app
FRONT_APP=$(osascript -e 'tell application "System Events" to get name of first process whose frontmost is true')
# Activate Safari and capture its window region
osascript -e 'tell application "Safari" to activate'
sleep 0.3
BOUNDS=$(osascript -e '
tell application "System Events"
    tell process "Safari"
        -- Safari may expose a thin toolbar as window 1; find the largest window
        set bestW to 0
        set bestBounds to ""
        repeat with i from 1 to (count of windows)
            set {x, y} to position of window i
            set {w, h} to size of window i
            if w * h > bestW then
                set bestW to w * h
                set bestBounds to (x as text) & "," & (y as text) & "," & (w as text) & "," & (h as text)
            end if
        end repeat
        return bestBounds
    end tell
end tell')
screencapture -x -R "$BOUNDS" /tmp/safari_screenshot.png
# Switch back to the previous app
osascript -e "tell application \"$FRONT_APP\" to activate"
```
After capturing with either method, read the screenshot to see what's on screen:
Use the Read tool on /tmp/safari_screenshot.png to view the captured image.
5. Navigate
Open a URL in the current tab:
```bash
osascript -e '
tell application "Safari"
    set URL of current tab of front window to "https://example.com"
end tell'
```
Open a URL in a new tab:
bash