Control the user's real Safari browser on macOS using AppleScript and screencapture. This skill should be used when the user asks to interact with Safari, browse websites, read web pages, automate browser tasks, take screenshots of web content, or when any task would benefit from seeing or interacting with what's in their browser. Triggers on keywords like "safari", "browser", "web page", "open tab", "screenshot the page", "read this site", "browse", "click on", "fill in the form".
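Operate the user's real Safari browser on macOS through AppleScript (osascript) and screencapture. This gives full access to the user's actual browser session, including login state, cookies, and open tabs, with no extensions or additional software required.
Before first use, confirm that both required settings are enabled. Run this check at the start of every session: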
bash
osascript -e 'tell application "Safari" to get name of front window' 2>&1
If the check fails, instruct the user to enable the required settings before continuing.
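A minimal sketch of how the check above could be wrapped so that a failure produces a clear message. The function name `safari_ready` is an assumption for illustration and is not part of the skill. It only confirms that Safari answers Apple events; it does not determine which setting is missing.

```shell
# safari_ready is a hypothetical helper: returns 0 when Safari answers the
# check command, and a non-zero status (with the error on stderr) otherwise.
safari_ready() {
    if out=$(osascript -e 'tell application "Safari" to get name of front window' 2>&1); then
        printf 'Safari reachable, front window: %s\n' "$out"
    else
        printf 'Safari check failed: %s\n' "$out" >&2
        return 1
    fi
}
```

A session could start with `safari_ready || exit 1` so that later commands are not attempted against an unreachable browser.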
List all open tabs in every window:
bash
osascript -e '
tell application "Safari"
    set output to ""
    repeat with w from 1 to (count of windows)
        repeat with t from 1 to (count of tabs of window w)
            set tabName to name of tab t of window w
            set tabURL to URL of tab t of window w
            set output to output & "W" & w & "T" & t & " | " & tabName & " | " & tabURL & linefeed
        end repeat
    end repeat
    return output
end tell'
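The tab listing is plain text, so it can be filtered with standard tools. The sketch below assumes each line has the form `W<window>T<tab> | name | url` with ` | ` as the separator; `find_tab` is a hypothetical helper name, and the sample tab data is made up for illustration.

```shell
# find_tab is a hypothetical helper: reads "W<w>T<t> | name | url" lines on
# stdin and prints "<w> <t>" for each tab whose URL contains the given text.
find_tab() {
    awk -F' [|] ' -v pat="$1" '
        NF >= 3 && index($NF, pat) {
            split(substr($1, 2), idx, "T")   # "W1T2" -> idx[1]=1, idx[2]=2
            print idx[1], idx[2]
        }'
}

# Example with sample listing output (not real tab data):
printf '%s\n' \
    'W1T1 | Example Domain | https://example.com/' \
    'W1T2 | Docs | https://docs.example.org/guide' |
    find_tab 'docs.example.org'
# prints: 1 2
```

The URL is taken from the last field, so a tab title that itself contains ` | ` does not shift the match.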
Read the full text content of the current tab:
bash
osascript -e '
tell application "Safari"
    do JavaScript "document.body.innerText" in current tab of front window
end tell'
Read structured content (title, URL, meta description, headings):
bash
osascript -e '
tell application "Safari"
    do JavaScript "JSON.stringify({
        title: document.title,
        url: location.href,
        description: document.querySelector(\"meta[name=description]\")?.content || \"\",
        h1: [...document.querySelectorAll(\"h1\")].map(e => e.textContent).join(\" | \"),
        h2: [...document.querySelectorAll(\"h2\")].map(e => e.textContent).join(\" | \")
    })" in current tab of front window
end tell'
Read a simplified DOM (similar to Chrome ACP's browser_read):
bash
osascript -e '
tell application "Safari"
    do JavaScript "
        (function() {
            const walk = (node, depth) => {
                let result = \"\";
                for (const child of node.childNodes) {
                    if (child.nodeType === 3) {
                        const text = child.textContent.trim();
                        if (text) result += text + \"\\n\";
                    } else if (child.nodeType === 1) {
                        const tag = child.tagName.toLowerCase();
                        if ([\"script\",\"style\",\"noscript\",\"svg\"].includes(tag)) continue;
                        const style = getComputedStyle(child);
                        if (style.display === \"none\" || style.visibility === \"hidden\") continue;
                        if ([\"h1\",\"h2\",\"h3\",\"h4\",\"h5\",\"h6\"].includes(tag))
                            result += \"#\".repeat(parseInt(tag[1])) + \" \";
                        if (tag === \"a\") result += \"[\";
                        if (tag === \"img\") result += \"[Image: \" + (child.alt || \"\") + \"]\\n\";
                        else if (tag === \"input\") result += \"[Input \" + child.type + \": \" + (child.value || child.placeholder || \"\") + \"]\\n\";
                        else if (tag === \"button\") result += \"[Button: \" + child.textContent.trim() + \"]\\n\";
                        else result += walk(child, depth + 1);
                        if (tag === \"a\") result += \"](\" + child.href + \")\\n\";
                        if ([\"p\",\"div\",\"li\",\"tr\",\"br\",\"h1\",\"h2\",\"h3\",\"h4\",\"h5\",\"h6\"].includes(tag))
                            result += \"\\n\";
                    }
                }
                return result;
            };
            return walk(document.body, 0).substring(0, 50000);
        })()
    " in current tab of front window
end tell'
Run arbitrary JavaScript in the page context and get the return value:
bash
osascript -e '
tell application "Safari"
    do JavaScript "YOUR_JS_CODE_HERE" in current tab of front window
end tell'
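Embedding JavaScript inside an AppleScript string inside a shell string means every double quote has to be escaped. One possible way around that, sketched here under the assumption that the script is passed as a command-line argument and read through an `on run argv` handler, is a small wrapper. The name `safari_js` is hypothetical and not part of the skill.

```shell
# safari_js is a hypothetical helper: runs the JavaScript given as $1 in
# Safari's current tab and prints the value the script evaluates to.
# The JS reaches AppleScript as item 1 of argv, so it needs no extra escaping.
safari_js() {
    osascript \
        -e 'on run argv' \
        -e 'tell application "Safari" to do JavaScript (item 1 of argv) in current tab of front window' \
        -e 'end run' \
        "$1"
}
```

With this in place, a call such as `safari_js 'document.querySelectorAll("a").length'` can use double quotes inside the JavaScript without backslashes.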
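For multi-line scripts, use a heredoc. Quoting the delimiter keeps the shell from expanding `$` and backticks inside the JavaScript:
bash
osascript << 'APPLESCRIPT'
tell application "Safari"
    do JavaScript "
        (function() {
            // multi-line JS goes here
            const result = document.title;
            return result;
        })()
    " in current tab of front window
end tell
APPLESCRIPT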
Two screenshot methods are available. Detect which one to use at the start of the session.
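The detection step could be sketched as a probe capture followed by a fallback. This is an assumption about how detection might work, not the skill's own code: it treats a failed or empty `screencapture -l` result as a sign that Screen Recording permission is missing, and the probe itself may cause macOS to show a permission prompt. `detect_capture_method` is a hypothetical name.

```shell
# detect_capture_method is a hypothetical helper. It prints "window" when a
# probe capture with screencapture -l produces a non-empty file, and "region"
# otherwise. Assumption: a failed or empty probe means permission is missing.
detect_capture_method() {
    probe=$(mktemp "${TMPDIR:-/tmp}/safari_probe.XXXXXX") || return 1
    wid=$(osascript -e 'tell application "Safari" to get id of front window' 2>/dev/null)
    if [ -n "$wid" ] && screencapture -x -l"$wid" "$probe" 2>/dev/null && [ -s "$probe" ]; then
        echo window
    else
        echo region
    fi
    rm -f "$probe"
}
```

The result can be stored once, for example `method=$(detect_capture_method)`, and reused for every later screenshot in the session.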
If the user has granted Screen Recording permission to the terminal app, use screencapture -l to take the screenshot without activating Safari.
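A minimal sketch of the background capture, assuming that the window id Safari reports through AppleScript is the id that `screencapture -l` expects. The helper name `capture_safari_window` is hypothetical; the default output path is the one this document reads the screenshot from.

```shell
# capture_safari_window is a hypothetical helper: captures Safari's front
# window to the given file without bringing Safari to the front.
capture_safari_window() {
    out=${1:-/tmp/safari_screenshot.png}
    # Assumption: AppleScript's window id matches the id screencapture uses.
    wid=$(osascript -e 'tell application "Safari" to get id of front window') || return 1
    # -x: no capture sound, -o: no window shadow, -l: capture this window id
    screencapture -x -o -l"$wid" "$out"
}
```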
To enable this, instruct the user to open System Settings > Privacy & Security > Screen Recording and grant permission to the terminal app (Terminal / iTerm / Warp).
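If Screen Recording permission has not been granted, fall back to a region-based screenshot. This briefly activates Safari (about 0.5 seconds) and then switches back.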
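The fallback could look roughly like the sketch below. It assumes Safari reports window bounds as `x1, y1, x2, y2`, which must be converted to the `x,y,width,height` rectangle that `screencapture -R` takes, and that reopening the previously frontmost app bundle returns focus to it. Both helper names are hypothetical.

```shell
# bounds_to_rect converts AppleScript window bounds "x1, y1, x2, y2"
# into the "x,y,width,height" form that screencapture -R expects.
bounds_to_rect() {
    printf '%s\n' "$1" |
        awk -F', *' '{ printf "%d,%d,%d,%d\n", $1, $2, $3 - $1, $4 - $2 }'
}

# capture_safari_region is a hypothetical helper for the fallback path.
capture_safari_region() {
    out=${1:-/tmp/safari_screenshot.png}
    # Remember the frontmost app so focus can be restored afterwards.
    prev=$(osascript -e 'POSIX path of (path to frontmost application)') || return 1
    bounds=$(osascript -e 'tell application "Safari" to get bounds of front window') || return 1
    osascript -e 'tell application "Safari" to activate'
    sleep 0.5    # give Safari time to come to the front
    screencapture -x -R"$(bounds_to_rect "$bounds")" "$out"
    status=$?
    open "$prev" # assumption: reopening the app bundle refocuses it
    return "$status"
}
```

For example, `bounds_to_rect "0, 25, 1200, 825"` yields `0,25,1200,800`.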
After capturing with either method, read the screenshot to see what is on screen:
Use the read tool on /tmp/safari_screenshot.png to view the captured image.
Open a URL in the current tab:
bash
osascript -e '
tell application "Safari"
    set URL of current tab of front window to "https://example.com"
end tell'
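Setting the URL returns before the page has finished loading, so reading content immediately afterwards may see the old or a partial page. One way to handle this, sketched here as an assumption rather than as part of the skill, is to poll `document.readyState`. The name `wait_for_load` and the 15-second default are hypothetical.

```shell
# wait_for_load is a hypothetical helper: polls document.readyState every
# 0.5 s until it reports "complete" or the timeout (seconds, default 15) ends.
wait_for_load() {
    tries=$(( ${1:-15} * 2 ))
    while [ "$tries" -gt 0 ]; do
        state=$(osascript -e 'tell application "Safari" to do JavaScript "document.readyState" in current tab of front window' 2>/dev/null)
        [ "$state" = "complete" ] && return 0
        sleep 0.5
        tries=$(( tries - 1 ))
    done
    return 1   # page did not finish loading in time
}
```

A navigation step could then be followed by `wait_for_load 10 || echo "page still loading" >&2` before any read command is issued.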
Open a URL in a new tab:
bash
osascript -e '
tell application "Safari"
    tell front window
        set newTab to make new tab with properties {URL:"https://example.com"}
    end tell
end tell'
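This skill can be installed through conversation on the following platforms:
Help me install SkillHub and the claude-for-safari-1776191184 skill
Set SkillHub as my preferred skill installation source, then help me install the claude-for-safari-1776191184 skill
skillhub install claude-for-safari-1776191184
File size: 10.56 KB | Published: 2026-4-15 13:44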