mac-use

# Mac Use

Control any macOS GUI application through a **screenshot → pick element → click → verify** loop.

## Setup

**Platform**: macOS only (requires Apple Vision framework for OCR)

**System binaries**:

- `python3`: install via Homebrew (`brew install python`)
- `screencapture`: built-in macOS utility (pre-installed)

**Python packages**: install from the skill directory:

```bash
pip3 install --break-system-packages -r {baseDir}/requirements.txt
```

## How It Works

The `screenshot` command captures a window, uses **Apple Vision OCR** to detect all text elements, draws numbered annotations on the image, and returns both:

1. **Annotated image** at `/tmp/mac_use.png`: numbered green boxes around each detected text
2. **Element list** in JSON: `[{"num": 1, "text": "Submit", "at": [500, 200]}, {"num": 2, "text": "Cancel", "at": [600, 200]}, ...]` where `at` is the center point `[x, y]` on the 1000x1000 canvas (origin at top-left)

You receive both by calling Bash (gets JSON with element list) and then Read on `/tmp/mac_use.png` (gets the visual). **Always do both** so you can cross-reference the numbers with what you see.

## Quick Reference

```bash
# List all visible windows
python3 {baseDir}/scripts/mac_use.py list

# Screenshot + annotate (returns image + numbered element list)
python3 {baseDir}/scripts/mac_use.py screenshot <app> [--id N]

# Click element by number (primary click method)
python3 {baseDir}/scripts/mac_use.py clicknum <N>

# Click at canvas coordinates (fallback for unlabeled icons)
python3 {baseDir}/scripts/mac_use.py click --app <app> [--id N] <x> <y>

# Scroll inside a window
python3 {baseDir}/scripts/mac_use.py scroll --app <app> [--id N] <direction> <amount>

# Type text (uses clipboard paste; supports all languages)
python3 {baseDir}/scripts/mac_use.py type [--app <app>] "text here"

# Press key or combo
python3 {baseDir}/scripts/mac_use.py key [--app <app>] <combo>
```

## Workflow

1. **Open** the target app with `open -a "App Name"` (optionally with a URL or file path)
2. **Wait** for it to load: `sleep 2`
3. **Screenshot** the app:

   ```bash
   python3 {baseDir}/scripts/mac_use.py screenshot <app> [--id N]
   ```

   This returns JSON with `file` (image path) and `elements` (numbered text list).
4. **Read** the annotated image at `/tmp/mac_use.png` to see the numbered elements visually
5. **Decide** which element to interact with:
   - **Prefer `clicknum N`**: pick the number of a detected text element
   - **Fallback `click --app <app> x y`**: only for unlabeled icons (arrows, close buttons, cart icons) that have no text and therefore no number
6. **Act** using `clicknum`, `type`, `key`, or `scroll`
7. **Screenshot again** to verify the result
8. Repeat from step 3

## Commands

### list

Show all visible app windows.

```bash
python3 {baseDir}/scripts/mac_use.py list
```

Returns JSON array: `[{"app":"Google Chrome","title":"Wikipedia","id":4527,"x":120,"y":80,"w":1200,"h":800}, ...]`

### screenshot

Capture a window, detect text elements via OCR, annotate with numbered markers, and return the element list. The target window is automatically raised to the top before capture, so overlapping windows are handled.

```bash
python3 {baseDir}/scripts/mac_use.py screenshot chrome
python3 {baseDir}/scripts/mac_use.py screenshot chrome --id 4527
```

- `<app>`: fuzzy, case-insensitive match (e.g. "chrome" matches "Google Chrome")
- `--id N`: target a specific window ID (required when multiple windows of the same app exist)
- Returns JSON with:
  - `file`: path to annotated screenshot (`/tmp/mac_use.png`)
  - `id`, `app`, `title`, `scale`: window metadata
  - `elements`: array of `{num, text, at}`, the numbered clickable text elements, where `at` is `[x, y]` center coordinates on the 1000x1000 canvas (origin at top-left)
- If multiple windows match, returns a list of windows instead; pick one and retry with `--id`
- The image is 1000x1000 pixels with green bounding boxes and blue number badges
- Element map is saved to `/tmp/mac_use_elements.json` for `clicknum`

### clicknum

Click on a numbered element from the last screenshot. **This is the primary click method.**

```bash
python3 {baseDir}/scripts/mac_use.py clicknum 5
python3 {baseDir}/scripts/mac_use.py clicknum 12
```

- `N`: the element number from the last `screenshot` output
- Reads the saved element map, activates the window, and clicks at the element's center
- Returns JSON with `clicked_num`, `text`, canvas coords, and absolute screen coords

### click

Click at a position using canvas coordinates. **Fallback only: use for unlabeled icons.**

```bash
python3 {baseDir}/scripts/mac_use.py click --app chrome 500 300
python3 {baseDir}/scripts/mac_use.py click --app chrome --id 4527 500 300
```

- **Coordinates are canvas positions (0-1000)** from the screenshot image
- x=0 is left, x=1000 is right; y=0 is top, y=1000 is bottom
- Use this only when Vision OCR didn't detect the element (icon-only buttons, images, etc.)

### scroll

Scroll inside an app window.

```bash
python3 {baseDir}/scripts/mac_use.py scroll --app chrome down 5
python3 {baseDir}/scripts/mac_use.py scroll --app notes up 10
```

- Directions: `up`, `down`, `left`, `right`
- Amount: number of scroll clicks (3-5 for moderate, 10+ for fast scrolling)
- Mouse is moved to the center of the window before scrolling

### type

Type text into the currently focused input field.

```bash
python3 {baseDir}/scripts/mac_use.py type --app chrome "hello world"
python3 {baseDir}/scripts/mac_use.py type --app chrome "你好世界"
```

- `--app`: activates the app first to ensure keystrokes go to the right window
- Uses clipboard paste (Cmd+V) for reliable Unicode/CJK support
- **Always click on the target input field first** before typing

### key

Press a single key or key combination.

```bash
python3 {baseDir}/scripts/mac_use.py key --app chrome return
python3 {baseDir}/scripts/mac_use.py key --app chrome cmd+a
python3 {baseDir}/scripts/mac_use.py key --app chrome cmd+shift+s
```

- `--app`: activates the app first
- Common keys: `return`, `tab`, `escape`, `space`, `delete`, `backspace`, `up`, `down`, `left`, `right`
- Modifiers: `cmd`, `ctrl`, `alt`/`opt`, `shift`

## Important Rules

- **Always screenshot before your first interaction** with an app
- **Always screenshot after an action** to verify the result
- **Always Read the screenshot image** after running the screenshot command: you need both the element list AND the visual
- **Prefer `clicknum`** over `click`: only use direct coordinates for unlabeled icons
- **Click before typing**: ensure the correct input field has focus first
- **Multiple windows**: if you get a `multiple_windows` error, use `list` to see all windows, then pass `--id`
- **Popup windows** (like WeChat mini-program panels) are separate windows with their own IDs; use `list` to find them and `--id` to target them
- **Wait after opening apps**: sleep for 2 to 3 seconds (e.g. `sleep 3`) after `open -a` before taking a screenshot
- **Activate the app** before screenshot/click: prepend `osascript -e 'tell application "AppName" to activate' && sleep 1` when the target app may be behind other windows
- **Do not type passwords or secrets** via this tool

## Coordinate System (for fallback `click` only)

Screenshots are rendered onto a **1000x1000 canvas**:

- **Origin (0, 0)** is at the **top-left** corner
- **x** increases left to right (0 = left edge, 1000 = right edge)
- **y** increases top to bottom (0 = top edge, 1000 = bottom edge)
- The app window is scaled to fit (aspect ratio preserved), centered, with dark gray padding

## Example: Order food on Meituan in WeChat

```bash
# 1. Open WeChat
open -a "WeChat"
sleep 3

# 2. List windows to find the mini program window ID
python3 {baseDir}/scripts/mac_use.py list
# → find the mini program window ID

# 3. Screenshot the mini program (annotated + element list)
python3 {baseDir}/scripts/mac_use.py screenshot 微信 --id 41266
# → returns: {"file": "/tmp/mac_use.png", "elements": [{"num": 1, "text": "搜索", "at": [500, 200]}, ...]}
# → Read /tmp/mac_use.png to see annotated image

# 4. Click "搜索" (element #1)
python3 {baseDir}/scripts/mac_use.py clicknum 1

# 5. Type search query
python3 {baseDir}/scripts/mac_use.py type --app 微信 "炸鸡"

# 6. Press Enter
python3 {baseDir}/scripts/mac_use.py key --app 微信 return
sleep 2

# 7. Screenshot to see results
python3 {baseDir}/scripts/mac_use.py screenshot 微信 --id 41266
# → Read /tmp/mac_use.png, pick a restaurant by number

# 8. Click on a restaurant (e.g. element #5)
python3 {baseDir}/scripts/mac_use.py clicknum 5
```
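
The letterbox layout described under Coordinate System (window scaled to fit, aspect ratio preserved, centered) implies a simple inverse mapping from canvas coordinates back to the screen. The following Python is a minimal sketch of that arithmetic, not the code in `mac_use.py`: the function name `canvas_to_screen` is hypothetical, the window geometry is assumed to be the `x`, `y`, `w`, `h` values reported by `list`, and any Retina scaling (the `scale` field) is ignored.

```python
from typing import Tuple

CANVAS = 1000  # canvas edge length stated in the documentation


def canvas_to_screen(cx: float, cy: float,
                     win_x: float, win_y: float,
                     win_w: float, win_h: float) -> Tuple[float, float]:
    """Map a canvas point to screen coordinates (assumed letterbox layout)."""
    # Scale factor that fits the longer window side onto the canvas.
    fit = CANVAS / max(win_w, win_h)
    # Padding on each side of the scaled window (zero along the longer axis).
    pad_x = (CANVAS - win_w * fit) / 2
    pad_y = (CANVAS - win_h * fit) / 2
    # Undo the padding and the scaling, then offset by the window origin.
    return win_x + (cx - pad_x) / fit, win_y + (cy - pad_y) / fit


if __name__ == "__main__":
    # Window geometry taken from the `list` example: x=120, y=80, w=1200, h=800.
    print(canvas_to_screen(500, 500, 120, 80, 1200, 800))
```

Under these assumptions the canvas center (500, 500) maps to approximately (720, 480), which is the center of the example window. Canvas points that fall in the gray padding map to positions outside the window, so they are not useful click targets.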
