
windows-control

Full Windows desktop control. Mouse, keyboard, screenshots - interact with any Windows application like a human.

Author: admin | Source: ClawHub
Version: V 1.0.0 | Security check: Passed | Downloads: 6,899 | Favorites: 28


# Windows Control Skill

Full desktop automation for Windows. Control the mouse, keyboard, and screen like a human user.

## Quick Start

All scripts are in `skills/windows-control/scripts/`.

### Screenshot

```bash
py screenshot.py > output.b64
```

Returns a base64-encoded PNG of the entire screen.

### Click

```bash
py click.py 500 300          # Left click at (500, 300)
py click.py 500 300 right    # Right click
py click.py 500 300 left 2   # Double click
```

### Type Text

```bash
py type_text.py "Hello World"
```

Types text at the current cursor position (10 ms between keys).

### Press Keys

```bash
py key_press.py "enter"
py key_press.py "ctrl+s"
py key_press.py "alt+tab"
py key_press.py "ctrl+shift+esc"
```

### Move Mouse

```bash
py mouse_move.py 500 300
```

Moves the mouse to the given coordinates (smooth 0.2 s animation).

### Scroll

```bash
py scroll.py up 5      # Scroll up 5 notches
py scroll.py down 10   # Scroll down 10 notches
```

### Window Management (NEW!)

```bash
py focus_window.py "Chrome"        # Bring window to front
py minimize_window.py "Notepad"    # Minimize window
py maximize_window.py "VS Code"    # Maximize window
py close_window.py "Calculator"    # Close window
py get_active_window.py            # Get title of active window
```

### Advanced Actions (NEW!)

```bash
# Click by text (no coordinates needed!)
py click_text.py "Save"                # Click "Save" button anywhere
py click_text.py "Submit" "Chrome"     # Click "Submit" in Chrome only

# Drag and drop
py drag.py 100 100 500 300             # Drag from (100,100) to (500,300)

# Robust automation (wait/find)
py wait_for_text.py "Ready" "App" 30   # Wait up to 30 s for text
py wait_for_window.py "Notepad" 10     # Wait for window to appear
py find_text.py "Login" "Chrome"       # Get coordinates of text
py list_windows.py                     # List all open windows
```

### Read Window Text

```bash
py read_window.py "Notepad"          # Read all text from Notepad
py read_window.py "Visual Studio"    # Read text from VS Code
py read_window.py "Chrome"           # Read text from browser
```

Uses Windows UI Automation to extract the actual text (not OCR).
Much faster and more accurate than screenshots!

### Read UI Elements (NEW!)

```bash
py read_ui_elements.py "Chrome"                  # All interactive elements
py read_ui_elements.py "Chrome" --buttons-only   # Just buttons
py read_ui_elements.py "Chrome" --links-only     # Just links
py read_ui_elements.py "Chrome" --json           # JSON output
```

Returns buttons, links, tabs, checkboxes, and dropdowns, with coordinates for clicking.

### Read Webpage Content (NEW!)

```bash
py read_webpage.py                       # Read active browser
py read_webpage.py "Chrome"              # Target Chrome specifically
py read_webpage.py "Chrome" --buttons    # Include buttons
py read_webpage.py "Chrome" --links      # Include links with coords
py read_webpage.py "Chrome" --full       # All elements (inputs, images)
py read_webpage.py "Chrome" --json       # JSON output
```

Enhanced browser content extraction with headings, text, buttons, and links.

### Handle Dialogs (NEW!)

```bash
# List all open dialogs
py handle_dialog.py list

# Read current dialog content
py handle_dialog.py read
py handle_dialog.py read --json

# Click button in dialog
py handle_dialog.py click "OK"
py handle_dialog.py click "Save"
py handle_dialog.py click "Yes"

# Type into dialog text field
py handle_dialog.py type "myfile.txt"
py handle_dialog.py type "C:\path\to\file" --field 0

# Dismiss dialog (auto-finds OK/Close/Cancel)
py handle_dialog.py dismiss

# Wait for dialog to appear
py handle_dialog.py wait --timeout 10
py handle_dialog.py wait "Save As" --timeout 5
```

Handles Save/Open dialogs, message boxes, alerts, confirmations, etc.

### Click Element by Name (NEW!)

```bash
py click_element.py "Save"                      # Click "Save" anywhere
py click_element.py "OK" --window "Notepad"     # In specific window
py click_element.py "Submit" --type Button      # Only buttons
py click_element.py "File" --type MenuItem      # Menu items
py click_element.py --list                      # List clickable elements
py click_element.py --list --window "Chrome"    # List in specific window
```

Click buttons, links, and menu items by name without needing coordinates.
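The README does not show how `click_element.py` or `handle_dialog.py dismiss` turn a name into a click. The sketch below is one plausible approach, not the skill's code. It assumes the UI Automation layer has already returned each element's name, control type, and screen bounding rectangle (the hypothetical `UIElement` record), matches the requested name case-insensitively (optionally filtered by type, like `--type Button`), and returns the rectangle's center as the point to click. `pick_dismiss_button` assumes the documented "OK/Close/Cancel" list is a priority order.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class UIElement:
    """Hypothetical record for one element reported by UI Automation."""
    name: str
    control_type: str                  # e.g. "Button", "MenuItem", "Hyperlink"
    rect: Tuple[int, int, int, int]    # (left, top, right, bottom) in screen pixels


def center(rect: Tuple[int, int, int, int]) -> Tuple[int, int]:
    """Screen point in the middle of a bounding rectangle."""
    left, top, right, bottom = rect
    return (left + right) // 2, (top + bottom) // 2


def find_element(
    elements: Iterable[UIElement],
    name: str,
    control_type: Optional[str] = None,
) -> Optional[UIElement]:
    """Prefer an exact (case-insensitive) name match, then fall back to a substring match."""
    wanted = name.casefold()
    candidates = [
        e for e in elements
        if control_type is None or e.control_type == control_type
    ]
    for e in candidates:
        if e.name.casefold() == wanted:
            return e
    for e in candidates:
        if wanted in e.name.casefold():
            return e
    return None


def pick_dismiss_button(elements: Iterable[UIElement]) -> Optional[UIElement]:
    """Return the first button named OK, else Close, else Cancel (exact names only)."""
    buttons: List[UIElement] = [e for e in elements if e.control_type == "Button"]
    for label in ("ok", "close", "cancel"):
        for e in buttons:
            if e.name.casefold() == label:
                return e
    return None
```

Exact matches win so that `"Save"` does not hit a `"Save As"` button first; the substring fallback keeps partial names usable. In a real script, the returned point would be handed to a click call such as `pyautogui.click(x, y)`.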
### Read Screen Region (OCR - Optional)

```bash
py read_region.py 100 100 500 300   # Read text from coordinates
```

Note: Requires a Tesseract OCR installation. Use `read_window.py` instead for better results.

## Workflow Pattern

1. **Read window** - Extract text from a specific window (fast, accurate)
2. **Read UI elements** - Get buttons and links with coordinates
3. **Screenshot** (if needed) - See the visual layout
4. **Act** - Click an element by name or by coordinates
5. **Handle dialogs** - Interact with popups and save dialogs
6. **Read window** - Verify the changes

## Screen Coordinates

- Origin (0, 0) is the top-left corner
- Example screen: 2560x1440 (check yours with a screenshot)
- Use coordinates from screenshot analysis

## Examples

### Open Notepad and type

```bash
# Press Windows key
py key_press.py "win"
# Type "notepad"
py type_text.py "notepad"
# Press Enter
py key_press.py "enter"
# Wait for Notepad to open, then type
py wait_for_window.py "Notepad" 10
py type_text.py "Hello from AI!"
# Save
py key_press.py "ctrl+s"
```

### Click in VS Code

```bash
# Read current VS Code content
py read_window.py "Visual Studio Code"
# Click at specific location (e.g., file explorer)
py click.py 50 100
# Type filename
py type_text.py "test.js"
# Press Enter
py key_press.py "enter"
# Verify new file opened
py read_window.py "Visual Studio Code"
```

### Monitor Notepad changes

```bash
# Read current content
py read_window.py "Notepad"
# User types something...
# Read updated content (no screenshot needed!)
py read_window.py "Notepad"
```

## Text Reading Methods

**Method 1: Windows UI Automation (BEST)**

- Use `read_window.py` for any window
- Use `read_ui_elements.py` for buttons/links with coordinates
- Use `read_webpage.py` for browser content with structure
- Gets actual text data (not image-based)

**Method 2: Click by Name (NEW)**

- Use `click_element.py` to click buttons/links by name
- No coordinates needed: finds elements automatically
- Works across all windows or targets a specific window

**Method 3: Dialog Handling (NEW)**

- Use `handle_dialog.py` for popups, save dialogs, and alerts
- Read dialog content, click buttons, type text
- Auto-dismiss with common buttons (OK, Cancel, etc.)

**Method 4: Screenshot + Vision (Fallback)**

- Take a full screenshot
- AI reads the text visually
- Slower, but works for any content

**Method 5: OCR (Optional)**

- Use `read_region.py` with Tesseract
- Requires additional installation
- Good for images/PDFs containing text

## Safety Features

- `pyautogui.FAILSAFE = True` (move the mouse to the top-left corner to abort)
- Small delays between actions
- Smooth mouse movements (not instant jumps)

## Requirements

- Python 3.11+
- pyautogui (installed ✅)
- pillow (installed ✅)

## Tips

- Always take a screenshot first to see the current state
- Coordinates are absolute (not relative to windows)
- Wait briefly after clicks for the UI to update
- Prefer actions that can be undone with `ctrl+z` when possible

---

**Status:** ✅ READY FOR USE (v2.0 - Dialog & UI Elements)

**Created:** 2026-02-01

**Updated:** 2026-02-02
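The Safety Features and Requirements sections suggest the input scripts are thin wrappers around pyautogui. Their source is not shown, so the following is a hypothetical sketch, not the skill's code: it folds `click.py`, `key_press.py`, `type_text.py`, `mouse_move.py`, and `scroll.py` into one dispatcher (the file name `winctl.py` and its subcommands are made up) and uses only documented pyautogui calls (`click`, `hotkey`, `write`, `moveTo`, `scroll`). The argument parsers are plain functions so they can be checked without a desktop; pyautogui is imported only when an action actually runs.

```python
"""Hypothetical single-file stand-in for click.py, key_press.py, type_text.py,
mouse_move.py and scroll.py. Not the skill's actual source."""
import sys
from typing import List, Tuple

VALID_BUTTONS = ("left", "right", "middle")


def parse_click_args(args: List[str]) -> Tuple[int, int, str, int]:
    """['500', '300', 'right', '2'] -> (500, 300, 'right', 2); button and count are optional."""
    if len(args) < 2:
        raise ValueError("usage: click X Y [left|right|middle] [clicks]")
    x, y = int(args[0]), int(args[1])
    button = args[2].lower() if len(args) > 2 else "left"
    if button not in VALID_BUTTONS:
        raise ValueError(f"unknown mouse button: {button!r}")
    clicks = int(args[3]) if len(args) > 3 else 1
    return x, y, button, clicks


def parse_hotkey(combo: str) -> List[str]:
    """'ctrl+shift+esc' -> ['ctrl', 'shift', 'esc'], ready for pyautogui.hotkey(*keys)."""
    keys = [k.strip().lower() for k in combo.split("+")]
    if not all(keys):
        raise ValueError(f"malformed key combo: {combo!r}")
    return keys


def scroll_amount(direction: str, amount: int) -> int:
    """pyautogui.scroll() scrolls up for positive values and down for negative ones."""
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    return amount if direction == "up" else -amount


def main(argv: List[str]) -> None:
    import pyautogui  # imported lazily: it needs a real desktop session

    pyautogui.FAILSAFE = True  # mouse in the top-left corner aborts the next call
    pyautogui.PAUSE = 0.1      # short pause after every pyautogui call

    command, rest = argv[0], argv[1:]
    if command == "click":
        x, y, button, clicks = parse_click_args(rest)
        pyautogui.click(x, y, clicks=clicks, button=button)
    elif command == "key":
        pyautogui.hotkey(*parse_hotkey(rest[0]))
    elif command == "type":
        pyautogui.write(rest[0], interval=0.01)  # 10 ms between keystrokes
    elif command == "move":
        pyautogui.moveTo(int(rest[0]), int(rest[1]), duration=0.2)  # smooth 0.2 s glide
    elif command == "scroll":
        # The distance per unit is platform dependent in pyautogui.
        pyautogui.scroll(scroll_amount(rest[0], int(rest[1])))
    else:
        raise SystemExit(f"unknown command: {command!r}")


if __name__ == "__main__":
    # With no arguments there is nothing to do, and pyautogui is never imported.
    if len(sys.argv) > 1:
        main(sys.argv[1:])
```

Usage would look like `py winctl.py click 500 300 left 2` or `py winctl.py key "ctrl+shift+esc"`. Keeping parsing separate from the pyautogui calls means bad arguments fail fast with a `ValueError` before any input is sent to the desktop.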

Tags: skill, ai

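Install via chat

This skill can be installed via chat on the following platforms:

OpenClaw, WorkBuddy, QClaw, Kimi, Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the windows-control-1776372489 skill

Option 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the windows-control-1776372489 skill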

Install via command line

skillhub install windows-control-1776372489

Download Zip package

⬇ Download windows-control v1.0.0

File size: 24.68 KB | Published: 2026-4-17 14:34

v1.0.0 (latest) 2026-4-17 14:34
**Major update: Adds full desktop automation with robust window, UI, and dialog control.**

- NEW: Control mouse, keyboard, screenshots, and interact with any Windows application via scripts.
- NEW: Comprehensive window management (focus, minimize, maximize, close, get active window).
- NEW: Advanced UI automation: click buttons/links by name, read UI elements, robust dialog handling.
- NEW: Read actual window and browser text using Windows UI Automation (not OCR).
- NEW: Extract and interact with webpage content, including buttons, links, and structure.
- Enhanced automation reliability with wait/find routines and smooth mouse movement.
- Safety features: failsafe mouse-abort, small delays, and user-friendly workflow documentation.
