PyAutoGUI Automation Skill
Cross-platform mouse/keyboard automation for Windows, Linux, and macOS.
Features
- Mouse control: move, click, drag, scroll
- Keyboard control: key press, hotkeys, type text
- Screen operations: screenshot, mouse position, screen size
- Image utilities: image metadata (size/format/file size), crop images
- Screen overlay: draw temporary markers to validate coordinates
- Draw on images: draw persistent markers into an image and save
- Image locating: template matching and OCR-based text locating
- Cleanup: remove generated screenshots/marked files to free disk space
Activation
Activate when the user asks to do things like:
- "Click a position on the screen"
- "Move the mouse to (x, y)"
- "Type text / press keys"
- "Take a screenshot"
- "Run repetitive UI automation"
- "Get the current mouse position"
- "Get image size / image info"
- "Crop an image"
- "Draw a marker on the screen"
- "Draw a marker on an image"
- "Locate an element by template"
- "Locate text on the screen (OCR)"
- "Clean up screenshots / temporary files"
Usage
Install dependencies
CODEBLOCK0
Screen info
CODEBLOCK1
Mouse actions
CODEBLOCK2
Keyboard actions
CODEBLOCK3
Screenshot
CODEBLOCK4
Screenshot notes:
- Supported formats: PNG (recommended), JPG, BMP, etc.
- Scope: primary monitor (in multi-monitor setups)
Region Screenshot
CODEBLOCK5
Parameters:
- x1, y1: Top-left corner coordinates
- x2, y2: Bottom-right corner coordinates
- The two corners can be given in either order; the region is calculated automatically
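The corner handling described above can be sketched in Python. The helper names normalize_region and capture_region are hypothetical and are not taken from scripts/keyboard_mouse.py; the sketch only shows one way to turn two corners, given in any order, into the (left, top, width, height) tuple that pyautogui.screenshot accepts through its region argument.

```python
def normalize_region(x1, y1, x2, y2):
    """Return (left, top, width, height) for two opposite corners given in any order."""
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    return left, top, right - left, bottom - top


def capture_region(path, x1, y1, x2, y2):
    """Save a screenshot of the region; needs a display and pyautogui installed."""
    import pyautogui  # imported lazily so normalize_region works without a GUI

    region = normalize_region(x1, y1, x2, y2)
    return pyautogui.screenshot(path, region=region)
```

With this sketch, normalize_region(500, 500, 100, 100) and normalize_region(100, 100, 500, 500) describe the same 400 by 400 pixel region.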
Copy & Paste
CODEBLOCK6
Use cases:
- copy_paste is faster than type_text for long text
- Use copy_paste when you want to skip the typing animation
- Use type_text when you need to simulate realistic typing
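As a rough illustration of the rule of thumb above, the following sketch picks one of the two commands for a given piece of text. The function name choose_input_method and the long_text_threshold value are assumptions made for this example; the skill itself does not define a cut-off.

```python
def choose_input_method(text, realistic=False, long_text_threshold=40):
    """Pick "type_text" or "copy_paste" for the given text.

    long_text_threshold is an assumed cut-off, not a value defined by the skill.
    """
    if realistic:
        return "type_text"   # caller wants visible, keystroke-by-keystroke typing
    if len(text) > long_text_threshold:
        return "copy_paste"  # a single paste is faster than many key events
    return "type_text"
```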
Common key names
- Letters: a b c ...
- Numbers: 0 1 2 ...
- Function keys: f1 f2 ... f12
- Modifiers: ctrl alt shift win
- Others: enter esc tab space backspace delete up down left right
Safety
⚠️ Important:
- Make sure the target window is focused before executing actions
- Be careful with system hotkeys to avoid unintended actions
- Add delays when needed to give yourself time to interrupt
- Moving the mouse to the top-left corner (0, 0) triggers the PyAutoGUI fail-safe, which raises pyautogui.FailSafeException and stops the script
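A minimal sketch of the delay and fail-safe advice above, written directly against PyAutoGUI rather than the bundled scripts. The function names countdown and configure_safety and the 0.2 second pause are assumptions; pyautogui.FAILSAFE and pyautogui.PAUSE are the library's own module-level settings.

```python
import time


def countdown(seconds, tick=1.0, sleep=time.sleep, out=print):
    """Announce the remaining seconds so the user can focus the target window or abort."""
    for remaining in range(seconds, 0, -1):
        out(f"Starting in {remaining}...")
        sleep(tick)


def configure_safety(pause=0.2):
    """Keep the fail-safe on and pause after every PyAutoGUI call; needs pyautogui."""
    import pyautogui  # imported lazily so countdown works without a display

    pyautogui.FAILSAFE = True  # moving the mouse to (0, 0) raises FailSafeException
    pyautogui.PAUSE = pause    # seconds to wait after each PyAutoGUI call
    return pyautogui
```

Calling countdown(3) before the first automated action leaves about three seconds to switch windows or move the mouse to the corner.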
Cross-platform notes
- Windows: Full support; admin permission may be needed in some environments
- Linux: Requires X11; Wayland may not work
- macOS: Grant Accessibility permission to Terminal/Python in System Settings
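The platform notes above could be checked at startup with something like the following sketch. The function name platform_note is hypothetical, and reading XDG_SESSION_TYPE to detect Wayland is an assumption that holds on common Linux desktops but is not guaranteed everywhere.

```python
import os
import platform


def platform_note(system=None, env=None):
    """Return a short warning for the current platform, or None if nothing applies."""
    system = system or platform.system()
    env = os.environ if env is None else env
    if system == "Linux" and env.get("XDG_SESSION_TYPE", "").lower() == "wayland":
        return "Wayland session detected; automation may not work, try an X11 session."
    if system == "Darwin":
        return "Grant Accessibility permission to Terminal/Python in System Settings."
    if system == "Windows":
        return "Some environments may require administrator permission."
    return None
```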
Example scenarios
Open Calculator (Windows)
CODEBLOCK7
Auto-fill a form
CODEBLOCK8
Batch clicking
CODEBLOCK9
Included scripts
- scripts/keyboard_mouse.py - Mouse/keyboard control
- scripts/image_utils.py - Image utilities
- scripts/draw_overlay.py - Screen overlay markers
- scripts/draw_on_image.py - Draw markers on images
- scripts/image_finder.py - Image locating (template + OCR)
- scripts/cleanup.py - Cleanup tool
Image utilities
Image info
CODEBLOCK10
Crop image
CODEBLOCK11
Output example
CODEBLOCK12
Image fields
| Field | Meaning | Example |
|---|---|---|
| width | Image width (px) | 1920, 3840 |
| height | Image height (px) | 1080, 2160 |
| format | Image format | PNG, JPEG, GIF, BMP, WEBP |
| mode | Color mode | RGB, RGBA, L |
| file_size_bytes | File size (bytes) | 2097152 |
| file_size_kb | File size (KB) | 2048.0 |
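The fields in the table can be collected with Pillow and the standard library. This is a sketch under assumptions, not the code of scripts/image_utils.py: the function names and the nested size layout are hypothetical, and 1 KB is taken as 1024 bytes, which matches the 2097152 bytes to 2048.0 KB example in the table.

```python
import os


def bytes_to_kb(size_bytes):
    """Convert a byte count to kilobytes (1 KB = 1024 bytes), rounded to 2 decimals."""
    return round(size_bytes / 1024, 2)


def image_info(path):
    """Collect the fields from the table above; requires Pillow."""
    from PIL import Image  # imported lazily so bytes_to_kb works without Pillow

    with Image.open(path) as img:
        width, height = img.size
        info = {
            "path": path,
            "filename": os.path.basename(path),
            "size": {"width": width, "height": height},
            "format": img.format,  # e.g. "PNG"
            "mode": img.mode,      # e.g. "RGB"
        }
    size_bytes = os.path.getsize(path)
    info["file_size_bytes"] = size_bytes
    info["file_size_kb"] = bytes_to_kb(size_bytes)
    return info
```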
Coordinate system
Screen coordinates:
- Origin (0, 0) is the top-left corner
- X increases to the right
- Y increases downward
Crop coordinates:
- x1, y1: top-left corner of crop
- x2, y2: bottom-right corner of crop
- Cropped size = (x2 - x1) × (y2 - y1)
Example:
CODEBLOCK13
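The crop arithmetic can also be expressed as a short Python sketch. The function names are hypothetical; Image.crop is Pillow's own method and takes a (left, upper, right, lower) box in the same top-left-origin coordinate system described above.

```python
def crop_size(x1, y1, x2, y2):
    """Width and height of the crop box with top-left (x1, y1) and bottom-right (x2, y2)."""
    if x2 <= x1 or y2 <= y1:
        raise ValueError("bottom-right corner must be below and to the right of top-left")
    return x2 - x1, y2 - y1


def crop_image(src, dst, x1, y1, x2, y2):
    """Crop src to the box and save the result as dst; requires Pillow."""
    from PIL import Image  # imported lazily so crop_size works without Pillow

    with Image.open(src) as img:
        img.crop((x1, y1, x2, y2)).save(dst)
    return crop_size(x1, y1, x2, y2)
```

For the box (100, 100) to (500, 500), crop_size returns a 400 by 400 pixel result.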
Typical workflows
Analyze positions in a screenshot
CODEBLOCK14
Batch image sizing
CODEBLOCK15
Capture a region of the screen
python3 scripts/keyboard_mouse.py screenshot full.png
python3 scripts/image_utils.py crop full.png 500 300 1000 800 -o region.png
Screen overlay markers
Draw temporary markers on the screen for coordinate verification. Useful for:
- Calibrating coordinates
- Confirming the real position of a button/element
- Debugging automation scripts
Draw a marker
CODEBLOCK17
Draw a rectangular area
CODEBLOCK18
Marker types
| Type | Description | Use case |
|---|---|---|
| INLINECODE43 | Crosshair | Precise single-point targeting |
| INLINECODE44 | Circle | Mark buttons/circular elements |
| square | Square | Mark rectangular elements |
| arrow | Arrow | Indicate direction / draw attention |
| target | Target | Strongest visual cue (circle + crosshair) |
Colors
INLINECODE48 , green, blue, yellow, cyan, magenta, white, INLINECODE55
Coordinate calibration example
CODEBLOCK19
Draw markers on images
Draw persistent markers into image files. Useful for:
- Annotating recognized positions on a screenshot
- Producing reference images
- Batch marking candidates for comparison
- Keeping calibration records
Draw a marker
CODEBLOCK20
Draw a rectangular area
CODEBLOCK21
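For readers who want to see what drawing a persistent marker involves, here is a minimal Pillow sketch. It is not the implementation of scripts/draw_on_image.py; the function names, the default size, and the use of Pillow's "red" color name are assumptions for illustration.

```python
def crosshair_segments(x, y, size=20):
    """Return the horizontal and vertical line segments of a crosshair centred on (x, y)."""
    half = size // 2
    return [((x - half, y), (x + half, y)), ((x, y - half), (x, y + half))]


def mark_image(src, dst, x, y, color="red", size=20, width=2):
    """Draw a crosshair at (x, y) on a copy of src and save it to dst; requires Pillow."""
    from PIL import Image, ImageDraw  # imported lazily

    with Image.open(src) as img:
        canvas = img.convert("RGB")  # work on a copy so src is left untouched
    draw = ImageDraw.Draw(canvas)
    for start, end in crosshair_segments(x, y, size):
        draw.line([start, end], fill=color, width=width)
    canvas.save(dst)
```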
Batch marking workflow
CODEBLOCK22
Screen overlay vs drawing on image
| Item | Screen overlay (draw_overlay.py) | Draw on image (draw_on_image.py) |
|---|---|---|
| Display | Real-time on screen | Inside the image file |
| Duration | Temporary | Persistent |
| Interaction | Auto-close (timed) | No interaction |
| Best for | Real-time coordinate validation | Generating annotated references |
| Output | Not saved | Saved to file |
Recommended coordinate calibration (cost-saving)
CODEBLOCK23
Image locating
Built on OpenCV template matching and RapidOCR. Supports locating UI elements by image and by text.
Install dependencies
CODEBLOCK24
Note: RapidOCR models are ~15MB and are downloaded automatically on first use.
Template matching (find by image)
CODEBLOCK25
Output example:
CODEBLOCK26
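Template matching with OpenCV can be sketched as below. This is an illustration under assumptions, not the code of scripts/image_finder.py: the function names and the 0.8 threshold are hypothetical, while cv2.matchTemplate, cv2.minMaxLoc, and cv2.TM_CCOEFF_NORMED are OpenCV's own API.

```python
def match_center(top_left, template_size):
    """Centre point of a match, given its top-left corner and the template (width, height)."""
    x, y = top_left
    w, h = template_size
    return x + w // 2, y + h // 2


def find_template(screenshot_path, template_path, threshold=0.8):
    """Return (center_x, center_y, score) for the best match, or None below threshold.

    Requires opencv-python; threshold=0.8 is an assumed default.
    """
    import cv2  # imported lazily so match_center works without OpenCV

    image = cv2.imread(screenshot_path)
    template = cv2.imread(template_path)
    if image is None or template is None:
        raise FileNotFoundError("could not read screenshot or template")
    result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)  # best score and its top-left corner
    if max_val < threshold:
        return None
    h, w = template.shape[:2]
    cx, cy = match_center(max_loc, (w, h))
    return cx, cy, float(max_val)
```

The returned centre is the point one would normally click, since the match location itself is the top-left corner of the template.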
OCR text locating (find by text)
CODEBLOCK27
Output example:
CODEBLOCK28
Recommended automation workflows
Template matching (most accurate):
CODEBLOCK29
OCR text locating (when no template is available):
CODEBLOCK30
Important principle:
- OCR returns accurate screen coordinates; do not modify the returned coordinates
- If there are multiple candidates, mark them on an image to visually choose the correct one
- Once you choose the right candidate, click using the original coordinates
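The principle above, listing every candidate and then clicking one of them with its coordinates untouched, can be sketched as follows. The result format, a list of (box, text) pairs where box is four (x, y) corner points, is an assumption for this example and may differ from what scripts/image_finder.py prints; the function names are hypothetical too.

```python
def box_center(box):
    """Centre of a quadrilateral text box given as four (x, y) corner points."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return round(sum(xs) / len(xs)), round(sum(ys) / len(ys))


def candidates_for(results, query):
    """Filter OCR results whose text contains query, keeping the original coordinates."""
    return [
        {"index": i, "text": text, "center": box_center(box)}
        for i, (box, text) in enumerate(results)
        if query in text
    ]
```

Each candidate keeps its index so it can be marked on an image, inspected, and then clicked at exactly the centre that was reported.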
Template matching vs OCR
| Item | Template matching | OCR text locating |
|---|---|---|
| Accuracy | ⭐⭐⭐⭐⭐ pixel-level | ⭐⭐⭐⭐ depends on font/background |
| Speed | ⭐⭐⭐⭐⭐ milliseconds | ⭐⭐⭐ requires inference |
| Dependencies | OpenCV | RapidOCR |
| Best for | Icons/buttons/fixed UI | Text buttons/labels/inputs |
Why this is better than guessing coordinates
- High precision and repeatability (pixel-level)
- Local compute with no API cost
- Fast response
- Easy to debug via marked outputs
Cleanup
Analyze disk usage
CODEBLOCK31
Clean files
CODEBLOCK32
Auto cleanup
CODEBLOCK33
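An age-based cleanup can be sketched with the standard library alone. This is not the code of scripts/cleanup.py: the function names and the extension filter are assumptions, and the dry-run default is inferred from the --execute flag in the command reference.

```python
import os
import time


def find_stale_files(directory, days, extensions=(".png", ".jpg"), now=None):
    """List matching files in directory that were last modified more than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.lower().endswith(extensions):
                if entry.stat().st_mtime < cutoff:
                    stale.append(entry.path)
    return sorted(stale)


def clean(directory, days, execute=False):
    """Report stale files; delete them only when execute is True (dry run by default)."""
    stale = find_stale_files(directory, days)
    if execute:
        for path in stale:
            os.remove(path)
    return stale
```

Running clean(".", 7) first and reviewing the returned list before calling it again with execute=True avoids deleting screenshots that are still needed.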
End-to-end example
CODEBLOCK34
Command quick reference
Mouse/keyboard (keyboard_mouse.py)
| Command | Description | Example |
|---|---|---|
| screen_size | Get screen size | keyboard_mouse.py screen_size |
| mouse_position | Get mouse position | keyboard_mouse.py mouse_position |
| mouse_move x y | Move mouse | keyboard_mouse.py mouse_move 500 300 |
| mouse_click button | Click mouse | keyboard_mouse.py mouse_click left |
| mouse_click_at x y button | Click at coordinates | keyboard_mouse.py mouse_click_at 500 300 left |
| mouse_double_click x y | Double click | keyboard_mouse.py mouse_double_click 500 300 |
| mouse_drag x1 y1 x2 y2 | Drag | keyboard_mouse.py mouse_drag 500 300 800 600 |
| mouse_scroll amount | Scroll | keyboard_mouse.py mouse_scroll 5 |
| key_press key | Press key | keyboard_mouse.py key_press enter |
| key_hotkey key1 key2 | Hotkey | keyboard_mouse.py key_hotkey ctrl c |
| type_text text | Type text | keyboard_mouse.py type_text "Hello" |
| screenshot path | Screenshot | keyboard_mouse.py screenshot img.png |
Image utilities (image_utils.py)
| Command | Description | Example |
|---|---|---|
| INLINECODE82 | Full image info | INLINECODE83 |
| INLINECODE84 | Image size only | image_utils.py size photo.jpg |
| crop x1 y1 x2 y2 | Crop image | image_utils.py crop img.png 100 100 500 500 |
Screen overlay (draw_overlay.py)
| Command | Description | Example |
|---|---|---|
| INLINECODE89 | Draw marker | INLINECODE90 |
| INLINECODE91 | Draw rectangle | draw_overlay.py area 100 100 500 400 |
Draw on image (draw_on_image.py)
| Command | Description | Example |
|---|---|---|
| INLINECODE94 | Draw marker on image | INLINECODE95 |
| INLINECODE96 | Draw rectangle on image | draw_on_image.py img.png area 100 100 500 400 |
Image finder (image_finder.py)
| Command | Description | Example |
|---|---|---|
| INLINECODE99 | Find by template | INLINECODE100 |
| INLINECODE101 | Find by text (OCR) | image_finder.py text "Send" |
| text-all | Recognize all text | image_finder.py text-all |
Cleanup (cleanup.py)
| Command | Description | Example |
|---|---|---|
| INLINECODE106 | Analyze disk usage | INLINECODE107 |
| INLINECODE108 | Clean files | cleanup.py clean . --days 7 --execute |
| auto dir | Auto cleanup | cleanup.py auto . --max-files 50 |
PyAutoGUI Automation Skill
Cross-platform mouse/keyboard automation for Windows, Linux, and macOS.
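Features
- Mouse control: move, click, drag, scroll
- Keyboard control: key press, hotkeys, type text
- Screen operations: screenshot, mouse position, screen size
- Image utilities: image metadata (size/format/file size), crop images
- Screen overlay: draw temporary markers to validate coordinates
- Draw on images: draw persistent markers into an image and save
- Image locating: template matching and OCR-based text locating
- Cleanup: remove generated screenshots/marked files to free disk space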
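Activation
Activate when the user asks for any of the following:
- Click a position on the screen
- Move the mouse to (x, y)
- Type text / press keys
- Take a screenshot
- Run repetitive UI automation
- Get the current mouse position
- Get image size / image info
- Crop an image
- Draw a marker on the screen
- Draw a marker on an image
- Locate an element by template
- Locate text on the screen (OCR)
- Clean up screenshots / temporary files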
Usage
Install dependencies
bash
# Mouse/keyboard automation
pip3 install pyautogui
# Image utilities
pip3 install Pillow
Screen info
bash
# Screen size
python3 scripts/keyboard_mouse.py screen_size
# Mouse position
python3 scripts/keyboard_mouse.py mouse_position
Mouse actions
bash
# Move the mouse to (x, y)
python3 scripts/keyboard_mouse.py mouse_move 500 300
python3 scripts/keyboard_mouse.py mouse_move 500 300 --duration 1.0
# Mouse click (left/right/middle)
python3 scripts/keyboard_mouse.py mouse_click left
python3 scripts/keyboard_mouse.py mouse_click right
python3 scripts/keyboard_mouse.py mouse_click middle --clicks 2
# Click at a given position
python3 scripts/keyboard_mouse.py mouse_click_at 500 300 left
python3 scripts/keyboard_mouse.py mouse_click_at 500 300 right --clicks 2
# Double click
python3 scripts/keyboard_mouse.py mouse_double_click 500 300
# Drag
python3 scripts/keyboard_mouse.py mouse_drag 500 300 800 600
python3 scripts/keyboard_mouse.py mouse_drag 500 300 800 600 --duration 2.0
# Scroll (positive = up, negative = down)
python3 scripts/keyboard_mouse.py mouse_scroll 5
python3 scripts/keyboard_mouse.py mouse_scroll -3
Keyboard actions
bash
# Single key
python3 scripts/keyboard_mouse.py key_press enter
python3 scripts/keyboard_mouse.py key_press escape
python3 scripts/keyboard_mouse.py key_press tab
python3 scripts/keyboard_mouse.py key_press space
# Hotkeys
python3 scripts/keyboard_mouse.py key_hotkey ctrl c
python3 scripts/keyboard_mouse.py key_hotkey ctrl v
python3 scripts/keyboard_mouse.py key_hotkey win r
python3 scripts/keyboard_mouse.py key_hotkey alt tab
python3 scripts/keyboard_mouse.py key_hotkey ctrl alt t
# Type text
python3 scripts/keyboard_mouse.py type_text "Hello World"
python3 scripts/keyboard_mouse.py type_text "你好世界" --interval 0.05
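To show how keyboard commands like these could map onto PyAutoGUI, here is a small argparse sketch. It is an illustration, not the source of scripts/keyboard_mouse.py: the parser layout and function names are assumptions, while pyautogui.press, pyautogui.hotkey, and pyautogui.write are the library's own functions.

```python
import argparse


def build_parser():
    """Parser for three keyboard subcommands mirroring the commands above."""
    parser = argparse.ArgumentParser(prog="keyboard_sketch")
    sub = parser.add_subparsers(dest="command", required=True)

    press = sub.add_parser("key_press")
    press.add_argument("key")

    hotkey = sub.add_parser("key_hotkey")
    hotkey.add_argument("keys", nargs="+")  # e.g. ctrl alt t

    text = sub.add_parser("type_text")
    text.add_argument("text")
    text.add_argument("--interval", type=float, default=0.0)
    return parser


def run(args):
    """Dispatch parsed arguments to PyAutoGUI; needs a display and pyautogui installed."""
    import pyautogui  # imported lazily so the parser can be used without a GUI

    if args.command == "key_press":
        pyautogui.press(args.key)
    elif args.command == "key_hotkey":
        pyautogui.hotkey(*args.keys)
    elif args.command == "type_text":
        pyautogui.write(args.text, interval=args.interval)
```

A typical entry point would be run(build_parser().parse_args()).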
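Screenshot
bash
# Save a screenshot (primary screen)
python3 scripts/keyboard_mouse.py screenshot /tmp/screenshot.png
# Windows example
python scripts/keyboard_mouse.py screenshot E:\\temp\\screenshot.png
Screenshot notes:
- Supported formats: PNG (recommended), JPG, BMP, etc.
- Scope: primary monitor (in multi-monitor setups)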
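Region Screenshot
bash
# Capture a given region (x1, y1, x2, y2)
python3 scripts/keyboard_mouse.py screenshot_region region.png 100 100 500 500
# Windows example: capture the QQ chat window region
python scripts/keyboard_mouse.py screenshot_region qq_window.png 2800 300 3800 1200
Parameters:
- x1, y1: Top-left corner coordinates
- x2, y2: Bottom-right corner coordinates
- The two corners can be given in either order; the region is calculated automatically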
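Copy & Paste
bash
# Copy text to the clipboard
python3 scripts/keyboard_mouse.py copy "text to copy"
# Paste from the clipboard (Ctrl+V)
python3 scripts/keyboard_mouse.py paste
# Copy and paste in one command (the fastest way to enter text)
python3 scripts/keyboard_mouse.py copy_paste "text to enter directly"
Use cases:
- copy_paste is faster than type_text for long text
- Use copy_paste when you want to skip the typing animation
- Use type_text when you need to simulate realistic typing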
Common key names
- Letters: a b c ...
- Numbers: 0 1 2 ...
- Function keys: f1 f2 ... f12
- Modifiers: ctrl alt shift win
- Others: enter esc tab space backspace delete up down left right
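Safety
⚠️ Important:
- Make sure the target window is focused before executing actions
- Be careful with system hotkeys to avoid unintended actions
- Add delays when needed to give yourself time to interrupt
- Moving the mouse to the top-left corner (0, 0) triggers the PyAutoGUI fail-safe, which raises pyautogui.FailSafeException and stops the script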
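Cross-platform notes
- Windows: Full support; admin permission may be needed in some environments
- Linux: Requires X11; Wayland may not work
- macOS: Grant Accessibility permission to Terminal/Python in System Settings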
Example scenarios
Open Calculator (Windows)
bash
python3 scripts/keyboard_mouse.py key_hotkey win r
python3 scripts/keyboard_mouse.py type_text calc
python3 scripts/keyboard_mouse.py key_press enter
Auto-fill a form
bash
python3 scripts/keyboard_mouse.py mouse_click_at 500 300 left
python3 scripts/keyboard_mouse.py type_text example@email.com
python3 scripts/keyboard_mouse.py key_press tab
python3 scripts/keyboard_mouse.py type_text password123
Batch clicking
bash
python3 scripts/keyboard_mouse.py mouse_click_at 100 100 left
python3 scripts/keyboard_mouse.py mouse_click_at 200 200 left
python3 scripts/keyboard_mouse.py mouse_click_at 300 300 left
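The same batch of clicks can be driven from Python with a pause between steps so the UI has time to react. This is a sketch under assumptions: batch_click, click_with_pyautogui, and the 0.5 second pause are hypothetical names and values, and only pyautogui.click is the library's own call.

```python
import time


def batch_click(points, click, pause=0.5, sleep=time.sleep):
    """Click each (x, y) in order, pausing between clicks so the UI can react.

    `click` is any callable taking (x, y); pause=0.5 is an assumed delay.
    """
    done = []
    for x, y in points:
        click(x, y)
        done.append((x, y))
        sleep(pause)
    return done


def click_with_pyautogui(x, y):
    """Left-click at (x, y); needs a display and pyautogui installed."""
    import pyautogui  # imported lazily so batch_click can be tested without a GUI

    pyautogui.click(x=x, y=y, button="left")
```

Passing the click function as a parameter keeps the loop testable: batch_click([(100, 100), (200, 200), (300, 300)], click_with_pyautogui) performs the real clicks, while a stub can be used for a dry run.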
Included scripts
- scripts/keyboard_mouse.py - Mouse/keyboard control
- scripts/image_utils.py - Image utilities
- scripts/draw_overlay.py - Screen overlay markers
- scripts/draw_on_image.py - Draw markers on images
- scripts/image_finder.py - Image locating (template + OCR)
- scripts/cleanup.py - Cleanup tool
Image utilities
Image info
bash
python3 scripts/image_utils.py info screenshot.png
python3 scripts/image_utils.py size photo.jpg
Crop image
bash
python3 scripts/image_utils.py crop screenshot.png 100 100 500 500
python3 scripts/image_utils.py crop screenshot.png 100 100 500 500 -o output.png
Output example
bash
$ python3 scripts/image_utils.py info screenshot.png
{
  "path": "screenshot.png",
  "filename": "screenshot.png",
  "size": {
    "width": 3840,
    "height": 2160
  },
  "format": "PNG",
  "mode": "RGB",
  "file_size_bytes": 2097152,
  "file_size_kb": 2048.0
}
Image fields
| Field | Meaning | Example |
|---|---|---|
| width | Image width (px) | 1920, 3840 |
| height | Image height (px) | 1080, 2160 |
| format | Image format | PNG, JPEG, GIF, BMP, WEBP |
| mode | Color mode | RGB, RGBA, L |
| file_size_bytes | File size (bytes) | 209715