Desktop Control (Linux)
Safe desktop automation for Linux using PyAutoGUI with explicit approvals and environment checks.
Requirements
- Linux with a GUI session (X11 recommended)
- Python packages:
  - pyautogui
  - pillow
  - pygetwindow (window ops; not supported on Linux)
  - pyperclip (clipboard ops)
  - opencv-python (optional, image match)
System packages (common):
- python3-tk, scrot, xclip or xsel
- wmctrl (window list/activate)
- xdotool (active window)
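As a quick preflight, the requirements above can be checked from Python before any automation runs. This is a minimal sketch and not part of the skill: `missing_requirements` is a hypothetical helper, and the lists simply mirror the packages named above (pillow imports as `PIL`).

```python
import importlib.util
import shutil

# Import names for the required Python packages listed above (pillow -> PIL).
PY_MODULES = ["pyautogui", "PIL", "pyperclip"]
# Each entry is a group of alternatives; one tool per group is enough.
SYSTEM_TOOLS = [("scrot",), ("xclip", "xsel"), ("wmctrl",), ("xdotool",)]


def missing_requirements(find_module=importlib.util.find_spec, which=shutil.which):
    """Return readable names for every requirement that was not found."""
    missing = [m for m in PY_MODULES if find_module(m) is None]
    for group in SYSTEM_TOOLS:
        if not any(which(tool) for tool in group):
            missing.append(" or ".join(group))
    return missing
```

Calling `missing_requirements()` with no arguments checks the current machine; an empty list means everything was found. The two parameters exist only so the lookup functions can be swapped out in tests.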
Quick Start
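```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=True)
print(dc.get_screen_size())
PY
```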
Screenshot to file
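```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
print(dc.screenshot_to("/tmp/screen.png"))
PY
```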
Record screen (ffmpeg)
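```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
print(dc.record_screen("/tmp/record.mp4", seconds=30))
PY
```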
Launch Chrome + open URL (default wait 15s; use 15–30s for heavy apps)
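```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
dc.open_chrome("http://localhost:8000", wait_seconds=15)
PY
```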
Preset examples
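```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

def preset_open_site():
    dc.open_chrome("http://localhost:8000", wait_seconds=15)

def preset_login_site():
    dc.open_chrome("http://localhost:8000/login", wait_seconds=15)
    dc.login_form("user@example.com", "password", wait_seconds=10)

dc.register_preset("open-site", preset_open_site)
dc.register_preset("login-site", preset_login_site)

# Run presets
dc.run_preset("open-site")
dc.run_preset("login-site")
PY
```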
Workflow (DSL) example
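```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
steps = [
    {"action": "open_chrome", "url": "http://localhost:8000/login", "wait": 15},
    {"action": "login_form", "email": "user@example.com", "password": "secret", "wait": 10},
    {"action": "open_url", "url": "http://localhost:8000/target", "wait": 15},
    {"action": "screenshot", "path": "/tmp/target.png"},
]
dc.run_steps(steps)
PY
```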
OCR & State Detection example
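```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

# Read text from the screen
text = dc.read_text_on_screen()
print(text)

# Wait for text to appear (requires pytesseract)
if dc.wait_for_text("Success", timeout=30):
    print("Text detected!")
PY
```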
Multi-monitor example
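```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

# Get all monitors
monitors = dc.get_monitors()
print(monitors)  # [{"name": "HDMI-1", "x": 0, "y": 0, "width": 1920, "height": 1080}, ...]

# Click on the second monitor (relative coordinates 0.5, 0.5 = center)
dc.click_monitor(1, 0.5, 0.5)
PY
```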
Multi-browser example
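```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

# Open different browsers
dc.open_firefox("https://google.com", wait_seconds=15)
dc.open_edge("https://github.com", wait_seconds=15)
PY
```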
Window Manager example
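```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

# Resize a window to 800x600
dc.resize_window("Chrome", 800, 600)

# Minimize a window
dc.minimize_window("Telegram")

# Maximize a window
dc.maximize_window("VSCode")
PY
```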
Flow Recorder example
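```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

# Start recording
dc.start_recording()

# Perform some actions (manual for now, or wrap them)
dc.click(x=100, y=200)
dc.type_text("hello")
dc.press("enter")

# Stop recording
actions = dc.stop_recording()
print(f"Recorded {len(actions)} actions")

# Replay later
dc.replay_actions(actions, delay_multiplier=1.0)
PY
```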
AI Vision & Smart Wait example
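```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

# Find an element by color (RGB)
pos = dc.find_element_by_color((255, 0, 0), tolerance=20)  # red
if pos:
    dc.click(x=pos[0], y=pos[1])

# Smart wait: poll until the condition is true
dc.smart_wait(lambda: dc.active_window_contains("Done"), timeout=30)
PY
```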
Drag & Drop example
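```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

# Drag from point A to point B
dc.drag_drop(100, 200, 500, 600)

# Drag a file onto an app
dc.drag_file_to_app("/path/to/file.txt", 400, 300)
PY
```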
Robust retry example
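```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

# Click with automatic retry
dc.robust_click(100, 200)

# Type with automatic retry
dc.robust_type("Hello world")
PY
```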
API
Same interface as DesktopController:
- mouse: `move_mouse`, `click`, `drag`, `scroll`, `get_mouse_position`
- keyboard: `type_text`, `press`, `hotkey`, `wait`, `launch_app`, `open_url`, `open_chrome`, `wait_retry_window`, `wait_retry_new_window`, `smart_retry`
- screen/ui: `click_image`, `click_image_or`, `login_form`
- state: `ensure_window`, `active_window_contains`, `wait_for_text`, `detect_state`
- recovery: `recover_reload`, `recover_back`, `retry_with_recovery`
- workflows: `run_steps`
- presets: `register_preset`, `run_preset`
- ocr: `read_text_on_screen`
- multi-monitor: `get_monitors`, `click_monitor`
- robust: `robust_click`, `robust_type`
- smart-wait: `smart_wait`, `wait_for_window_stable`
- drag-drop: `drag_drop`, `drag_file_to_app`
- window-manager: `resize_window`, `minimize_window`, `maximize_window`
- multi-browser: `open_firefox`, `open_edge`
- keyboard: `detect_keyboard_layout`
- ai-vision: `find_element_by_color`, `find_button_vision`
- recorder: `start_recording`, `record_action`, `stop_recording`, `replay_actions`
`launch_app(app_name, wait_seconds=15, window_title=None, auto_detect_window=True)`
- If `window_title` is provided: waits 15s, retries once, then errors if not found.
- If `auto_detect_window=True`: detects a new window title automatically, waits 15s, retries once.
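The wait-and-retry behaviour described above might look roughly like the following. This is a sketch under assumptions, not the skill's actual code: `wait_for_window`, `list_titles`, `before`, and `sleep` are hypothetical names, and `list_titles` stands in for whatever returns the current window titles.

```python
import time


def wait_for_window(list_titles, before, window_title=None,
                    wait_seconds=15, retries=1, sleep=time.sleep):
    """Wait, look for a window, and retry `retries` more times before failing.

    list_titles: callable returning the current window titles.
    before: titles captured before the app was launched.
    window_title: substring to look for; if None, any new title matches.
    """
    known = set(before)
    for _ in range(retries + 1):
        sleep(wait_seconds)
        for title in list_titles():
            if window_title is not None:
                if window_title.lower() in title.lower():
                    return title
            elif title not in known:
                return title  # auto-detected: a title that was not there before
    raise RuntimeError(f"window not found after {retries + 1} attempts")
```

Comparing the title list before and after launch is one simple way to auto-detect a new window. It would miss an app that reuses an existing title, so treat it as illustrative only.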
INLINECODE64
- Runs action → wait → check → retry (with wait) to avoid rapid loops.
- screen: `screenshot`, `screenshot_to`, `record_screen`, `get_pixel_color`, INLINECODE69
- windows: `get_all_windows`, `activate_window`, `focus_window_or_click`, INLINECODE73
- clipboard: `copy_to_clipboard`, INLINECODE75
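The action → wait → check → retry loop mentioned above can be sketched as follows. The function name `retry_with_check` and its parameters are illustrative assumptions and do not reflect the skill's real signature.

```python
import time


def retry_with_check(action, check, wait_seconds=2.0, max_attempts=3, sleep=time.sleep):
    """Run action, wait, then check; repeat until check passes or attempts run out.

    Waiting after every action keeps the loop from hammering the UI.
    Returns the number of attempts used, or raises TimeoutError.
    """
    for attempt in range(1, max_attempts + 1):
        action()
        sleep(wait_seconds)  # give the UI time to react before checking
        if check():
            return attempt
    raise TimeoutError(f"check still failing after {max_attempts} attempts")
```

The wait sits between the action and the check on purpose: checking immediately after a click tends to read stale state and triggers needless retries.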
Safety
- Approval mode enabled by default
- Failsafe: move mouse to any corner to abort
- Environment guard: warns on Wayland or headless sessions
- Auto-detect DISPLAY: tries `/tmp/.X11-unix` when DISPLAY is missing
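To illustrate the environment guard and DISPLAY auto-detection listed above, here is a minimal sketch. The helper names `detect_display` and `check_environment` are hypothetical, and the skill's own checks may differ. It assumes the usual conventions: X11 sockets are named `X0`, `X1`, ... under `/tmp/.X11-unix`, and Wayland sessions set `XDG_SESSION_TYPE=wayland` or `WAYLAND_DISPLAY`.

```python
import os
import re


def detect_display(socket_dir="/tmp/.X11-unix", listdir=os.listdir):
    """Guess a DISPLAY value such as ':0' from X11 socket names like 'X0'."""
    try:
        names = listdir(socket_dir)
    except OSError:
        return None
    numbers = sorted(
        int(m.group(1)) for n in names if (m := re.fullmatch(r"X(\d+)", n))
    )
    return f":{numbers[0]}" if numbers else None


def check_environment(env, socket_dir="/tmp/.X11-unix", listdir=os.listdir):
    """Return (display, warnings) for the given environment mapping."""
    warnings = []
    if env.get("XDG_SESSION_TYPE") == "wayland" or env.get("WAYLAND_DISPLAY"):
        warnings.append("Wayland session detected; X11 automation may not work")
    display = env.get("DISPLAY") or detect_display(socket_dir, listdir)
    if display is None:
        warnings.append("no DISPLAY found; session looks headless")
    return display, warnings
```

In practice this would be called as `check_environment(os.environ)` before importing PyAutoGUI, with the returned display written back to `os.environ["DISPLAY"]` if one was detected.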
Desktop Control (Linux)
Safe desktop automation on Linux using PyAutoGUI, with explicit approvals and environment checks.
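Requirements
- Linux with a GUI session (X11 recommended)
- Python packages:
  - pyautogui
  - pillow
  - pygetwindow (window ops; not supported on Linux)
  - pyperclip (clipboard ops)
  - opencv-python (optional, image match)
System packages (common):
- python3-tk, scrot, xclip or xsel
- wmctrl (window list/activate)
- xdotool (active window)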
Quick Start
```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=True)
print(dc.get_screen_size())
PY
```
Screenshot to file
```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
print(dc.screenshot_to("/tmp/screen.png"))
PY
```
Record screen (ffmpeg)
```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
print(dc.record_screen("/tmp/record.mp4", seconds=30))
PY
```
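For readers curious what a fixed-length screen recording involves, the sketch below builds an ffmpeg command line using its `x11grab` input device. How `record_screen` invokes ffmpeg internally is not documented here, so the helper name `build_record_command` and the display, size, and frame-rate defaults are all assumptions.

```python
import shlex


def build_record_command(path, seconds, display=":0.0", size="1920x1080", fps=25):
    """Build an ffmpeg argv that grabs an X11 display for a fixed duration."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-f", "x11grab",         # X11 screen-capture input device
        "-video_size", size,
        "-framerate", str(fps),
        "-i", display,
        "-t", str(seconds),      # stop after this many seconds
        path,
    ]


cmd = build_record_command("/tmp/record.mp4", seconds=30)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building an argument list rather than a shell string avoids quoting problems with unusual output paths.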
Launch Chrome + open URL (default wait 15s; use 15–30s for heavy apps)
```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
dc.open_chrome("http://localhost:8000", wait_seconds=15)
PY
```
Preset examples
```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

def preset_open_site():
    dc.open_chrome("http://localhost:8000", wait_seconds=15)

def preset_login_site():
    dc.open_chrome("http://localhost:8000/login", wait_seconds=15)
    dc.login_form("user@example.com", "password", wait_seconds=10)

dc.register_preset("open-site", preset_open_site)
dc.register_preset("login-site", preset_login_site)

# Run presets
dc.run_preset("open-site")
dc.run_preset("login-site")
PY
```
Workflow (DSL) example
```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
steps = [
    {"action": "open_chrome", "url": "http://localhost:8000/login", "wait": 15},
    {"action": "login_form", "email": "user@example.com", "password": "secret", "wait": 10},
    {"action": "open_url", "url": "http://localhost:8000/target", "wait": 15},
    {"action": "screenshot", "path": "/tmp/target.png"},
]
dc.run_steps(steps)
PY
```
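A step list like the one above is essentially a small dispatch table. The following sketch shows one plausible way such steps could be routed to controller methods. It is not the skill's implementation: this `run_steps` function and the `FakeController` class are stand-ins written for the example.

```python
def run_steps(controller, steps):
    """Dispatch each step dict to the controller method named by its 'action' key."""
    results = []
    for index, step in enumerate(steps):
        params = dict(step)              # copy so the caller's dict is untouched
        name = params.pop("action")
        method = getattr(controller, name, None)
        if not callable(method):
            raise ValueError(f"step {index}: unknown action {name!r}")
        results.append(method(**params))
    return results


class FakeController:
    """Stand-in used only for this example; it records calls instead of acting."""

    def __init__(self):
        self.calls = []

    def open_url(self, url, wait=15):
        self.calls.append(("open_url", url, wait))

    def screenshot(self, path):
        self.calls.append(("screenshot", path))
        return path
```

Failing fast on an unknown action, with the step index in the message, makes a typo in a long workflow much easier to locate than a silent skip would.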
OCR & State Detection example
```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

# Read text from the screen
text = dc.read_text_on_screen()
print(text)

# Wait for text to appear (requires pytesseract)
if dc.wait_for_text("Success", timeout=30):
    print("Text detected!")
PY
```
Multi-monitor example
```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

# Get all monitors
monitors = dc.get_monitors()
print(monitors)  # [{"name": "HDMI-1", "x": 0, "y": 0, "width": 1920, "height": 1080}, ...]

# Click on the second monitor (relative coordinates 0.5, 0.5 = center)
dc.click_monitor(1, 0.5, 0.5)
PY
```
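The relative coordinates used above have to be converted into absolute screen pixels at some point. The sketch below shows that arithmetic for monitor dictionaries shaped like the `get_monitors` output. `monitor_point` is a hypothetical helper, and the second monitor in the sample data is made up.

```python
def monitor_point(monitors, index, rel_x, rel_y):
    """Map a relative position (0.0 to 1.0) on one monitor to absolute pixels."""
    if not (0.0 <= rel_x <= 1.0 and 0.0 <= rel_y <= 1.0):
        raise ValueError("relative coordinates must be between 0 and 1")
    mon = monitors[index]
    # Clamp so that 1.0 maps to the last pixel rather than one past the edge.
    dx = min(round(rel_x * mon["width"]), mon["width"] - 1)
    dy = min(round(rel_y * mon["height"]), mon["height"] - 1)
    return (mon["x"] + dx, mon["y"] + dy)


monitors = [
    {"name": "HDMI-1", "x": 0, "y": 0, "width": 1920, "height": 1080},
    {"name": "DP-1", "x": 1920, "y": 0, "width": 2560, "height": 1440},  # made-up
]
print(monitor_point(monitors, 1, 0.5, 0.5))  # (3200, 720)
```

The monitor's own `x`/`y` offset is what places the click on the right screen; the relative part only selects a position within it.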
Multi-browser example
```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

# Open different browsers
dc.open_firefox("https://google.com", wait_seconds=15)
dc.open_edge("https://github.com", wait_seconds=15)
PY
```
Window Manager example
```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

# Resize a window to 800x600
dc.resize_window("Chrome", 800, 600)

# Minimize a window
dc.minimize_window("Telegram")

# Maximize a window
dc.maximize_window("VSCode")
PY
```
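Since wmctrl and xdotool are listed as system packages, window operations like these are plausibly thin wrappers around those tools. The sketch below builds equivalent command lines. The three helper names are hypothetical, and the skill may issue different commands.

```python
def resize_command(title, width, height):
    """wmctrl -e takes 'gravity,x,y,width,height'; -1 keeps the current value."""
    return ["wmctrl", "-r", title, "-e", f"0,-1,-1,{width},{height}"]


def maximize_command(title):
    """Add both maximized states to the window whose title matches."""
    return ["wmctrl", "-r", title, "-b", "add,maximized_vert,maximized_horz"]


def minimize_command(title):
    """Find windows by name with xdotool and minimize every match (%@)."""
    return ["xdotool", "search", "--name", title, "windowminimize", "%@"]
```

Each list can be run with `subprocess.run(..., check=True)`. Note that `wmctrl -r` matches on a title substring, so a short title may hit the wrong window.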
Flow Recorder example
```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

# Start recording
dc.start_recording()

# Perform some actions (manual for now, or wrap them)
dc.click(x=100, y=200)
dc.type_text("hello")
dc.press("enter")

# Stop recording
actions = dc.stop_recording()
print(f"Recorded {len(actions)} actions")

# Replay later
dc.replay_actions(actions, delay_multiplier=1.0)
PY
```
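One way to think about recording and replay with a delay multiplier is to store the time gap before each action and scale it on playback. The class below is a sketch of that idea only. `ActionRecorder` and its methods are hypothetical and are not the skill's recorder.

```python
import time


class ActionRecorder:
    """Record (delay, name, kwargs) tuples and replay them with scaled delays."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last = None
        self.actions = []

    def start(self):
        self.actions = []
        self._last = self._clock()

    def record(self, name, **kwargs):
        if self._last is None:
            raise RuntimeError("call start() before record()")
        now = self._clock()
        self.actions.append((now - self._last, name, kwargs))
        self._last = now

    def replay(self, controller, delay_multiplier=1.0, sleep=time.sleep):
        for delay, name, kwargs in self.actions:
            sleep(delay * delay_multiplier)   # 0.5 replays twice as fast
            getattr(controller, name)(**kwargs)
```

Storing relative delays rather than absolute timestamps means a recording can be replayed at any later time without adjustment.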
AI Vision & Smart Wait example
```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

# Find an element by color (RGB)
pos = dc.find_element_by_color((255, 0, 0), tolerance=20)  # red
if pos:
    dc.click(x=pos[0], y=pos[1])

# Smart wait: poll until the condition is true
dc.smart_wait(lambda: dc.active_window_contains("Done"), timeout=30)
PY
```
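To make the color search with a tolerance concrete, here is a small pure-Python sketch that scans a grid of RGB tuples. The name `find_color` is hypothetical, and comparing each channel separately is an assumption. The skill may use a different distance measure or operate on a screenshot image directly.

```python
def find_color(pixels, target, tolerance=20):
    """Return (x, y) of the first pixel within `tolerance` of `target` per channel.

    pixels: rows of (r, g, b) tuples, scanned top to bottom, left to right.
    Returns None when no pixel matches.
    """
    for y, row in enumerate(pixels):
        for x, rgb in enumerate(row):
            if all(abs(c - t) <= tolerance for c, t in zip(rgb, target)):
                return (x, y)
    return None


grid = [
    [(0, 0, 0), (10, 10, 10)],
    [(240, 5, 15), (255, 0, 0)],
]
print(find_color(grid, (255, 0, 0)))               # (0, 1): near-red is close enough
print(find_color(grid, (255, 0, 0), tolerance=0))  # (1, 1): exact match only
```

A nonzero tolerance matters in practice because anti-aliasing and theme variations rarely leave UI colors at their exact nominal values.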
Drag & Drop example
```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

# Drag from point A to point B
dc.drag_drop(100, 200, 500, 600)

# Drag a file onto an app
dc.drag_file_to_app("/path/to/file.txt", 400, 300)
PY
```
Robust retry example
```bash
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)

# Click with automatic retry
dc.robust_click(100, 200)

# Type with automatic retry
dc.robust_type("Hello world")
PY
```
API
Same interface as DesktopController:
- mouse: `move_mouse`, `click`, `drag`, `scroll`, `get_mouse_position`
- keyboard: `type_text`, `press`, `hotkey`, `wait`, `launch_app`, `open_url`, `open_chrome`, `wait_retry_window`, `wait_retry_new_window`, `smart_retry`
- screen/ui: `click_image`, `click_image_or`, `login_form`
- state: `ensure_window`, `active_window_contains`, `wait_for_text`, `detect_state`
- recovery: `recover_reload`, `recover_back`, `retry_with_recovery`
- workflows: `run_steps`
- presets: `register_preset`, `run_preset`
- ocr: `read_text_on_screen`
- multi-monitor: `get_monitors`, `click_monitor`
- robust: `robust_click`, `robust_type`
- smart-wait: `smart_wait`, `wait_for_window_stable`
- drag-drop: `drag_drop`, `drag_file_to_app`
- window-manager: `resize_window`, `minimize_window`, `maximize_window`
- multi-browser: `open_firefox`, `open_edge`
- keyboard: `detect_keyboard_layout`
- ai-vision: `find_element_by_color`, `find_button_vision`
- recorder: `start_recording`, `record_action`, `stop_recording`, `replay_actions`
`launch_app(app_name, wait_seconds=15, window_title=None, auto_detect_window=True)`
- If `window_title` is provided: waits 15s, retries once, then errors if not found.
- If