Desktop Control via CUA Server
This skill allows OpenClaw to control the desktop using the CUA computer server API.
⚠️ Security Notice
This skill requires installing and running a third-party server (cua-computer-sdk) that has full control over your desktop.
Before using this skill:
- The server can simulate keyboard and mouse input and take screenshots
- Only run on systems where you trust all users and processes
- The server runs with your user privileges (no sudo/admin required)
- By default, only accessible from localhost (safe for local use)
Prerequisites
- Python 3.12+ installed on your system
- CUA computer server running on port 8000 (see installation below)
- Access to localhost:8000 only (network exposure not recommended)
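The Python 3.12+ requirement can be checked before installing anything. The following is a minimal sketch, not part of the CUA tooling: `version_ge` is a hypothetical helper name, and it compares only the major and minor components.

```shell
# version_ge A B: succeed when dotted version A >= B (major.minor only).
version_ge() {
    a_major=${1%%.*}; a_rest=${1#*.}; a_minor=${a_rest%%.*}
    b_major=${2%%.*}; b_rest=${2#*.}; b_minor=${b_rest%%.*}
    [ "$a_major" -gt "$b_major" ] && return 0
    [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -ge "$b_minor" ]
}

# Compare the interpreter on PATH (if any) against the documented minimum.
if command -v python3 >/dev/null 2>&1; then
    found=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if version_ge "$found" 3.12; then
        echo "Python $found is new enough"
    else
        echo "Python $found is older than 3.12"
    fi
fi
```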
Installation
Recommended: Temporary Session (Safest)
Run the server only when needed, in a terminal you can monitor:
CODEBLOCK0
This is the safest approach - the server only runs when you explicitly start it and stops when you close the terminal.
Alternative: Install from Source
For transparency, you can review and run from source:
CODEBLOCK1
Running the Server
Option 1: Manual Start (Recommended)
CODEBLOCK2
Option 2: Background Process (Temporary)
CODEBLOCK3
Note: This skill does NOT require persistent/system service installation. Running the server temporarily when needed is the recommended approach.
Scope & Limitations
This skill:
- ✅ Controls YOUR desktop when the server is running
- ✅ Runs with YOUR user privileges (no admin/sudo needed)
- ✅ Only accessible from localhost by default
Security Best Practices
1. Run Temporarily: Start the server only when needed and stop it when done
2. Localhost Only: Keep the default binding to 127.0.0.1
3. No Network Exposure: Avoid --bind 0.0.0.0 unless absolutely necessary
4. Monitor Activity: Run in the foreground to see which commands are executed
5. Limited Scope: The server can only do what your user account can do
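The localhost-only rule can be enforced in a wrapper script before the server is started. This is a sketch under assumptions: `is_loopback` and the `BIND_ADDR` variable are hypothetical names introduced here, not options of the CUA server.

```shell
# is_loopback ADDR: succeed only for addresses that stay on this machine.
is_loopback() {
    case "$1" in
        localhost|127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# BIND_ADDR is a hypothetical setting; default to the loopback address.
BIND_ADDR=${BIND_ADDR:-127.0.0.1}
if is_loopback "$BIND_ADDR"; then
    echo "bind address $BIND_ADDR is local-only"
else
    echo "warning: $BIND_ADDR is reachable from the network" >&2
fi
```

A wrapper could refuse to launch the server whenever the check fails, which makes accidental exposure via 0.0.0.0 harder.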
Quick Test
After starting the server, verify it works:
CODEBLOCK4
Troubleshooting
Port Already in Use:
CODEBLOCK5
Permission Denied (Linux):
CODEBLOCK6
Display Not Found (Linux):
CODEBLOCK7
Server Not Responding:
CODEBLOCK8
Available Commands
Take Screenshot
Capture the current screen:
CODEBLOCK9
Click at Coordinates
Click at specific x,y coordinates:
CODEBLOCK10
Right Click
CODEBLOCK11
Double Click
CODEBLOCK12
Type Text
Type text at the current cursor position:
CODEBLOCK13
Press Hotkey
Press a key combination:
CODEBLOCK14
Press Single Key
Press a single key:
CODEBLOCK15
Move Cursor
Move cursor to specific position:
CODEBLOCK16
Scroll
Scroll up or down:
CODEBLOCK17
Launch Application
Launch an application by name:
CODEBLOCK18
Open File or URL
Open a file or URL with default application:
CODEBLOCK19
Get Window Information
Get current window ID:
CODEBLOCK20
Window Control
Maximize window:
CODEBLOCK21
Minimize window:
CODEBLOCK22
Demo Workflows
Browser Navigation Demo
Open Firefox and navigate to a website:
CODEBLOCK23
Text Editor Demo
Open text editor and type content:
CODEBLOCK24
Form Filling Demo
Fill out a web form:
CODEBLOCK25
Helper Functions
Check Server Status
CODEBLOCK26
List All Available Commands
CODEBLOCK27
Get Screen Size
CODEBLOCK28
Get Cursor Position
CODEBLOCK29
Environment Variables
- CUA_SERVER_URL: Base URL for the CUA server (default: http://localhost:8000)
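One way a script might honor this variable is to fall back to the documented default and build endpoint URLs from it. The sketch below assumes the `/cmd` and `/status` paths used by the curl examples in this document; `cua_endpoint` is a hypothetical helper name.

```shell
# Fall back to the documented default when CUA_SERVER_URL is unset or empty.
CUA_SERVER_URL=${CUA_SERVER_URL:-http://localhost:8000}

# cua_endpoint PATH: join the base URL and a path without doubling the slash.
cua_endpoint() {
    printf '%s/%s\n' "${CUA_SERVER_URL%/}" "${1#/}"
}

# Print the command endpoint for the current setting.
cua_endpoint /cmd
```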
Tips
1. Wait Between Commands: Add sleep between commands to allow the UI to update
2. Check Coordinates: The screen is 1280x720, so the center is at (640, 360)
3. Screenshot for Debugging: Take screenshots before and after actions to verify the result
4. Use Variables: Store coordinates and text in variables for reusability
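The coordinate and variable tips can be combined in a few lines of shell. This is an illustrative sketch: `screen_center` is a hypothetical helper, and the 1280x720 size is the one quoted in the tips, so adjust it if your display differs.

```shell
# Screen size quoted in the tips above.
SCREEN_W=1280
SCREEN_H=720

# screen_center W H: print the center point as "x y" (integer division).
screen_center() {
    echo "$(( $1 / 2 )) $(( $2 / 2 ))"
}

# Keep the coordinates in named variables so later commands can reuse them.
set -- $(screen_center "$SCREEN_W" "$SCREEN_H")
CENTER_X=$1
CENTER_Y=$2
echo "center: $CENTER_X,$CENTER_Y"
```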
Example OpenClaw Usage
Once this skill is loaded, you can use it in OpenClaw conversations:
CODEBLOCK30
Troubleshooting
1. Connection Refused: Make sure the CUA server is running on port 8000
2. No Response: Check whether you are inside the container or have an SSH tunnel set up
3. Commands Not Working: Verify with INLINECODE4
4. Wrong Coordinates: Remember the screen is 1280x720 and adjust coordinates accordingly
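Desktop Control via CUA Server
This skill allows OpenClaw to control the desktop using the CUA computer server API.
⚠️ Security Notice
This skill requires installing and running a third-party server (cua-computer-sdk) that has full control over your desktop.
Before using this skill:
- The server can simulate keyboard and mouse input and take screenshots
- Only run on systems where you trust all users and processes
- The server runs with your user privileges (no sudo/admin required)
- By default, only accessible from localhost (safe for local use)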
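Prerequisites
- Python 3.12+ installed on your system
- CUA computer server running on port 8000 (see installation below)
- Access to localhost:8000 only (network exposure not recommended)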
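Installation
Recommended: Temporary Session (Safest)
Run the server only when needed, in a terminal you can monitor:
bash
# Install the computer SDK (official CUA package)
pip install cua-computer-sdk
# Verify the package (optional but recommended)
pip show cua-computer-sdk  # check publisher and version
# Run temporarily (press Ctrl+C to stop)
cua-server start --port 8000 --bind 127.0.0.1
# In another terminal, verify it is only listening locally
curl http://localhost:8000/status
netstat -an | grep 8000  # should show 127.0.0.1:8000
This is the safest approach: the server only runs when you explicitly start it and stops when you close the terminal.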
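Alternative: Install from Source
For transparency, you can review and run from source:
bash
# Clone and review the code first
git clone https://github.com/trycua/cua-computer-server
cd cua-computer-server
# Review the code before running
ls -la
cat requirements.txt  # check dependencies
# Install and run
pip install -r requirements.txt
python -m cua_server --port 8000 --bind 127.0.0.1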
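Running the Server
Option 1: Manual Start (Recommended)
bash
# Start in the foreground so you can see what it is doing
cua-server start --port 8000
# Press Ctrl+C to stop when done
Option 2: Background Process (Temporary)
bash
# Run in the background for the current session only
cua-server start --port 8000 &
# Note the process ID
SERVER_PID=$!
echo "Server PID: $SERVER_PID"
# Stop when done
kill "$SERVER_PID"
Note: This skill does NOT require persistent/system service installation. Running the server temporarily when needed is the recommended approach.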
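For the background option, a shell trap can make sure the process does not outlive the session. The following sketch uses `sleep 30` as a stand-in for the server command so it can be run anywhere; `start_bg` and `stop_bg` are hypothetical helper names.

```shell
# start_bg CMD...: run a command in the background and remember its PID.
start_bg() {
    "$@" &
    BG_PID=$!
}

# stop_bg: stop the remembered process and wait until it has been reaped.
stop_bg() {
    kill "$BG_PID" 2>/dev/null || true
    wait "$BG_PID" 2>/dev/null || true
}

# Stand-in for `cua-server start --port 8000`; swap in the real command.
start_bg sleep 30
# Stop the background process automatically when this shell exits.
trap stop_bg EXIT
echo "background PID: $BG_PID"
```

Because the trap fires on shell exit, closing the terminal or finishing the script also stops the background process.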
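Scope & Limitations
This skill:
- ✅ Controls YOUR desktop when the server is running
- ✅ Runs with YOUR user privileges (no admin/sudo needed)
- ✅ Only accessible from localhost by default
Security Best Practices
1. Run Temporarily: Start the server only when needed and stop it when done
2. Localhost Only: Keep the default binding to 127.0.0.1
3. No Network Exposure: Avoid --bind 0.0.0.0 unless absolutely necessary
4. Monitor Activity: Run in the foreground to see which commands are executed
5. Limited Scope: The server can only do what your user account can do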
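Quick Test
After starting the server, verify it works:
bash
# Simple health check
curl http://localhost:8000/status
# Should return: {"status": "ok"}
# Take a screenshot (safe test)
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "screenshot"}' \
  -o screenshot.json
# If successful, you will get a JSON response containing base64 image data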
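The server may need a moment after startup before the health check succeeds, so a small retry loop can be useful. This is a generic sketch; `retry` is a hypothetical helper, not something the CUA package provides.

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or attempts run out.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Usage (needs a running server):
#   retry 10 1 curl -fsS http://localhost:8000/status
```

With `curl -f`, an HTTP error status counts as a failure, so the loop keeps polling until the endpoint answers successfully.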
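Troubleshooting
Port Already in Use:
bash
# Check what is using port 8000
lsof -i :8000  # macOS/Linux
netstat -ano | findstr :8000  # Windows
# Solution: use a different port
cua-server start --port 8001
Permission Denied (Linux):
bash
# You may need to add your user to the input group to control keyboard/mouse
sudo usermod -a -G input $USER
# Log out and back in for the change to take effect
Display Not Found (Linux):
bash
# Check your display variable
echo $DISPLAY
# Set it explicitly
DISPLAY=:0 cua-server start --port 8000
Server Not Responding:
bash
# Check whether the process is running
ps aux | grep cua-server  # Linux/macOS
tasklist | findstr cua-server  # Windows
# Try running in the foreground to see errors
cua-server start --port 8000 --debug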
Available Commands
Take Screenshot
Capture the current screen:
bash
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "screenshot"}' \
  | jq -r '.result.base64' \
  | base64 -d > screenshot.png
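When debugging, it can help to confirm that the decoded file really is an image. The sketch below checks for the 8-byte PNG signature. It assumes the server returns PNG data, which the .png filename suggests but this document does not state; `is_png` is a hypothetical helper.

```shell
# is_png FILE: succeed when FILE starts with the 8-byte PNG signature.
is_png() {
    sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \t\n')
    [ "$sig" = "89504e470d0a1a0a" ]
}

# Usage after decoding a screenshot:
#   is_png screenshot.png && echo "looks like a PNG"
```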
Click at Coordinates
Click at specific x,y coordinates:
bash
# Click at the center of a 1280x720 screen
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "left_click", "params": {"x": 640, "y": 360}}'
Right Click
bash
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "right_click", "params": {"x": 640, "y": 360}}'
Double Click
bash
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "double_click", "params": {"x": 640, "y": 360}}'
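The three click commands differ only in the command name, so a script could build the request body in one place. A minimal sketch, assuming the payload shape shown in the curl examples; `click_payload` and `send_click` are hypothetical helper names.

```shell
# click_payload COMMAND X Y: print the JSON body for a click-style command.
click_payload() {
    printf '{"command": "%s", "params": {"x": %d, "y": %d}}\n' "$1" "$2" "$3"
}

# send_click COMMAND X Y: POST the payload to the /cmd endpoint.
send_click() {
    curl -sS -X POST "${CUA_SERVER_URL:-http://localhost:8000}/cmd" \
        -H 'Content-Type: application/json' \
        -d "$(click_payload "$1" "$2" "$3")"
}

# Print a payload without sending it.
click_payload double_click 640 360
```

Calling `send_click left_click 640 360` would send the request once a server is running. The `%d` conversions reject non-numeric coordinates early.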
Type Text
Type text at the current cursor position:
bash
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "type_text", "params": {"text": "Hello, World!"}}'
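Text that contains double quotes or backslashes has to be escaped before it is placed in the JSON body. The sketch below handles only those two characters (not newlines or other control characters); `json_escape` and `type_payload` are hypothetical helper names.

```shell
# json_escape STRING: escape backslashes and double quotes for a JSON string.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# type_payload TEXT: print the JSON body for the type_text command.
type_payload() {
    printf '{"command": "type_text", "params": {"text": "%s"}}\n' "$(json_escape "$1")"
}

# Print a payload for text that contains quotes.
type_payload 'say "hi"'
```

For arbitrary input, a JSON-aware tool such as jq is the more robust choice.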
Press Hotkey
Press a key combination:
bash
# Ctrl+C
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "hotkey", "params": {"keys": ["ctrl", "c"]}}'
# Ctrl+Alt+T (open terminal)
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "hotkey", "params": {"keys": ["ctrl", "alt", "t"]}}'
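Because the `keys` parameter is a JSON array, a helper can assemble it from ordinary shell arguments. This is an illustrative sketch that assumes key names contain no characters needing JSON escaping; `hotkey_payload` is a hypothetical helper name.

```shell
# hotkey_payload KEY...: print the JSON body for the hotkey command.
hotkey_payload() {
    keys=""
    for k in "$@"; do
        # Append a comma separator only after the first key.
        keys="${keys:+$keys, }\"$k\""
    done
    printf '{"command": "hotkey", "params": {"keys": [%s]}}\n' "$keys"
}

# Print the payload for Ctrl+Alt+T.
hotkey_payload ctrl alt t
```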
Press Single Key
Press a single key:
bash
# Press Enter
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "press_key", "params": {"key": "enter"}}'
# Press Esc
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "press_key", "params": {"key": "escape"}}'
Move Cursor
Move the cursor to a specific position:
bash
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "move_cursor", "params": {"x": 100, "y": 200}}'
Scroll
Scroll up or down:
bash
# Scroll down 3 units
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "scroll_direction", "params": {"direction": "down", "amount": 3}}'
# Scroll up 5 units
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "scroll_direction", "params": {"direction": "up", "amount": 5}}'
Launch Application
Launch an application by name:
bash
# Launch Firefox
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "launch", "params": {"app": "firefox"}}'
# Launch a terminal
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "launch", "params": {"app": "xfce4-terminal"}}'
Open File or URL
Open a file or URL with the default application