Web + Desktop Automation

Use this skill when a task may involve:

- Opening or controlling websites
- Reading or extracting page content
- Filling forms, clicking buttons, logging in
- Downloading or uploading files
- Controlling desktop apps with mouse/keyboard
- Combining browser steps with local app steps

Core rule

Prefer the simplest reliable path:

1. If the task can be done in the browser, use browser automation.
2. If the task needs local apps or OS-level interaction, use desktop automation.
3. If both are needed, split the job into clear phases and verify after each phase.

Execution strategy

1) Classify the task

Decide which of these applies:

- Browser only
- Desktop only
- Mixed browser + desktop
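
The classification step can be sketched as a small function. This is only an illustration: the names `TaskKind` and `classify_task`, and the idea of reducing a task to two boolean needs, are assumptions and not part of the skill's scripts.

```python
from enum import Enum


class TaskKind(Enum):
    BROWSER_ONLY = "browser only"
    DESKTOP_ONLY = "desktop only"
    MIXED = "mixed browser + desktop"


def classify_task(needs_browser: bool, needs_desktop: bool) -> TaskKind:
    """Map what a task needs onto one of the three categories (hypothetical helper)."""
    if needs_browser and needs_desktop:
        return TaskKind.MIXED
    if needs_desktop:
        return TaskKind.DESKTOP_ONLY
    if needs_browser:
        return TaskKind.BROWSER_ONLY
    # Neither kind of automation is needed, so this skill does not apply.
    raise ValueError("task needs neither browser nor desktop automation")
```

For example, a task that downloads a report from a website and then edits it in a local spreadsheet app would be classified as `TaskKind.MIXED`.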

2) Browser automation

Use browser automation for:

- Navigation
- Search
- Page reading
- Form filling
- Clicking controls
- File upload/download
- Logged-in web workflows

Prefer stable selectors and explicit waits. Avoid brittle coordinate-based clicking when browser selectors exist.
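
A minimal sketch of what stable selectors and an explicit wait might look like with Playwright's synchronous Python API. The field labels ("Email", "Password"), the button name ("Sign in"), and the `**/dashboard` URL pattern are placeholders for whatever the target page actually uses, and `sign_in` and `run` are hypothetical helper names.

```python
def sign_in(page, email: str, password: str) -> None:
    """Fill a login form using role/label locators instead of coordinates.

    `page` is expected to be a Playwright sync-API Page object.
    """
    # Label and role locators tend to survive layout changes better than
    # pixel positions or long CSS paths.
    page.get_by_label("Email").fill(email)
    page.get_by_label("Password").fill(password)
    page.get_by_role("button", name="Sign in").click()
    # Explicit wait: block until navigation reaches the expected URL pattern.
    page.wait_for_url("**/dashboard")


def run(url: str, email: str, password: str) -> None:
    """Open a Chromium browser, load `url`, and sign in (assumes Playwright is installed)."""
    # Imported here so the module still loads on machines without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            sign_in(page, email, password)
        finally:
            browser.close()
```

Keeping the page interaction in a function that receives `page` makes it easy to reuse the same step inside a longer flow.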

3) Desktop automation

Use desktop automation for:

- Native apps
- Window switching
- Copy/paste between apps
- File manager operations
- UI flows outside the browser

Prefer application/window-aware methods when available. Use image-based or coordinate-based control only when necessary.
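
One way to read "window-aware" is: confirm the expected window is in front before sending any input. The sketch below assumes two injected callables that are not defined by this skill: `get_active_title`, which returns the title of the focused window (platform-specific), and `send_text`, which types text (for example a thin wrapper around `pyautogui.write`).

```python
import time
from typing import Callable, Optional


def wait_until(predicate: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.2) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def type_when_focused(expected_title: str,
                      get_active_title: Callable[[], Optional[str]],
                      send_text: Callable[[str], object],
                      text: str,
                      timeout: float = 5.0) -> bool:
    """Type `text` only once a window whose title contains `expected_title` is focused."""
    focused = wait_until(
        lambda: expected_title in (get_active_title() or ""), timeout
    )
    if not focused:
        # Do not type blindly into whatever window happens to be in front.
        return False
    send_text(text)
    return True
```

Because both callables are injected, the same check works with whichever window library is available on the platform, and the function can be exercised without a real desktop session.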

4) Mixed workflows

Break the task into phases:

- Browser phase
- Desktop phase
- Browser phase again if needed

After each phase, verify the result before continuing.
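
The phase-then-verify pattern could be sketched as follows. `Phase` and `run_phases` are hypothetical names and are not claimed to match `scripts/mixed_orchestrator.py`; each phase pairs an action with a check that must pass before the next phase starts.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Phase:
    name: str
    run: Callable[[], None]      # performs the browser or desktop work
    verify: Callable[[], bool]   # returns True if the phase's result looks right


def run_phases(phases: List[Phase]) -> List[str]:
    """Run phases in order and stop at the first failed verification."""
    completed: List[str] = []
    for phase in phases:
        phase.run()
        if not phase.verify():
            raise RuntimeError(
                f"verification failed after phase {phase.name!r}; "
                f"completed so far: {completed}"
            )
        completed.append(phase.name)
    return completed
```

In a download, edit, upload flow, the verify step for the browser download phase might check that the file exists on disk, and the one for the desktop phase might check that the file's modification time changed.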

Recovery rules

If a step fails:

1. Re-check the current UI state
2. Re-locate the target element or window
3. Try a more stable selector or a different interaction method
4. If the task risks loss of data or irreversible action, stop and ask the user
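
The recovery rules above can be sketched as a small wrapper. Everything named here (`StopAndAskUser`, `run_with_recovery`, the `recheck_state` callback) is hypothetical; the sketch assumes each interaction method is a callable that raises an exception on failure.

```python
import time
from typing import Callable, Optional, Sequence


class StopAndAskUser(Exception):
    """Raised when continuing could lose data or repeat an irreversible action."""


def run_with_recovery(attempts: Sequence[Callable[[], object]],
                      recheck_state: Callable[[], None],
                      irreversible: bool = False,
                      pause: float = 0.5) -> object:
    """Try interaction methods in order, from preferred to fallback."""
    last_error: Optional[Exception] = None
    for index, attempt in enumerate(attempts):
        if index > 0:
            time.sleep(pause)   # pause instead of retrying blindly
            recheck_state()     # re-check the UI and re-locate the target
        try:
            return attempt()
        except Exception as error:
            last_error = error
            if irreversible:
                # Hand control back to the user rather than trying a fallback.
                raise StopAndAskUser(
                    "step failed and may not be safe to retry"
                ) from error
    raise RuntimeError("all interaction methods failed") from last_error
```

A caller would typically pass a selector-based click first and a less stable method (for example an image-based click) as the fallback, setting `irreversible=True` for steps such as submitting a payment or deleting files.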

Best practices

- Prefer deterministic steps over guessing
- Avoid rapid blind retries
- Capture key state when tasks are long or fragile
- Keep flows small and modular
- Use scripts for repeated actions
- Use `scripts/browser_runner.py` for Playwright browser automation templates
- Use `scripts/desktop_runner.py` for PyAutoGUI desktop automation templates
- Use `scripts/mixed_orchestrator.py` for browser + desktop handoffs
- Put browser-specific patterns in `references/browser-workflows.md`
- Put desktop-specific patterns in `references/desktop-workflows.md`
- Put mixed-flow orchestration examples in `references/mixed-flows.md`
- Put dependency and installation notes in `references/dependencies.md`
- Put a realistic browser-download → desktop-edit → browser-upload flow in `references/mixed-example.md`
- See `requirements.txt` for a minimal install set
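
As one possible reading of "capture key state", the sketch below appends a JSON record per step to a log file so a long flow can be inspected or resumed after a failure. The helper names and the JSON Lines layout are assumptions, not something the skill's scripts are known to provide.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Optional


def new_state_log() -> Path:
    """Create a fresh log file path in a temporary directory."""
    return Path(tempfile.mkdtemp(prefix="automation-")) / "state.jsonl"


def capture_state(log_path: Path, step: str, **details) -> dict:
    """Append one record describing the step that just finished."""
    record = {"step": step, "time": time.time(), **details}
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def last_state(log_path: Path) -> Optional[dict]:
    """Return the most recent record, or None if nothing was captured yet."""
    if not log_path.exists():
        return None
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return json.loads(lines[-1]) if lines else None
```

Recording the step name together with a few details (a downloaded file path, the current URL, a window title) is usually enough to tell where a fragile flow stopped.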

Skill name: web-desktop-automation