Web + Desktop Automation
Use this skill when a task may involve:
- Opening or controlling websites
- Reading or extracting page content
- Filling forms, clicking buttons, logging in
- Downloading or uploading files
- Controlling desktop apps with mouse/keyboard
- Combining browser steps with local app steps
Core rule
Prefer the simplest reliable path:
1. If the task can be done in the browser, use browser automation.
2. If the task needs local apps or OS-level interaction, use desktop automation.
3. If both are needed, split the job into clear phases and verify after each phase.
Execution strategy
1) Classify the task
Decide which of these applies:
- Browser only
- Desktop only
- Mixed browser + desktop
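The classification step can be sketched as a small function. This is only an illustration: `TaskKind` and `classify_task` are hypothetical names, and the two boolean inputs stand in for whatever analysis decides what the task touches.

```python
from enum import Enum


class TaskKind(Enum):
    BROWSER_ONLY = "browser only"
    DESKTOP_ONLY = "desktop only"
    MIXED = "mixed browser + desktop"


def classify_task(needs_browser: bool, needs_desktop: bool) -> TaskKind:
    """Map two yes/no questions onto the three categories listed above."""
    if needs_browser and needs_desktop:
        return TaskKind.MIXED
    if needs_desktop:
        return TaskKind.DESKTOP_ONLY
    if needs_browser:
        return TaskKind.BROWSER_ONLY
    # Neither flag set: this skill does not apply to the task.
    raise ValueError("task needs neither browser nor desktop automation")
```

For example, a task that downloads a report from a website and then edits it in a native spreadsheet app would be classified as `TaskKind.MIXED`.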
2) Browser automation
Use browser automation for:
- Navigation
- Search
- Page reading
- Form filling
- Clicking controls
- File upload/download
- Logged-in web workflows
Prefer stable selectors and explicit waits. Avoid brittle coordinate-based clicking when browser selectors exist.
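A minimal sketch of stable selectors and explicit waits, assuming Playwright's Python sync API. The form labels, button names, link text, and the `**/dashboard` URL pattern are placeholders for a real site, and `log_in_and_download` and `run` are hypothetical helper names.

```python
def log_in_and_download(page, email: str, password: str, save_path: str) -> str:
    """Fill a hypothetical login form, then save one download.

    `page` is expected to be a Playwright sync `Page`.
    """
    # Label- and role-based locators tend to survive layout changes better
    # than long CSS paths or screen coordinates.
    page.get_by_label("Email").fill(email)
    page.get_by_label("Password").fill(password)
    page.get_by_role("button", name="Sign in").click()

    # Explicit wait: block until the post-login URL appears instead of sleeping.
    page.wait_for_url("**/dashboard")

    # expect_download() waits for the download triggered inside the block.
    with page.expect_download() as download_info:
        page.get_by_role("link", name="Export CSV").click()
    download = download_info.value
    download.save_as(save_path)
    return save_path


def run(url: str, email: str, password: str, save_path: str) -> str:
    """Launch a browser, open `url`, and run the flow above."""
    # Imported lazily so this module loads even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        try:
            page.goto(url)
            return log_in_and_download(page, email, password, save_path)
        finally:
            browser.close()
```

Keeping the page logic separate from browser startup makes the flow easier to reuse and to exercise against a stand-in page object.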
3) Desktop automation
Use desktop automation for:
- Native apps
- Window switching
- Copy/paste between apps
- File manager operations
- UI flows outside the browser
Prefer application/window-aware methods when available. Use image-based or coordinate-based control only when necessary.
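The preference order above could be sketched as follows, assuming PyAutoGUI. `focus_app` is a hypothetical helper. Window lookup by title depends on PyGetWindow and may not be available on every platform, so the sketch falls through to the next method when it fails.

```python
def focus_app(window_title: str, image_path: str,
              fallback_xy: tuple[int, int], gui=None) -> str:
    """Bring an app to the foreground and report which method was used.

    `gui` defaults to the pyautogui module; passing a stand-in object
    makes the function usable without a display.
    """
    if gui is None:
        import pyautogui as gui  # imported lazily: PyAutoGUI needs a display

    # 1) Window-aware: look the window up by title and activate it.
    try:
        windows = gui.getWindowsWithTitle(window_title)
    except Exception:
        # Window lookup is not available on every platform; fall through.
        windows = []
    if windows:
        windows[0].activate()
        return "window"

    # 2) Image-based: find a reference image of the app on screen.
    try:
        point = gui.locateCenterOnScreen(image_path)
    except gui.ImageNotFoundException:
        point = None  # newer PyAutoGUI versions raise instead of returning None
    if point is not None:
        gui.click(point.x, point.y)
        return "image"

    # 3) Last resort: fixed coordinates, which break when the layout moves.
    gui.click(*fallback_xy)
    return "coordinates"
```

Returning the method name lets the caller log when a flow has fallen back to coordinates, which is a signal that the step is fragile.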
4) Mixed workflows
Break the task into phases:
- Browser phase
- Desktop phase
- Browser phase again if needed
After each phase, verify the result before continuing.
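One way to enforce "verify before continuing" is a small phase runner. This is a sketch under assumptions: `Phase`, `PhaseVerificationError`, and `run_phases` are hypothetical names, and each phase supplies its own action and verification callables.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Phase:
    name: str
    action: Callable[[], None]   # performs the browser or desktop work
    verify: Callable[[], bool]   # returns True if the phase result looks right


class PhaseVerificationError(RuntimeError):
    """Raised when a phase finishes but its result does not verify."""


def run_phases(phases: list[Phase]) -> list[str]:
    """Run phases in order and stop at the first one that fails verification."""
    completed: list[str] = []
    for phase in phases:
        phase.action()
        if not phase.verify():
            raise PhaseVerificationError(
                f"phase {phase.name!r} did not verify; completed so far: {completed}"
            )
        completed.append(phase.name)
    return completed
```

A browser phase might verify that the downloaded file exists on disk, and the following desktop phase might verify that the file's modification time changed after the edit.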
Recovery rules
If a step fails:
1. Re-check the current UI state.
2. Re-locate the target element or window.
3. Try a more stable selector or a different interaction method.
4. If the step risks data loss or an irreversible action, stop and ask the user.
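The recovery rules can be sketched as a loop over alternative interaction strategies. The names `attempt_with_recovery` and `NeedsUserConfirmation` are hypothetical. Each strategy is assumed to re-locate its own target and return `True` on success.

```python
from typing import Callable, Sequence


class NeedsUserConfirmation(Exception):
    """Raised instead of retrying when a failed step could be destructive."""


def attempt_with_recovery(
    strategies: Sequence[Callable[[], bool]],
    recheck_state: Callable[[], None],
    irreversible: bool = False,
) -> int:
    """Try strategies in order and return the index of the first that succeeds."""
    for index, strategy in enumerate(strategies):
        if strategy():
            return index
        if irreversible:
            # Do not retry steps that may lose data; hand control to the user.
            raise NeedsUserConfirmation("step failed on a risky action; ask the user")
        # Re-check the UI after every failed attempt so the next strategy
        # starts from a known state.
        recheck_state()
    raise RuntimeError("all interaction strategies failed")
```

Ordering the strategies from most to least stable (for example a role-based selector first, a text match second) keeps the fallback path predictable.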
Best practices
- Prefer deterministic steps over guessing
- Avoid rapid blind retries
- Capture key state when tasks are long or fragile
- Keep flows small and modular
- Use scripts for repeated actions
- Use scripts/browser_runner.py for Playwright browser automation templates
- Use scripts/desktop_runner.py for PyAutoGUI desktop automation templates
- Use scripts/mixed_orchestrator.py for browser + desktop handoffs
- Put browser-specific patterns in references/browser-workflows.md
- Put desktop-specific patterns in references/desktop-workflows.md
- Put mixed-flow orchestration examples in references/mixed-flows.md
- Put dependency and installation notes in references/dependencies.md
- Put a realistic browser-download → desktop-edit → browser-upload flow in references/mixed-example.md
- See requirements.txt for a minimal install set
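For the "capture key state" practice above, one possible approach is an append-only JSON Lines log of checkpoints. The helper names and the log layout here are assumptions for illustration, not part of the bundled scripts.

```python
import json
import time
from pathlib import Path


def record_checkpoint(log_path: str, step: str, **state) -> dict:
    """Append one checkpoint entry (step name, timestamp, state) to the log."""
    entry = {"step": step, "time": time.time(), "state": state}
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def last_checkpoint(log_path: str):
    """Return the most recent checkpoint, or None if nothing was recorded."""
    path = Path(log_path)
    if not path.exists():
        return None
    lines = path.read_text(encoding="utf-8").splitlines()
    return json.loads(lines[-1]) if lines else None
```

A long flow could call `record_checkpoint` after each verified phase, so a failed run can be inspected or resumed from the last known-good step rather than restarted blindly.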
Skill name: web-desktop-automation