Control QQ Browser's X5 kernel for web automation. Supports web navigation, element interaction, page scrolling, file and video downloading, and content extraction.
Built on QQ Browser, it provides comprehensive browser automation capabilities.
Installation (one-time only)
Install QQ Browser and the x5use Python package.
CODEBLOCK0
Setup (run before each session)
Start the X5 background service on port 18009. Run this only after Installation has completed. If the service is already running, the command exits immediately without restarting it.
CODEBLOCK1
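To confirm that the service came up before issuing commands, a quick port probe can help. The sketch below is not part of x5use; it assumes the service accepts plain TCP connections on localhost at port 18009, and the function name `is_port_open` is made up for illustration.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 18009, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Assumption: the X5 service listens on TCP at localhost:18009.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("X5 service reachable:", is_port_open())
```

A `False` result only means nothing accepted the connection; it does not say why the service is down.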
Commands
Navigation
CODEBLOCK2
Element interaction
CODEBLOCK3
Scrolling
CODEBLOCK4
Download
CODEBLOCK5
Content
CODEBLOCK6
Wait
CODEBLOCK7
Core workflow
1. Navigate: INLINECODE0
2. Read result: Check the returned interactive elements with refs like [0], INLINECODE2
3. Interact: Use the index from the result to click, fill, select, etc.
4. Re-read result: After navigation or interaction, check the new interactive elements
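The read, interact, re-read loop above can be sketched as a small helper. This is an illustration under assumptions, not the skill's own API: `run` is a hypothetical callable that you would wire to the real command scripts, and `find_index` and `click_by_text` are invented names. Only the `[index]<tag text/>` element format and the `click_element` action name come from this document.

```python
import re
from typing import Callable, Optional

# Matches entries such as "[2]<button Search/>" or "[1]<textarea />".
ELEMENT_RE = re.compile(r"\[(\d+)\]<(\w+)\s*([^<>]*?)\s*/>")


def find_index(page_state: str, wanted_text: str) -> Optional[int]:
    """Return the index of the first listed element whose text contains wanted_text."""
    for index, _tag, text in ELEMENT_RE.findall(page_state):
        if wanted_text in text:
            return int(index)
    return None


def click_by_text(run: Callable[..., str], page_state: str, wanted_text: str) -> str:
    """Pick an index from the last result, click it, and return the new page state.

    `run` is a hypothetical command runner: run(action_name, *args) -> output text.
    """
    index = find_index(page_state, wanted_text)
    if index is None:
        raise LookupError(f"no listed element contains {wanted_text!r}")
    # The returned text is the fresh state to read before the next interaction.
    return run("click_element", index)
```

Returning the new state from every step keeps the caller from reusing indices that belonged to the previous page.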
Return value
Every command returns the current page state, including the action result and the interactive elements in the viewport.
Structure
- Action Result
- Success or Failed status
- Target URL and Content-Type
- Page Content
| Field | Description |
|---|---|
| Previous page | Title and URL of the previous page |
| Action | Action name and parameters |
| Action Result | Execution result (e.g. navigation triggered) |
| Current page | Title and URL of the current page |
| Interactive elements | All interactive elements in the viewport, each with [index]<tag text/> |
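If you post-process results in Python, the fields in the table above map naturally onto a small record type. The sketch below is only one possible container: the commands return text, not this object, and the class name `PageState`, its attribute names, and the `page_changed` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PageState:
    """Hypothetical container mirroring the fields of a command result."""

    previous_page: str   # title and URL of the previous page
    action: str          # action name and parameters
    action_result: str   # execution result, e.g. "navigation triggered"
    current_page: str    # title and URL of the current page
    # Raw "[index]<tag text/>" entries for elements in the viewport.
    interactive_elements: List[str] = field(default_factory=list)

    def page_changed(self) -> bool:
        """True when the action moved the browser to a different page."""
        return self.previous_page != self.current_page
```

Filling such a record requires parsing the real output, whose exact layout is shown in the example output section.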
Example output
Navigating to Baidu:
CODEBLOCK8
CODEBLOCK9
Interactive element format
Each element: [index]<tag text/>
| Part | Description | Example |
|---|---|---|
| INLINECODE6 | Element index for click_element, input_text, etc. | INLINECODE9 |
| INLINECODE10 | HTML element type (a, button, textarea, div, img, span) | <textarea> |
| text | Display text (may be empty) | 百度一下 |
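Because every element follows the `[index]<tag text/>` shape, the list can be turned into structured records with a regular expression. The following is a minimal sketch, not code shipped with the skill: the `Element` class, the `parse_elements` function, and the sample string are all made up for illustration, and real output may contain variations this pattern does not cover.

```python
import re
from dataclasses import dataclass
from typing import List

# index in brackets, then a tag name, optional display text, and a closing "/>".
ELEMENT_RE = re.compile(r"\[(\d+)\]<(\w+)\s*([^<>]*?)\s*/>")


@dataclass
class Element:
    index: int  # value to pass to click_element, input_text, etc.
    tag: str    # HTML element type, e.g. "a", "button", "textarea"
    text: str   # display text; empty string when the element has none


def parse_elements(output: str) -> List[Element]:
    """Extract every "[index]<tag text/>" entry found in a command's output."""
    return [Element(int(i), tag, text) for i, tag, text in ELEMENT_RE.findall(output)]


if __name__ == "__main__":
    # Hypothetical sample, not captured from a real session.
    sample = "[0]<a News/>\n[1]<textarea />\n[2]<button 百度一下/>"
    for element in parse_elements(sample):
        print(element)
```

Keeping the index as an integer makes it easy to hand straight back to an interaction command.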
Example: Search on Baidu
CODEBLOCK10
Example: Scroll and download
CODEBLOCK11
Troubleshooting
- If an element is not found, use the returned interactive elements list to find the correct index.
- If the page is not fully loaded, add a wait.py command after navigation.
Changelog
- Initial release of the automation_browser skill for web automation using QQ Browser's X5 kernel.
- Supports navigation, element interaction (click, input, dropdown), page scrolling, and file/video download.
- Provides get_content to extract page data as Markdown and lists interactive elements for scripted actions.
- Includes setup and installation scripts for smooth environment initialization.
- Each script command returns detailed page state and interactive elements for further automation.