ESP32-CAM Eyes
Give your OpenClaw agent physical eyes using ESP32-S3-CAM modules.
Overview
Each ESP32-CAM module runs a lightweight HTTP server exposing /capture (single JPEG snapshot) and /stream (MJPEG live stream). Once connected to WiFi, the agent can grab images via curl for vision analysis.
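As a rough illustration of the agent side, the sketch below fetches one snapshot from `/capture` using only the Python standard library. The helper names (`capture_url`, `is_jpeg`, `fetch_snapshot`) are hypothetical and not part of any firmware or OpenClaw API; the only thing taken from this guide is the `/capture` path.

```python
import urllib.request

JPEG_SOI = b"\xff\xd8"  # every JPEG file starts with this Start-Of-Image marker


def capture_url(host: str) -> str:
    """Build the snapshot URL for a camera at the given IP or hostname."""
    return f"http://{host}/capture"


def is_jpeg(data: bytes) -> bool:
    """Cheap sanity check: does the payload begin with the JPEG SOI marker?"""
    return data.startswith(JPEG_SOI)


def fetch_snapshot(host: str, timeout: float = 5.0) -> bytes:
    """GET /capture and return the raw bytes, raising if the body is not a JPEG."""
    with urllib.request.urlopen(capture_url(host), timeout=timeout) as response:
        data = response.read()
    if not is_jpeg(data):
        raise ValueError("response from /capture does not look like a JPEG")
    return data
```

Calling `fetch_snapshot()` with the camera's address and writing the returned bytes to a file is roughly equivalent to the `curl` test in the Quick Start. The marker check is only a guard against saving an error page as an image.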
Prerequisites
- Hardware: ESP32-S3 development board with camera sensor (Hiwonder, Freenove, or similar)
- Software: macOS or Linux with Python 3 installed
- Tools: PlatformIO CLI (`pip3 install platformio`), pyserial (`pip3 install pyserial`)
Quick Start
1. Plug in the ESP32-CAM via USB
2. Identify the serial port: `ls /dev/cu.usb*` (macOS) or `ls /dev/ttyUSB*` (Linux)
3. Identify the sensor model (critical: it determines the firmware config)
4. Create a PlatformIO project and flash the firmware
5. Connect to WiFi, then test with `curl -o photo.jpg http://<device-ip>/capture`
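The serial-port step can also be scripted. The sketch below applies the same two glob patterns as the `ls` commands and refuses to guess when more than one device matches. The function names are illustrative assumptions, and pyserial is not needed for this part.

```python
import glob

# Glob patterns from the Quick Start step: macOS first, then Linux.
PORT_PATTERNS = ("/dev/cu.usb*", "/dev/ttyUSB*")


def find_serial_ports(patterns=PORT_PATTERNS):
    """Return a sorted list of device paths matching any of the patterns."""
    ports = []
    for pattern in patterns:
        ports.extend(glob.glob(pattern))
    return sorted(set(ports))


def pick_port(ports):
    """Return the only candidate, or raise so the user can choose explicitly."""
    if not ports:
        raise RuntimeError("no USB serial device found; check the cable and driver")
    if len(ports) > 1:
        raise RuntimeError(f"several candidates, pick one manually: {ports}")
    return ports[0]
```

Raising on ambiguity is a deliberate choice here: flashing the wrong device is worse than asking the user to name the port.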
For the complete step-by-step guide, including firmware code, pin definitions, performance benchmarks, and troubleshooting, read references/setup-guide.md.
Key Decision: Sensor Type
The sensor model determines your firmware strategy:
| Sensor | PID | Hardware JPEG | Recommended Format |
|---|---|---|---|
| OV2640 | 0x2640 | ✅ Yes | `PIXFORMAT_JPEG` directly |
| OV5640 | 0x5640 | ✅ Yes | `PIXFORMAT_JPEG` directly |
| GC2145 | 0x2145 | ❌ No | `PIXFORMAT_RGB565` + software `frame2jpg()` |
If buying new boards, prefer the OV2640: hardware JPEG encoding is significantly faster than converting RGB565 frames in software, which the GC2145 requires.
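The decision in the table above can be written as a small lookup, which may be handy in a script that generates or checks the firmware config. This is only a sketch: the PIDs and format names are transcribed from the table, while the `SENSORS` dictionary and both function names are hypothetical.

```python
# Values transcribed from the sensor table: PID -> (sensor name, hardware JPEG?)
SENSORS = {
    0x2640: ("OV2640", True),
    0x5640: ("OV5640", True),
    0x2145: ("GC2145", False),
}


def recommended_format(pid: int) -> str:
    """Return the pixel format name the table recommends for this PID."""
    if pid not in SENSORS:
        raise KeyError(f"unknown sensor PID 0x{pid:04x}; verify it before flashing")
    _name, has_hw_jpeg = SENSORS[pid]
    return "PIXFORMAT_JPEG" if has_hw_jpeg else "PIXFORMAT_RGB565"


def needs_software_jpeg(pid: int) -> bool:
    """True when frames must be converted with frame2jpg() in firmware."""
    return recommended_format(pid) == "PIXFORMAT_RGB565"
```

An unknown PID raises instead of falling back to a default, which mirrors the advice to verify the sensor before choosing a firmware config.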
API Endpoints
Once flashed and connected:
| Path | Function |
|---|---|
| `/capture` | Single JPEG snapshot |
| `/stream` | MJPEG live stream |
| `/` | Web UI with stream viewer |
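Consuming `/stream` takes a little more work than `/capture`, because an MJPEG stream is a sequence of JPEG images sent back to back. One simple approach, sketched below, is to scan the incoming bytes for the JPEG start and end markers instead of parsing the multipart headers. This is an assumption-laden shortcut: it can mis-split frames that embed a thumbnail, and `extract_frames` is a hypothetical helper, not something the firmware provides.

```python
JPEG_SOI = b"\xff\xd8"  # Start Of Image marker
JPEG_EOI = b"\xff\xd9"  # End Of Image marker


def extract_frames(buffer: bytes):
    """Split complete JPEG frames off the front of buffer.

    Returns (frames, remainder); remainder holds any trailing partial frame
    and should be prepended to the next chunk read from the stream.
    """
    frames = []
    while True:
        start = buffer.find(JPEG_SOI)
        if start == -1:
            # Keep a lone trailing 0xFF in case the SOI marker is split across chunks.
            return frames, (buffer[-1:] if buffer.endswith(b"\xff") else b"")
        end = buffer.find(JPEG_EOI, start + 2)
        if end == -1:
            return frames, buffer[start:]
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]
```

In use, a client would read fixed-size chunks from the open `/stream` response, call `extract_frames(remainder + chunk)` on each, and hand the returned frames to the vision step.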
Multi-Camera Deployment
Multiple ESP32-CAMs can join the same WiFi network for multi-angle coverage. Bind fixed IPs via router DHCP reservation to avoid IP changes on reboot.
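With several cameras on reserved IPs, the agent can request snapshots in parallel rather than one after another. The following is a minimal sketch, assuming each camera serves `/capture` as described in this guide; `capture_all`, `http_fetch`, and the label-to-host mapping are illustrative names only.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def http_fetch(host: str, timeout: float = 5.0) -> bytes:
    """Default fetcher: GET /capture from one camera."""
    with urllib.request.urlopen(f"http://{host}/capture", timeout=timeout) as resp:
        return resp.read()


def capture_all(cameras: dict, fetch=http_fetch) -> dict:
    """Capture from every camera in parallel.

    cameras maps a label to a host. The result maps each label to JPEG bytes,
    or to the exception raised for that camera, so one failure does not
    discard the other snapshots.
    """
    def safe_fetch(host):
        try:
            return fetch(host)
        except Exception as exc:  # report per-camera failures instead of aborting
            return exc

    with ThreadPoolExecutor(max_workers=max(1, len(cameras))) as pool:
        results = pool.map(safe_fetch, cameras.values())
        return dict(zip(cameras.keys(), results))
```

Passing the fetcher in as a parameter keeps the function easy to exercise without hardware; a caller would normally leave it at the default and pass something like a `{"front": ..., "side": ...}` mapping of its own reserved addresses.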
Common Pitfalls
- Wrong sensor ID: Always verify the PID before choosing a firmware config
- Upload speed: Use 460800 baud rather than 921600, which causes flash verification failures on many boards
- WiFi band: The ESP32 only supports 2.4 GHz, so make sure your router exposes a 2.4 GHz SSID
- QQVGA is slower than VGA: Counter-intuitive but true, due to PSRAM DMA buffer efficiency; use XGA (1024×768) for the best speed/quality balance
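The frame-size claim is easy to check on your own board by timing repeated captures at each resolution. The harness below is a sketch under stated assumptions: `time_captures` is a hypothetical helper, `fetch` is any zero-argument callable that returns one snapshot as bytes, and switching the firmware's frame size between runs is left to you because that mechanism is not described here.

```python
import statistics
import time


def time_captures(fetch, runs: int = 10):
    """Call fetch() `runs` times; return (median_seconds, median_bytes)."""
    durations, sizes = [], []
    for _ in range(runs):
        started = time.perf_counter()
        payload = fetch()
        durations.append(time.perf_counter() - started)
        sizes.append(len(payload))
    return statistics.median(durations), statistics.median(sizes)
```

Medians are used instead of means so that a single slow WiFi round trip does not dominate the comparison.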