OpenClaw VLN Planner

Use this skill when the user wants a robot to follow a natural-language navigation instruction from visual observations.

This skill is a high-level navigation planner. It does not produce motor, joint, torque, or trajectory control. It only produces one structured mid-level navigation action at a time.

When this skill triggers

Trigger this skill when the task includes one or more of the following:

- Vision-language navigation (VLN)
Robot next-step planning from camera images
Closed-loop navigation with replanning after each observation
Converting a current frame plus historical frames into a single next navigation action
Sending current + history images to an OpenAI-compatible multimodal gateway for action prediction

Required inputs

The planner expects:

- user_instruction: natural-language navigation instruction
INLINECODE1: exactly one current image
INLINECODE2: zero or more previous images in temporal order

Optional inputs:

- robot_state: heading, speed, pose estimate, execution feedback, etc.
INLINECODE4: blocked, collisionrisk, lost, targetreached, low_visibility, etc.
INLINECODE5: path to the runtime config file

Output contract

Output must be pure JSON only. Do not prepend or append prose.

Allowed action types only:

- INLINECODE6
INLINECODE7
INLINECODE8
INLINECODE9

Expected JSON shape:

CODEBLOCK0

Completion example:

CODEBLOCK1

Core rules

1. Plan only the next action.
Never output a full route.
Replan after each execution step.
If uncertain, unsafe, blocked, unable to parse, or visually ambiguous, output STOP.
Enforce action bounds:

- MOVE_FORWARD: 10-150 cm - TURN_LEFT: 5-90 deg - TURN_RIGHT: 5-90 deg - STOP: no value/unit required

6. If safety_flags.target_reached == true, output STOP with task_status = completed.
If blocked, collision_risk, lost, or severe uncertainty is present, prefer STOP.

Runtime configuration

Before running, load a YAML config file such as config/vln-config.yaml.

The config should define:

- subscribed or logical input topics / channels for current frame and history frame collection
optional robot state and safety flag sources
OpenAI-compatible multimodal gateway settings: base_url, api_key, INLINECODE25
planner behavior such as confidence threshold and safety fallback
executor bridge mode (default: Python function bridge)

Read references/navigation-schema.md for the expected config structure.

Internal module design

1) context builder

Build a model input payload from:

- user instruction
historical observations
current observation
optional robot state
optional safety flags

The prompt must explicitly separate:

- historical observations
current observation
user instruction

2) action planner

Call an OpenAI-compatible multimodal gateway with:

- one current image
historical images
planner prompt
optional structured context

The model should be asked to return pure JSON for exactly one next action.

3) action parser

Parse the model result as JSON.

If parsing fails:

- try safe extraction of the first JSON object
if still invalid, fall back to INLINECODE27

4) action validator

Validate:

- action type is one of the four allowed values
distance and angle ranges are legal
unit matches action type
confidence is numeric if present
task_status is one of in_progress, completed, INLINECODE30

Any invalid output falls back to STOP.

5) executor bridge

Forward the validated mid-level action to a separate execution layer.

Reserved Python bridge interface:

- INLINECODE32
INLINECODE33
INLINECODE34
INLINECODE35
INLINECODE36
INLINECODE37

Do not hardcode a robot SDK into the planner logic.

6) replanning loop

Use the planner in a closed loop:

1. gather current frame + history frames
gather optional robot state / safety flags
call multimodal planner
parse and validate JSON action
execute through bridge
observe again
repeat until task_status = completed or forced stop

7) safety fallback

Always stop on:

- parse failure
invalid action
confidence below threshold
blocked / collision risk / lost / target reached
missing visual evidence for safe motion

Prompt template

Use this prompt pattern:

CODEBLOCK2

Example user requests

- "Go down the hallway and stop at the blue door."
"Move to the kitchen entrance."
"Find the end of the corridor and stop."
"Turn right at the next intersection and continue."

Failure handling

If anything is wrong with the output, return:

CODEBLOCK3

Bundled resources

- references/navigation-schema.md: schema, bounds, safety fallback, examples, config contract
INLINECODE40: example OpenAI-compatible multimodal planner + Python executor bridge
INLINECODE41: Python dependencies
INLINECODE42: runtime config template

OpenClaw VLN 规划器

当用户希望机器人根据视觉观测遵循自然语言导航指令时，使用此技能。

此技能是一个高级导航规划器。它不产生电机、关节、扭矩或轨迹控制。它每次只产生一个结构化的中级导航动作。

触发条件

当任务包含以下一项或多项时触发此技能：

- 视觉语言导航（VLN）
基于摄像头图像的机器人下一步规划
每次观测后重新规划的闭环导航
将当前帧和历史帧转换为单个下一步导航动作
将当前图像和历史图像发送到兼容OpenAI的多模态网关进行动作预测

必需输入

规划器需要：

- userinstruction：自然语言导航指令
currentframe：恰好一张当前图像
history_frames：按时间顺序的零张或多张历史图像

可选输入：

- robotstate：朝向、速度、位姿估计、执行反馈等
safetyflags：阻塞、碰撞风险、丢失、目标到达、低能见度等
config_path：运行时配置文件路径

输出约定

输出必须仅为纯JSON。不要添加任何前言或后语。

仅允许的动作类型：

- MOVEFORWARD
TURNLEFT
TURN_RIGHT
STOP

预期的JSON格式：

json
{
next_action: {
type: MOVE_FORWARD,
value: 75,
unit: cm
},
taskstatus: inprogress,
confidence: 0.87,
notes: 沿着走廊继续前进
}

完成示例：

json
{
next_action: {
type: STOP
},
task_status: completed,
confidence: 0.93,
notes: 目标已到达
}

核心规则

1. 仅规划下一步动作。
绝不输出完整路线。
每一步执行后重新规划。
如果不确定、不安全、被阻塞、无法解析或视觉模糊，输出STOP。
强制执行动作范围：

- MOVE_FORWARD：10-150厘米 - TURN_LEFT：5-90度 - TURN_RIGHT：5-90度 - STOP：不需要值/单位

6. 如果safetyflags.targetreached == true，输出STOP且taskstatus = completed。
如果存在blocked、collisionrisk、lost或严重不确定性，优先输出STOP。

运行时配置

运行前，加载一个YAML配置文件，如config/vln-config.yaml。

配置应定义：

- 用于当前帧和历史帧采集的订阅或逻辑输入主题/通道
可选的机器人状态和安全标志来源
兼容OpenAI的多模态网关设置：baseurl、apikey、model_id
规划器行为，如置信度阈值和安全回退
执行器桥接模式（默认：Python函数桥接）

参考references/navigation-schema.md了解预期的配置结构。

内部模块设计

1) 上下文构建器

从以下内容构建模型输入负载：

- 用户指令
历史观测
当前观测
可选的机器人状态
可选的安全标志

提示必须明确区分：

- 历史观测
当前观测
用户指令

2) 动作规划器

调用兼容OpenAI的多模态网关，包含：

- 一张当前图像
历史图像
规划器提示
可选的结构化上下文

模型应被要求返回恰好一个下一步动作的纯JSON。

3) 动作解析器

将模型结果解析为JSON。

如果解析失败：

- 尝试安全提取第一个JSON对象
如果仍然无效，回退到STOP

4) 动作验证器

验证：

- 动作类型是四个允许值之一
距离和角度范围合法
单位与动作类型匹配
置信度（如果存在）为数值
taskstatus为inprogress、completed、failed之一

任何无效输出回退到STOP。

5) 执行器桥接

将验证后的中级动作转发到单独的执行层。

保留的Python桥接接口：

- executemoveforward(distancecm)
executeturnleft(angledeg)
executeturnright(angledeg)
executestop()
getrobotstate()
getsafetyflags()

不要将机器人SDK硬编码到规划器逻辑中。

6) 重新规划循环

在闭环中使用规划器：

1. 收集当前帧和历史帧
收集可选的机器人状态/安全标志
调用多模态规划器
解析和验证JSON动作
通过桥接执行
再次观测
重复直到task_status = completed或强制停止

7) 安全回退

在以下情况始终停止：

- 解析失败
无效动作
置信度低于阈值
阻塞/碰撞风险/丢失/目标到达
缺少安全移动的视觉证据

提示模板

使用此提示模式：

text
你是一个机器人导航规划器。
你将收到：

1. 历史观测
当前观测
用户指令
可选的机器人状态和安全标志

你的任务是决定机器人下一步的单个中级导航动作。
你只能输出以下动作之一：

- MOVEFORWARD，距离以厘米为单位
TURNLEFT，角度以度为单位
TURN_RIGHT，角度以度为单位
STOP

规则：

- 仅规划下一步，而不是整条路线。
如果目标已到达，输出STOP。
如果不确定、场景不清晰或存在任何安全风险，输出STOP。
MOVEFORWARD必须在10-150厘米之间。
TURNLEFT和TURN_RIGHT必须在5-90度之间。
仅输出纯JSON，不附带任何额外解释。

示例用户请求

- 沿着走廊走，在蓝色门前停下。
移动到厨房入口。
找到走廊尽头并停下。
在下一个路口右转并继续前进。

失败处理

如果输出有任何问题，返回：

json
{
next_action: {
type: STOP
},
task_status: failed,
confidence: 0.0,
notes: fallback_stop
}

捆绑资源

- references/navigation-schema.md：模式、范围、安全回退、示例、配置契约
scripts/vln_bridge.py：示例兼容OpenAI的多模态规划器 + Python执行器桥接
scripts/requirements.txt：Python依赖
config/vln-config.yaml：运行时配置模板

openclaw-vln-planner视觉语言导航规划