ROS 2 Engineering Skills
A progressive-disclosure skill for ROS 2 development — from first workspace to
production fleet deployment. Each section below gives you the essential decision
framework; detailed patterns, code templates, and anti-patterns live in the
references/ directory. Read the relevant reference file before writing code.
How to use this skill
1. Identify what the user is building (see Decision Router below).
2. Read the matching `references/*.md` file for detailed guidance.
3. Apply the Core Engineering Principles in every piece of code you generate.
4. When multiple domains intersect (e.g. Nav2 + ros2_control), read both files.
Decision router
| User is doing... | Read |
|---|---|
| Creating a workspace, package, or build config | `references/workspace-build.md` |
| Writing nodes, executors, callback groups | `references/nodes-executors.md` |
| Topics, services, actions, custom interfaces, QoS | `references/communication.md` |
| Lifecycle nodes, component loading, composition | `references/lifecycle-components.md` |
| Launch files, conditional logic, event handlers | `references/launch-system.md` |
| tf2, URDF, xacro, robot_state_publisher | `references/tf2-urdf.md` |
| ros2_control, hardware interfaces, controllers | `references/hardware-interface.md` |
| Real-time constraints, PREEMPT_RT, memory, jitter | `references/realtime.md` |
| Nav2, SLAM, costmaps, behavior trees | `references/navigation.md` |
| MoveIt 2, planning scene, grasp pipelines | `references/manipulation.md` |
| Camera, LiDAR, PCL, cv_bridge, depth processing | `references/perception.md` |
| Unit tests, integration tests, launch_testing, CI | `references/testing.md` |
| ros2 doctor, tracing, profiling, rosbag2 | `references/debugging.md` |
| Docker, cross-compile, fleet deployment, OTA | `references/deployment.md` |
| Gazebo, Isaac Sim, sim-to-real, use_sim_time | `references/simulation.md` |
| SROS2, DDS security, certificates, supply chain | `references/security.md` |
| micro-ROS, MCU/RTOS, XRCE-DDS, rclc | `references/micro-ros.md` |
| Multi-robot fleet, Open-RMF, DDS discovery scale | `references/multi-robot.md` |
| Message types, units, covariance, frame conventions | `references/message-types.md` |
| ROS 1 migration, ros1_bridge, hybrid operation | `references/migration-ros1.md` |
When a task spans multiple domains, read all relevant files and reconcile
conflicting recommendations by favoring safety, then determinism, then simplicity.
Cross-cutting concern — Security: Security is not isolated to references/security.md.
Every domain should consider its security implications: hardware interfaces need safe
shutdown on auth failure, DDS topics may need encryption, deployment images need supply
chain verification, and fleet communication must use TLS. When reviewing code in any
domain, check whether the data path crosses a trust boundary.
Core engineering principles
These apply to every ROS 2 artifact you produce, regardless of domain.
1. Distro awareness
Always ask which ROS 2 distribution the user targets. Key differences:
| Feature | Foxy (EOL) | Humble (LTS) | Jazzy (LTS) | Kilted (non-LTS) | Rolling |
|---|---|---|---|---|---|
| EOL | Jun 2023 (ended) | May 2027 | May 2029 | Dec 2026 | Rolling |
| Ubuntu | 20.04 | 22.04 | 24.04 | 24.04 | Latest |
| Default DDS | Fast DDS | Fast DDS | Fast DDS | Fast DDS | Fast DDS |
| Zenoh support | — | — | — | Tier 1 | Tier 1 |
| Type description support | No | No | Yes | Yes | Yes |
| Service introspection | No | No | Yes | Yes | Yes |
| EventsExecutor | No | No | Experimental | Stable (+ rclpy) | Stable (+ rclpy) |
| Default bag format | sqlite3 | sqlite3 | MCAP | MCAP | MCAP |
| ros2_control interface | N/A (separate) | 2.x | 4.x | 4.x | Latest |
| CMake recommendation | `ament_target_dependencies` | `ament_target_dependencies` | either | `target_link_libraries` | `target_link_libraries` |
When the user does not specify, default to the latest LTS (Jazzy).
Pin the exact distro in Dockerfile, CI, and documentation so builds are reproducible.
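
As a minimal sketch, part of the table above can be encoded as a lookup so that scripts pick distro-appropriate defaults. The names `DISTRO_DEFAULTS` and `defaults_for` are hypothetical and not part of any ROS 2 tool; only the bag-format, service-introspection, and CMake rows are encoded, and the fallback mirrors the "default to Jazzy" rule.

```python
# Hypothetical helper: encodes three rows of the distro table above.
DISTRO_DEFAULTS = {
    "foxy": {"bag_format": "sqlite3", "service_introspection": False,
             "cmake": "ament_target_dependencies"},
    "humble": {"bag_format": "sqlite3", "service_introspection": False,
               "cmake": "ament_target_dependencies"},
    "jazzy": {"bag_format": "mcap", "service_introspection": True,
              "cmake": "either"},
    "kilted": {"bag_format": "mcap", "service_introspection": True,
               "cmake": "target_link_libraries"},
}

# Latest LTS, used when the user does not name a distro.
FALLBACK_DISTRO = "jazzy"


def defaults_for(distro=None):
    """Return the table row for `distro`, falling back to the latest LTS."""
    key = (distro or FALLBACK_DISTRO).strip().lower()
    if key not in DISTRO_DEFAULTS:
        raise ValueError(f"unknown or unsupported distro: {distro!r}")
    return {"distro": key, **DISTRO_DEFAULTS[key]}


if __name__ == "__main__":
    print(defaults_for())          # no distro given: Jazzy row
    print(defaults_for("Humble"))  # case-insensitive lookup
```

Raising on an unknown name, rather than silently falling back, keeps a typo in a Dockerfile or CI variable from selecting the wrong defaults.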
2. C++ vs Python decision
Choose the language based on the node's role, not personal preference.
Use rclcpp (C++) when:
- The node sits in a control loop running ≥100 Hz
- Deterministic memory allocation matters (real-time path)
- The node is a hardware driver or controller plugin
- Intra-process zero-copy communication is required
Use rclpy (Python) when:
- The node is orchestration, monitoring, or parameter management
- Rapid prototyping with frequent iteration
- Heavy use of ML frameworks (PyTorch, TensorFlow) that are Python-native
- The node does not sit in a latency-critical path
Mixed stacks are normal. A typical robot has C++ drivers/controllers and Python
orchestration/monitoring. Note: component_container (composition) only loads
C++ components, which register through class_loader. Python nodes run as separate
processes, but can share a launch file and communicate over DDS on the same host,
which is low-overhead but not zero-copy.
Intra-process communication works for any nodes sharing a process — not only
composable components. Any nodes instantiated in the same process with
use_intra_process_comms(true) can use zero-copy transfer.
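
The criteria above can be written down as a small decision helper. This is only a sketch of the checklist, not a rule engine: `NodeRole` and `choose_client_library` are hypothetical names, and the 100 Hz threshold is the one stated in the list.

```python
from dataclasses import dataclass


@dataclass
class NodeRole:
    """Hypothetical description of what a node does."""
    loop_rate_hz: float = 0.0
    realtime_path: bool = False
    hardware_driver_or_controller: bool = False
    needs_intra_process_zero_copy: bool = False


def choose_client_library(role):
    """Return (library, reasons) following the checklist above."""
    reasons = []
    if role.loop_rate_hz >= 100.0:
        reasons.append("control loop at >= 100 Hz")
    if role.realtime_path:
        reasons.append("deterministic memory allocation required")
    if role.hardware_driver_or_controller:
        reasons.append("hardware driver or controller plugin")
    if role.needs_intra_process_zero_copy:
        reasons.append("intra-process zero-copy required")
    if reasons:
        return "rclcpp", reasons
    return "rclpy", ["not on a latency-critical path"]


if __name__ == "__main__":
    print(choose_client_library(NodeRole(loop_rate_hz=500.0)))
    print(choose_client_library(NodeRole()))
```

Any single C++ criterion is enough to select rclcpp; rclpy is the result only when none apply.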
3. Package structure conventions
Every package should follow this layout. Consistency across a workspace reduces
onboarding time and makes CI scripts portable.
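    my_package/
    ├── CMakeLists.txt          # or setup.py for pure-Python packages
    ├── package.xml             # format 3
    ├── config/
    │   └── params.yaml         # default parameters
    ├── launch/
    │   └── bringup.launch.py   # Python launch file
    ├── include/my_package/     # C++ public headers (if a library)
    ├── src/                    # C++ sources
    ├── my_package/             # Python module (if ament_python or mixed)
    ├── test/                   # gtest, pytest, launch_testing
    ├── urdf/                   # URDF/xacro (if applicable)
    ├── msg/ srv/ action/       # custom interfaces (prefer a dedicated *_interfaces package)
    └── README.md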
Separate interface definitions into a *_interfaces package so downstream
packages can depend on interfaces without pulling in implementation.
4. Parameter discipline
- Declare every parameter with a type, description, range, and default in the node constructor. Never use undeclared parameters.
- Use `ParameterDescriptor` with `FloatingPointRange` or `IntegerRange` for numeric bounds. The node rejects out-of-range values at set time.
- Group related parameters under a namespace prefix: `controller.kp`, `controller.ki`, `controller.kd`.
- Load defaults from a `config/params.yaml`; allow launch-time overrides.
- For dynamic reconfiguration, register a callback with `add_on_set_parameters_callback` and validate new values atomically before accepting.
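
The "validate atomically" rule can be illustrated without ROS installed. The sketch below is a plain-Python model, not the rclpy API: `FloatRange` and `validate_update` are hypothetical names. In a real node the same all-or-nothing check would sit inside the parameter callback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatRange:
    """Hypothetical stand-in for a declared numeric range."""
    lower: float
    upper: float


def validate_update(ranges, current, proposed):
    """Check every proposed value first; apply all of them or none.

    Returns (accepted, values, reason). On rejection `values` is the
    unchanged `current` mapping, so a partially valid batch never leaks in.
    """
    for name, value in proposed.items():
        if name not in ranges:
            return False, current, f"undeclared parameter: {name}"
        bounds = ranges[name]
        if not (bounds.lower <= value <= bounds.upper):
            return False, current, (
                f"{name}={value} outside [{bounds.lower}, {bounds.upper}]"
            )
    return True, {**current, **proposed}, ""


if __name__ == "__main__":
    ranges = {"controller.kp": FloatRange(0.0, 100.0),
              "controller.ki": FloatRange(0.0, 10.0)}
    current = {"controller.kp": 1.0, "controller.ki": 0.1}
    print(validate_update(ranges, current, {"controller.kp": 5.0}))
    print(validate_update(ranges, current,
                          {"controller.kp": 5.0, "controller.ki": 50.0}))
```

The second call is rejected as a whole: `controller.kp` keeps its old value even though its new value alone was in range.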
5. Error handling philosophy
- Nodes must not silently swallow errors. Log at the appropriate severity, then take a safe action (stop motion, request help, transition to error state).
- Prefer lifecycle node error transitions over ad-hoc boolean flags.
- When calling a service, always handle the "service not available" and "future timed out" cases explicitly.
- For hardware drivers, distinguish transient errors (retry with backoff) from fatal errors (transition to `FINALIZED` and alert the operator).
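
The transient-versus-fatal split for hardware drivers might look like the sketch below. `TransientError`, `FatalError`, and `call_with_backoff` are hypothetical names, and the exponential schedule is one common backoff choice rather than something this document prescribes. Because the helper sleeps, it is meant for a dedicated driver thread, not an executor callback.

```python
import time


class TransientError(Exception):
    """Recoverable fault, e.g. a bus timeout. Hypothetical class."""


class FatalError(Exception):
    """Unrecoverable fault; the caller should shut down and alert."""


def call_with_backoff(operation, max_attempts=4, base_delay_s=0.05,
                      sleep=time.sleep):
    """Retry `operation` on TransientError with exponential backoff.

    A FatalError raised by `operation` propagates immediately. When the
    retries are exhausted, the last TransientError is escalated to a
    FatalError so the caller has a single failure path to handle.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError as exc:
            if attempt == max_attempts - 1:
                raise FatalError(
                    f"gave up after {max_attempts} attempts"
                ) from exc
            sleep(base_delay_s * (2 ** attempt))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_read():
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientError("bus busy")
        return "ok"

    print(call_with_backoff(flaky_read))
```

The `sleep` parameter is injectable so tests can record the delays instead of waiting for them.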
6. Quality of Service defaults
Start from these profiles and adjust per use case:
| Use case | Reliability | Durability | History | Depth | Deadline | Lifespan |
|---|---|---|---|---|---|---|
| Sensor stream | BEST_EFFORT | VOLATILE | KEEP_LAST | 5 | — | — |
| Command velocity | RELIABLE | VOLATILE | KEEP_LAST | 1 | 100 ms | 200 ms |
| Map (latched) | RELIABLE | TRANSIENT_LOCAL | KEEP_LAST | 1 | — | — |
| Diagnostics | RELIABLE | VOLATILE | KEEP_LAST | 10 | — | — |
| Parameter events | RELIABLE | VOLATILE | KEEP_LAST | 1000 | — | — |
| Action feedback | RELIABLE | VOLATILE | KEEP_LAST | 1 | — | — |
| Safety heartbeat | RELIABLE | VOLATILE | KEEP_LAST | 1 | 500 ms | 1 s |
QoS mismatches are the #1 cause of "I published but nobody receives."
Always check compatibility with ros2 topic info -v when debugging.
DEADLINE and LIFESPAN are critical for safety-critical systems. DEADLINE fires an
event when no message arrives within the specified period (detect stale data). LIFESPAN
discards messages older than the specified duration before delivery (prevent acting on
stale data). See references/communication.md section 9 for full API and examples.
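
To make the compatibility rule concrete, here is a plain-Python sketch of the request-versus-offered check for three policies (reliability, durability, deadline). It models the matching rules only and does not call rclpy; `Qos` and `incompatibilities` are hypothetical names. A subscription may not ask for more than the publisher offers.

```python
from dataclasses import dataclass
from typing import Optional

# Higher rank means a stronger guarantee.
RELIABILITY_RANK = {"BEST_EFFORT": 0, "RELIABLE": 1}
DURABILITY_RANK = {"VOLATILE": 0, "TRANSIENT_LOCAL": 1}


@dataclass(frozen=True)
class Qos:
    """Hypothetical subset of a QoS profile."""
    reliability: str
    durability: str
    deadline_ms: Optional[int] = None  # None means no deadline is set


def incompatibilities(pub, sub):
    """List the policies on which `sub` requests more than `pub` offers."""
    problems = []
    if RELIABILITY_RANK[pub.reliability] < RELIABILITY_RANK[sub.reliability]:
        problems.append("reliability")
    if DURABILITY_RANK[pub.durability] < DURABILITY_RANK[sub.durability]:
        problems.append("durability")
    # The publisher must promise a period at least as tight as requested.
    if sub.deadline_ms is not None and (
        pub.deadline_ms is None or pub.deadline_ms > sub.deadline_ms
    ):
        problems.append("deadline")
    return problems


if __name__ == "__main__":
    sensor_pub = Qos("BEST_EFFORT", "VOLATILE")
    strict_sub = Qos("RELIABLE", "VOLATILE")
    print(incompatibilities(sensor_pub, strict_sub))
```

An empty list means the pair can connect on these three policies. A best-effort sensor publisher paired with a reliable subscriber reports `reliability`, which is the classic "published but nobody receives" case.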
7. Naming conventions
| Entity | Convention | Example |
|---|---|---|
| Package | INLINECODE37 | INLINECODE38 |
| Node | `snake_case` | `joint_state_broadcaster` |
| Topic | `/snake_case` with ns | `/arm/joint_states` |
| Service | `/snake_case` | `/arm/set_mode` |
| Action | `/snake_case` | `/arm/follow_joint_trajectory` |
| Parameter | `snake_case` with dot ns | `controller.publish_rate` |
| Frame | `snake_case` | `base_link`, `camera_optical` |
| Interface | `PascalCase.msg/srv/action` | `JointState.msg` |
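
A lint-style check for these conventions could be sketched as follows. The regular expressions are an assumption about what "snake_case" and "PascalCase" mean here (lowercase tokens joined by single underscores; a leading capital followed by alphanumerics), and the function names are hypothetical.

```python
import re

# Assumed definitions of the two casing styles used in the table above.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
PASCAL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")
INTERFACE_EXTENSIONS = {"msg", "srv", "action"}


def is_snake_case(name):
    """True for node, frame, and package-style names."""
    return bool(SNAKE_CASE.match(name))


def is_valid_topic(name):
    """True when every slash-separated token is snake_case."""
    body = name[1:] if name.startswith("/") else name
    return all(is_snake_case(token) for token in body.split("/"))


def is_valid_parameter(name):
    """True when every dot-separated token is snake_case."""
    return all(is_snake_case(token) for token in name.split("."))


def is_valid_interface_file(filename):
    """True for names such as JointState.msg."""
    stem, _, extension = filename.rpartition(".")
    return extension in INTERFACE_EXTENSIONS and bool(PASCAL_CASE.match(stem))


if __name__ == "__main__":
    print(is_valid_topic("/arm/joint_states"))
    print(is_valid_parameter("controller.publish_rate"))
    print(is_valid_interface_file("JointState.msg"))
```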
8. Thread safety and callbacks
- A `MutuallyExclusiveCallbackGroup` serializes its callbacks: safe for shared state without locks, but limits throughput.
- A `ReentrantCallbackGroup` allows parallel execution: you must protect shared state with `std::mutex` (C++) or `threading.Lock` (Python).
- Calling a service from a callback: The service client must be in a separate `MutuallyExclusiveCallbackGroup` from the calling callback. Otherwise the executor deadlocks, because the callback waits for the response while the executor cannot deliver it. Always use `async_send_request` with a response callback; never use `spin_until_future_complete` inside an executor callback.
- Never do blocking work (file I/O, long computation, `sleep`) inside a timer or subscription callback on the default executor. Offload to a dedicated thread or use a `MultiThreadedExecutor` with a reentrant group.
- In rclcpp, prefer `std::shared_ptr<const MessageT>` in subscription callbacks to avoid unnecessary copies and enable zero-copy intra-process.
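
The locking requirement for reentrant groups can be demonstrated with plain threads standing in for executor worker threads. This is a sketch under that assumption; `JointCache` and `simulate_parallel_callbacks` are hypothetical names and no rclpy executor is involved.

```python
import threading


class JointCache:
    """Shared state touched by callbacks that may run in parallel."""

    def __init__(self):
        self._lock = threading.Lock()
        self._updates = 0
        self._latest = None

    def on_joint_state(self, position):
        # Both fields change together, so they are guarded by one lock.
        with self._lock:
            self._latest = position
            self._updates += 1

    def snapshot(self):
        with self._lock:
            return self._updates, self._latest


def simulate_parallel_callbacks(cache, workers=4, calls_per_worker=1000):
    """Invoke the callback from several threads and return the final state."""

    def worker(worker_id):
        for i in range(calls_per_worker):
            cache.on_joint_state(float(worker_id * calls_per_worker + i))

    threads = [threading.Thread(target=worker, args=(n,))
               for n in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return cache.snapshot()


if __name__ == "__main__":
    updates, latest = simulate_parallel_callbacks(JointCache())
    print(updates, latest)
```

With the lock in place, the update count always equals `workers * calls_per_worker`, and a reader never sees a count that disagrees with the stored value.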
9. Lifecycle-first design
Default to lifecycle (managed) nodes for anything that owns resources:
hardware drivers, sensor pipelines, planners, controllers.
CODEBLOCK1
This gives the system manager (launch file, orchestrator, or operator) explicit
control over when resources are allocated, when the node starts processing,
and how it shuts down. It also makes error recovery predictable.
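
The primary states and transitions of a managed node can be modelled as a small table-driven state machine. This sketch deliberately omits the intermediate transition states and error processing, and it is not the rclpy lifecycle API: `LifecycleModel` and `TRANSITIONS` are hypothetical names.

```python
# (current state, transition) -> next primary state
TRANSITIONS = {
    ("unconfigured", "configure"): "inactive",
    ("inactive", "activate"): "active",
    ("active", "deactivate"): "inactive",
    ("inactive", "cleanup"): "unconfigured",
    ("unconfigured", "shutdown"): "finalized",
    ("inactive", "shutdown"): "finalized",
    ("active", "shutdown"): "finalized",
}


class LifecycleModel:
    """Simplified model of a managed node's primary states."""

    def __init__(self):
        self.state = "unconfigured"

    def trigger(self, transition):
        """Apply a transition, rejecting any that is invalid in this state."""
        key = (self.state, transition)
        if key not in TRANSITIONS:
            raise ValueError(
                f"cannot '{transition}' while '{self.state}'"
            )
        self.state = TRANSITIONS[key]
        return self.state


if __name__ == "__main__":
    node = LifecycleModel()
    for step in ("configure", "activate", "deactivate", "cleanup", "shutdown"):
        print(step, "->", node.trigger(step))
```

Because activation is only reachable from the inactive state, resources are always allocated in the configure step before the node starts processing, which is the ordering guarantee described above.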
10. Build and CI hygiene
- Use `colcon build --cmake-args -DCMAKE_BUILD_TYPE=RelWithDebInfo` for development; `Release` for deployment.
- Enable `-Wall -Wextra -Wpedantic` and treat warnings as errors in CI.
- Run `colcon test` with `--event-handlers console_cohesion+` so test output groups by package.
- Pin rosdep keys in `rosdep.yaml` for reproducible dependency resolution.
- Cache `/opt/ros/`, `.ccache/`, and `build/`/`install/` in CI to cut build times by 60–80%.
Common anti-patterns
| Anti-pattern | Why it hurts | Fix |
|---|---|---|
| Global variables for node state | Breaks composition, untestable | Store state as class members |
| INLINECODE74 in main() for multi-node processes | Starves other nodes | Use `MultiThreadedExecutor` or component composition |
| Hardcoded topic names | Breaks reuse across robots | Use relative names + namespace remapping |
| `KEEP_ALL` history with no bound | Memory grows unbounded on slow subscribers | Use `KEEP_LAST` with explicit depth |
| Using `time.sleep()` / `std::this_thread::sleep_for` | Blocks the executor thread | Use `create_wall_timer` or a dedicated thread |
| Monolithic launch file for everything | Unmanageable past 10 nodes | Compose launch files with `IncludeLaunchDescription` |
| Skipping `package.xml` dependencies | Builds locally, breaks CI and Docker | Declare every dependency explicitly |
| Publishing in constructor | Subscribers may not be ready, messages lost | Publish in `on_activate` or after a short timer |
| Ignoring QoS compatibility | Silent communication failure | Match publisher/subscriber QoS or check with `ros2 topic info -v` |
| Creating timers/subs in callbacks | Resource leak, unpredictable behavior | Create all entities in constructor or `on_configure` |
| Synchronous service call in callback | Deadlocks the executor thread | Use `async_send_request` with a callback or dedicated thread |
| Service client in same callback group as caller | Deadlocks even with async in `MultiThreadedExecutor` | Put service client in a separate `MutuallyExclusiveCallbackGroup` |
| No safe command on shutdown | Motors hold last velocity after node exits | Send zero-velocity in `on_deactivate` AND destructor (see `references/hardware-interface.md`) |
| Dynamic subscriptions with `StaticSingleThreadedExecutor` | New subs are never picked up after `spin()` | Use `SingleThreadedExecutor` or `MultiThreadedExecutor` for dynamic entities |
| CPU frequency governor left on `powersave`/`ondemand` | 10-100 ms latency spikes in RT path | Set `performance` governor, disable turbo boost (see `references/realtime.md`) |
Distro-specific migration notes
When upgrading between distributions, check these breaking changes first:
Foxy → Humble:
- Complete API overhaul. Foxy packages require significant rework.
- INLINECODE100 was not bundled in Foxy — must be built separately.
- Lifecycle node API stabilized in Humble.
- Action server/client API changed significantly.
Humble → Jazzy:
- `ros2_control` API changed from 2.x to 4.x: `export_state_interfaces()` and `export_command_interfaces()` are now auto-generated by the framework. Manual overrides use `on_export_state_interfaces()`. See `references/hardware-interface.md`.
- Handle `get_value()` is deprecated; use `get_optional<T>()` on `LoanedStateInterface` / `LoanedCommandInterface` (controller side). Hardware interfaces use the `set_state()` / `get_state()` / `set_command()` / `get_command()` helpers with fully qualified names.
- All joints in the `<ros2_control>` tag must exist in the URDF.
- Controller parameter loading changed: use `--param-file` with the spawner.
- Default bag format changed from sqlite3 to MCAP. Use `storage_id='mcap'`.
- Default middleware changed internal config paths. Regenerate DDS profiles.
- INLINECODE117 schema changes: `recoveries_server` renamed to `behavior_server`.
- INLINECODE120 replaces `ROS_LOCALHOST_ONLY` (values: `LOCALHOST`, `SUBNET`, `OFF`, `SYSTEM_DEFAULT`).
- `launch_ros` actions have new parameter handling; test launch files explicitly.
Jazzy → Kilted (non-LTS):
- Zenoh promoted to Tier 1 middleware: `rmw_zenoh` is production-ready. Install: `sudo apt install ros-kilted-rmw-zenoh-cpp`, set `RMW_IMPLEMENTATION=rmw_zenoh_cpp`. Supports router/peer/client modes.
- EventsExecutor graduated from experimental: available in INLINECODE130 (no `experimental` namespace). Also ported to rclpy.
- `ament_target_dependencies()` deprecated: use `target_link_libraries()` with modern CMake targets (e.g. `rclcpp::rclcpp`, `std_msgs::std_msgs__rosidl_typesupport_cpp`).
- Multi-bag replay support in `ros2 bag play`.
- Gazebo Ionic is the paired simulator (Harmonic was Jazzy; Ionic is the Kilted pairing).
ROS 1 → ROS 2:
- See `references/migration-ros1.md` for a step-by-step strategy.
Quick reference — ros2 CLI
CODEBLOCK2