How the evaluation environment, MPC controller, and navigation policy server interact โ covering threading, data flow, and the sensor pipeline.
The framework is built around three core design principles, inherited from the original NavDP benchmark and extended in this fork:
Navigation models run as independent HTTP servers. The simulator calls them via REST API โ swapping algorithms requires zero changes to the simulator code.
A dedicated planning thread posts observations to the policy server continuously. The MPC controller tracks the latest trajectory asynchronously, enabling real-time-like evaluation metrics.
Isaac Lab's vectorized environment supports N parallel simulation instances via --num_envs. Each environment has its own VisualizationManager and MPC instance.
The main evaluation scripts (eval_*_wheeled.py) orchestrate all components. Here is how data flows through the system:
+----------------------------------------------------------------+
| Isaac Sim Process |
| |
| +----------------------+ +------------------------------+ |
| | Isaac Lab Env | | Main Loop | |
| | (ManagerBasedRLEnv) | | - step simulation | |
| | |<---| - read sensors | |
| | Robot (Dingo/LeKiwi) | | - apply MPC actions | |
| | Scene USD | | - write visualization | |
| | Cameras (RGB+Depth) | | - log metrics | |
| +----------------------+ +------------------------------+ |
| | shared memory |
| +--------------v---------------+ |
| | planning_thread() | |
| | - reads PlanningInput | |
| | - POSTs to HTTP policy API | |
| | - writes PlanningOutput | |
| +------------------------------+ |
+----------------------------------------------------------------+
| HTTP POST (JSON)
+-----------v------------------+
| Navigation Policy Server |
| (separate process) |
| |
| NavDP / LoGoPlanner / |
| NoMAD / ViNT / ViPlanner |
| |
| Returns: trajectory array |
+------------------------------+
Each evaluation script uses two threads to decouple simulation speed from network latency:
env.step() loop at simulation rate (~20 Hz)PlanningInput under input_lockPlanningOutput under output_lockMPC_Controller to compute wheel velocity commandsenv.action_manager.apply_action()VisualizationManager.render() and writes frames to videothreading.ThreadPlanningInput for new observationshttp://localhost:{port}/planPlanningOutputis_planning = False when idle so the main thread can detect stalls# utils_tasks/basic_utils.py
class PlanningInput:
current_goal: np.ndarray | None
current_image: np.ndarray | None # RGB (H, W, 3) uint8
current_depth: np.ndarray | None # Depth (H, W) float32 in meters
camera_pos: np.ndarray | None # [x, y, z] world position
camera_rot: np.ndarray | None # Quaternion [w, x, y, z]
class PlanningOutput:
trajectory: np.ndarray | None # (N, 2) array of waypoints
all_trajectories: list | None # All candidate trajectories
trajectory_values: list | None # Scalar value per trajectory
is_planning: bool # True while HTTP call in flight
The MPC_Controller class in utils_tasks/tracking_utils.py is responsible for converting
trajectory waypoints (output by the policy) into wheel velocity commands. Key aspects:
requirements.txt)(left_wheel_velocity, right_wheel_velocity) in rad/s for the DifferentialController--stop_threshold (default: -3.0) controls when the robot considers an episode completedEach robot configuration defines three sensor configurations. During evaluation, the simulator reads:
| Sensor | Type | Resolution | Usage |
|---|---|---|---|
| Front Camera (RGB) | PinholeCamera | 640ร360 | Policy observation input; visualization left panel |
| Front Camera (Depth) | PinholeCamera | 640ร360 | PointCloud construction for BEV occupancy grid; policy depth input |
| Goal Camera (RGB) | PinholeCamera | 640ร360 | ImageGoal task: capture reference image at goal pose |
| Metric Camera | PinholeCamera | 160ร90 | Low-res overhead view for BEV tracking |
| Contact Sensor | ContactSensorCfg | N/A | Collision detection; episode termination on contact |
The depth image is back-projected using the camera's intrinsic matrix to form a 3D point cloud via open3d.
This point cloud is then transformed into the world frame and projected onto a 2D occupancy grid for the BEV visualization.
The occupancy grid has configurable resolution (default: 0.05 m/cell) and range.
All navigation servers must implement the same REST API so they can be swapped without changing the evaluation scripts:
POST /resetCalled at the start of each episode. Sends camera intrinsics so the server can initialize depth-dependent models.
Request body:
{
"intrinsic_matrix": [[fx, 0, cx], [0, fy, cy], [0, 0, 1]],
"image_width": 640,
"image_height": 360
}
Response: { "status": "ok" }
POST /plan (PointGoal / ImageGoal)Main inference endpoint. Called every planning cycle.
Request body:
{
"goal": [gx, gy, gz], // World-frame goal position
"image": "<base64-encoded-JPEG>",
"depth": "<base64-encoded-float32-bytes>",
"camera_pos": [x, y, z],
"camera_rot": [w, x, y, z]
}
Response:
{
"trajectory": [[x1,y1], [x2,y2], ...], // Waypoints in world frame
"all_trajectories": [...], // Optional: all candidate paths
"trajectory_values": [v1, v2, ...] // Optional: per-trajectory scores
}
Metrics are computed per episode and written to CSV by utils_tasks/basic_utils.py::write_metrics():
| Metric | Description |
|---|---|
| Success Rate (SR) | Fraction of episodes where robot reached within threshold of goal |
| Path Length (PL) | Total distance traveled by the robot base in meters |
| Episode Duration | Wall-clock time in seconds from start to termination |
| Collision Flag | 1 if episode terminated due to contact sensor trigger |
| SPL | Success weighted by Path Length (standard navigation metric) |