System Architecture — NavDP Research Sandbox

Design Philosophy

The framework is built around three core design principles, inherited from the original NavDP benchmark and extended in this fork:

⟨⟩

Decoupled Policy Server

Navigation models run as independent HTTP servers. The simulator calls them via REST API — swapping algorithms requires zero changes to the simulator code.

⟳

Async Planning Loop

A dedicated planning thread posts observations to the policy server continuously. The MPC controller tracks the latest trajectory asynchronously, enabling real-time-like evaluation metrics.

⊞

Multi-Env Parallelism

Isaac Lab's vectorized environment supports N parallel simulation instances via --num_envs. Each environment has its own VisualizationManager and MPC instance.

Component Map

The main evaluation scripts (eval_*_wheeled.py) orchestrate all components. Here is how data flows through the system:

+----------------------------------------------------------------+
|                       Isaac Sim Process                        |
|                                                                |
|  +----------------------+    +------------------------------+  |
|  |    Isaac Lab Env     |    |          Main Loop           |  |
|  | (ManagerBasedRLEnv)  |    |  - step simulation           |  |
|  |                      |<---|  - read sensors              |  |
|  | Robot (Dingo/LeKiwi) |    |  - apply MPC actions         |  |
|  | Scene USD            |    |  - write visualization       |  |
|  | Cameras (RGB+Depth)  |    |  - log metrics               |  |
|  +----------------------+    +------------------------------+  |
|                                             | shared memory    |
|                              +--------------v---------------+  |
|                              |   planning_thread()          |  |
|                              |  - reads PlanningInput       |  |
|                              |  - POSTs to HTTP policy API  |  |
|                              |  - writes PlanningOutput     |  |
|                              +------------------------------+  |
+----------------------------------------------------------------+
                                          | HTTP POST (JSON)
                              +-----------v------------------+
                              |   Navigation Policy Server   |
                              |      (separate process)      |
                              |                              |
                              |    NavDP / LoGoPlanner /     |
                              |   NoMAD / ViNT / ViPlanner   |
                              |                              |
                              |  Returns: trajectory array   |
                              +------------------------------+

Threading Model

Each evaluation script uses two threads to decouple simulation speed from network latency:

Main Thread (Simulation)

Drives the IsaacLab env.step() loop at simulation rate (~20 Hz)
Reads RGB image, depth map, camera intrinsics, robot pose from the environment
Writes observations to PlanningInput under input_lock
Reads the latest planned trajectory from PlanningOutput under output_lock
Passes the trajectory to MPC_Controller to compute wheel velocity commands
Applies wheel actions via env.action_manager.apply_action()
Calls VisualizationManager.render() and writes frames to video

Planning Thread (HTTP Client)

Runs in background via threading.Thread
Continuously polls PlanningInput for new observations
Serializes observations (RGB as base64 JPEG, depth as float32 bytes) into JSON
POSTs to the policy server: http://localhost:{port}/plan
Receives trajectory predictions and writes to PlanningOutput
Sets is_planning = False when idle so the main thread can detect stalls

Shared State Objects

# utils_tasks/basic_utils.py
class PlanningInput:
    current_goal: np.ndarray | None
    current_image: np.ndarray | None       # RGB (H, W, 3) uint8
    current_depth: np.ndarray | None       # Depth (H, W) float32 in meters
    camera_pos: np.ndarray | None          # [x, y, z] world position
    camera_rot: np.ndarray | None          # Quaternion [w, x, y, z]

class PlanningOutput:
    trajectory: np.ndarray | None          # (N, 2) array of waypoints
    all_trajectories: list | None          # All candidate trajectories
    trajectory_values: list | None         # Scalar value per trajectory
    is_planning: bool                      # True while HTTP call in flight

MPC Trajectory Controller

The MPC_Controller class in utils_tasks/tracking_utils.py is responsible for converting trajectory waypoints (output by the policy) into wheel velocity commands. Key aspects:

Uses a Model Predictive Control formulation that optimizes a receding-horizon cost over the planned waypoints
Handles coordinate frame transforms between the camera/world frame and the robot body frame
Uses CasADi for symbolic optimization and OSQP as the QP solver backend (both in requirements.txt)
Outputs (left_wheel_velocity, right_wheel_velocity) in rad/s for the DifferentialController
Stop threshold --stop_threshold (default: -3.0) controls when the robot considers an episode completed

Sensor Pipeline

Each robot configuration defines three sensor configurations. During evaluation, the simulator reads:

Sensor	Type	Resolution	Usage
Front Camera (RGB)	PinholeCamera	640×360	Policy observation input; visualization left panel
Front Camera (Depth)	PinholeCamera	640×360	PointCloud construction for BEV occupancy grid; policy depth input
Goal Camera (RGB)	PinholeCamera	640×360	ImageGoal task: capture reference image at goal pose
Metric Camera	PinholeCamera	160×90	Low-res overhead view for BEV tracking
Contact Sensor	ContactSensorCfg	N/A	Collision detection; episode termination on contact

PointCloud Construction

The depth image is back-projected using the camera's intrinsic matrix to form a 3D point cloud via open3d. This point cloud is then transformed into the world frame and projected onto a 2D occupancy grid for the BEV visualization. The occupancy grid has configurable resolution (default: 0.05 m/cell) and range.

HTTP API Contract

All navigation servers must implement the same REST API so they can be swapped without changing the evaluation scripts:

`POST /reset`

Called at the start of each episode. Sends camera intrinsics so the server can initialize depth-dependent models.

Request body:
{
  "intrinsic_matrix": [[fx, 0, cx], [0, fy, cy], [0, 0, 1]],
  "image_width": 640,
  "image_height": 360
}
Response: { "status": "ok" }

`POST /plan` (PointGoal / ImageGoal)

Main inference endpoint. Called every planning cycle.

Request body:
{
  "goal": [gx, gy, gz],               // World-frame goal position
  "image": "<base64-encoded-JPEG>",
  "depth": "<base64-encoded-float32-bytes>",
  "camera_pos": [x, y, z],
  "camera_rot": [w, x, y, z]
}
Response:
{
  "trajectory": [[x1,y1], [x2,y2], ...],    // Waypoints in world frame
  "all_trajectories": [...],                  // Optional: all candidate paths
  "trajectory_values": [v1, v2, ...]          // Optional: per-trajectory scores
}

Evaluation Metrics

Metrics are computed per episode and written to CSV by utils_tasks/basic_utils.py::write_metrics():

Metric	Description
Success Rate (SR)	Fraction of episodes where robot reached within threshold of goal
Path Length (PL)	Total distance traveled by the robot base in meters
Episode Duration	Wall-clock time in seconds from start to termination
Collision Flag	1 if episode terminated due to contact sensor trigger
SPL	Success weighted by Path Length (standard navigation metric)

Framework Architecture