Project Overview — NavDP Research Sandbox

Background & Motivation

NavDP (Navigation Diffusion Policy) is a method introduced in a research paper by Wenzhe Cai et al. It shows that a diffusion-based policy trained entirely in simulation can perform mapless navigation in the real world. The key idea is to guide training with privileged information (e.g., ground-truth depth, full scene geometry) that is unavailable at inference time, which the authors use to narrow the sim-to-real gap.

This fork is an academic research sandbox that extends the official evaluation framework in three primary directions:

Hardware Expansion: Integrating the LeKiwi 3-wheeled omni-directional robot alongside the default Clearpath Dingo differential-drive base
Visualization Research: Adding Bird's Eye View and Third-Person perspective cameras for richer qualitative evaluation
Tooling: Building reusable diagnostic scripts that automate the robot on-boarding pipeline (USD fixing → collision injection → physics validation)

Citing the Original Work

This sandbox is built upon: "NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance" by Wenzhe Cai, Jiaqi Peng, Yuqiang Yang, Yujian Zhang, Meng Wei, Hanqing Wang, Yilun Chen, Tai Wang, and Jiangmiao Pang. arXiv:2505.08712 · Official Repository

Navigation Tasks

The benchmark supports three classes of navigation challenges, each with separate evaluation scripts and goal-conditioning inputs:

Task	Goal Conditioning	Eval Script	Teleop Script
NoGoal Exploration	None (open-ended)	`eval_nogoal_wheeled.py`	`teleop_nogoal_wheeled.py`
PointGoal Navigation	(x, y) world coordinate	`eval_pointgoal_wheeled.py`	`teleop_pointgoal_wheeled.py`
ImageGoal Navigation	Reference RGB image	`eval_imagegoal_wheeled.py`	`teleop_imagegoal_wheeled.py`

Episode Format

Start/goal pairs for each task are stored as .npy files inside the scene directories. Both PointGoal and ImageGoal episodes are pre-generated per scene so that evaluations are reproducible across methods:

assets/scenes/cluttered_easy/easy_0/
├── cluttered-0.usd/                     # Scene geometry
├── imagegoal_start_goal_pairs.npy       # ImageGoal episodes
└── pointgoal_start_goal_pairs.npy       # PointGoal episodes
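
The episode files can be read with NumPy. The sketch below is a minimal example, not the fork's actual loader: the helper name `load_episode_pairs` is hypothetical, and the array layout used for the synthetic file (episodes × start/goal × 3 values) is an assumption made only so the example runs. The real files may use a different shape, or may need `allow_pickle=True` if they store Python objects.

```python
import os
import tempfile

import numpy as np


def load_episode_pairs(scene_dir: str, task: str) -> np.ndarray:
    """Load the pre-generated start/goal pairs for one task in one scene.

    Hypothetical helper: only the file naming pattern comes from the
    directory listing above.
    """
    if task not in ("pointgoal", "imagegoal"):
        raise ValueError(f"no pre-generated episodes for task {task!r}")
    path = os.path.join(scene_dir, f"{task}_start_goal_pairs.npy")
    return np.load(path)


# Demonstration on a synthetic file. The (5, 2, 3) layout is an assumption,
# not the documented format of the real episode files.
with tempfile.TemporaryDirectory() as scene_dir:
    fake_pairs = np.zeros((5, 2, 3), dtype=np.float32)
    np.save(os.path.join(scene_dir, "pointgoal_start_goal_pairs.npy"), fake_pairs)
    pairs = load_episode_pairs(scene_dir, "pointgoal")
    num_episodes = pairs.shape[0]
```

Inspecting `pairs.shape` and `pairs.dtype` on a real file is the quickest way to confirm the actual layout before relying on it.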

Scene Categories

The benchmark supports four scene categories from InternScene-N1, spanning a spectrum from synthetic clutter to photorealistic indoor environments:

Scene Type	Description	Difficulty	Episode Directory
Cluttered Easy	Random obstacles in open space	Low	assets/scenes/cluttered_easy/
Cluttered Hard	Dense clutter, tight corridors	High	assets/scenes/cluttered_hard/
InternScenes Home	Realistic home layouts (15+ scenes)	Medium	assets/scenes/internscenes_home/
InternScenes Commercial	Office / commercial spaces	Medium–High	assets/scenes/internscenes_commercial/

Visualization System

One of the key contributions of this fork is the multi-perspective visualization pipeline implemented in utils_tasks/visualization_utils.py. The VisualizationManager class provides:

Bird's Eye View (BEV)

Renders a top-down occupancy grid centered on the robot
Draws past robot positions in gray (history length set by history_size)
Draws current obstacle points in red
Overlays the planned trajectory and a robot rectangle (white outline, oriented by yaw)
Smooths the result with OpenCV LINE_AA anti-aliased drawing and a Gaussian blur post-processing pass
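
The robot-centered projection and the bounded position history can be sketched as follows. This is an illustrative sketch, not the code in `VisualizationManager`: the function name `world_to_bev_pixel`, the 400-pixel image size, the 0.05 m/pixel resolution, and the axis convention (world +x to the right, world +y up) are all assumptions.

```python
from collections import deque


def world_to_bev_pixel(point_xy, robot_xy, image_size=400, meters_per_pixel=0.05):
    """Map a world-frame (x, y) point to (row, col) in a robot-centered image.

    Assumed convention: the robot sits at the image center, world +x points
    right, and world +y points up (image rows grow downward).
    """
    dx = point_xy[0] - robot_xy[0]
    dy = point_xy[1] - robot_xy[1]
    center = image_size // 2
    col = center + round(dx / meters_per_pixel)
    row = center - round(dy / meters_per_pixel)
    inside = 0 <= row < image_size and 0 <= col < image_size
    return row, col, inside


# A deque with maxlen plays the role of a configurable history_size:
# once full, the oldest position is dropped automatically.
history = deque(maxlen=3)
for position in [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0)]:
    history.append(position)

robot_xy = history[-1]  # the latest position is the current robot pose
trail_pixels = [world_to_bev_pixel(p, robot_xy) for p in history]
```

Because the grid is re-centered every frame, the trail pixels have to be recomputed from the stored world positions rather than cached in image coordinates.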

Multi-Trajectory Value Heatmap

Renders all candidate trajectories proposed by the diffusion policy
Colors each trajectory by its value score using a blue→green→red gradient
Uses a fixed normalization range of [-1.2, 0.2] so that colors are comparable across episodes
Composites the heatmap below the RGB+BEV strip in the final output video frame
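
One way to realize the value-to-color mapping is a piecewise-linear gradient over the fixed range. The sketch below is an assumption about how such a mapping could look, not the fork's implementation: the function name `value_to_rgb` is hypothetical, and it assumes that low values map to blue and high values to red, with out-of-range values clipped.

```python
VALUE_MIN, VALUE_MAX = -1.2, 0.2  # fixed normalization range from the text


def value_to_rgb(value: float) -> tuple[int, int, int]:
    """Map a trajectory value score to an (R, G, B) color.

    Assumed direction: VALUE_MIN -> blue, midpoint -> green, VALUE_MAX -> red.
    """
    # Normalize to [0, 1] and clip scores that fall outside the fixed range.
    t = (value - VALUE_MIN) / (VALUE_MAX - VALUE_MIN)
    t = min(1.0, max(0.0, t))
    if t < 0.5:
        s = t / 0.5                 # blue -> green over the lower half
        r, g, b = 0.0, s, 1.0 - s
    else:
        s = (t - 0.5) / 0.5         # green -> red over the upper half
        r, g, b = s, 1.0 - s, 0.0
    return round(255 * r), round(255 * g), round(255 * b)


low_color = value_to_rgb(-1.2)   # (0, 0, 255), pure blue
high_color = value_to_rgb(0.2)   # (255, 0, 0), pure red
```

Note that OpenCV drawing functions expect BGR order, so the tuple would need to be reversed before being passed to them.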

Output Format

Evaluation runs write video frames via imageio to evaluation_outputs/videos/. Raw metrics (success rate, path length, episode duration) are logged as CSV to evaluation_outputs/benchmark_runs/.
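
Aggregating the logged metrics takes only the standard library. The sketch below is a hedged example: the column names (`success`, `path_length_m`, `duration_s`) and the sample rows are made up for illustration and are not the fork's actual CSV schema or results.

```python
import csv
import io

# Synthetic sample rows; the header names are assumptions, not the real schema.
SAMPLE_CSV = """episode,success,path_length_m,duration_s
0,1,4.2,18.5
1,0,7.9,60.0
2,1,3.1,12.0
"""


def summarize_metrics(csv_text: str) -> dict:
    """Compute success rate and mean path length/duration over all episodes."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no episodes found in CSV")
    n = len(rows)
    return {
        "episodes": n,
        "success_rate": sum(int(r["success"]) for r in rows) / n,
        "mean_path_length_m": sum(float(r["path_length_m"]) for r in rows) / n,
        "mean_duration_s": sum(float(r["duration_s"]) for r in rows) / n,
    }


summary = summarize_metrics(SAMPLE_CSV)
```

For a file on disk, the same function can be fed `open(path).read()` once the real column names have been substituted.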

Robot Comparison

Property	Dingo (Default)	LeKiwi (Custom)
Drive Type	Differential (2-wheel)	Omni-directional (3-wheel)
Wheel Radius	0.0591 m	0.0325 m
Wheel Base	0.22616 m	0.18 m
Actuated Joints	left_wheel, right_wheel	3× ST3215 Servo Motors + 6-DOF arm
USD Source	assets/robots/dingo.usd	assets/robots/lekiwi/lekiwi_final.usd
Config File	configs/robots/dingo_config.py	configs/robots/lekiwi_config.py
Controller	DifferentialController	DifferentialController (2-of-3 wheels)
Collision Model	Built-in USD collisions	Procedurally injected (Cyl + Sphere)
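
To show how the wheel radius and wheel base enter the control loop, here is the textbook differential-drive conversion from a body velocity command to wheel angular velocities. It is a sketch under assumptions, not the repository's `DifferentialController`: the function name is hypothetical, and "Wheel Base" is taken to mean the distance between the left and right wheels.

```python
# Dingo parameters from the comparison table above.
DINGO_WHEEL_RADIUS = 0.0591   # meters
DINGO_WHEEL_BASE = 0.22616    # meters, assumed to be the left-right wheel separation


def differential_wheel_speeds(linear, angular, wheel_radius, wheel_base):
    """Convert (linear m/s, angular rad/s) into (left, right) wheel speeds in rad/s.

    Standard differential-drive kinematics: each wheel's rim speed is the
    body speed minus/plus the contribution of the rotation about the center.
    """
    left = (linear - angular * wheel_base / 2.0) / wheel_radius
    right = (linear + angular * wheel_base / 2.0) / wheel_radius
    return left, right


# Driving straight: both wheels spin at the same rate.
straight = differential_wheel_speeds(0.5, 0.0, DINGO_WHEEL_RADIUS, DINGO_WHEEL_BASE)

# Turning in place (counter-clockwise): wheels spin in opposite directions.
spin = differential_wheel_speeds(0.0, 1.0, DINGO_WHEEL_RADIUS, DINGO_WHEEL_BASE)
```

The same function accepts the LeKiwi values (0.0325 m radius, 0.18 m base), which matches the table's note that LeKiwi is driven through two of its three wheels. That approximation gives up the lateral motion a true omni-directional controller would provide.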

Repository Structure

NavDP/
├── assets/                          # Scene USDs, robot USDs, episode .npy files
│   ├── robots/
│   │   ├── dingo.usd
│   │   └── lekiwi/
│   │       ├── lekiwi.usd           # Original export (broken articulation)
│   │       ├── lekiwi_floating.usd  # Fixed articulation root
│   │       └── lekiwi_final.usd     # Final: articulation + collisions
│   └── scenes/                      # Cluttered / Home / Commercial
├── baselines/                       # Four navigation policy servers
│   ├── navdp/                       # Primary diffusion policy
│   ├── logoplanner/                 # Language-guided planner + real-world host
│   ├── nomad/                       # Topological diffusion model
│   └── vint/                        # Visual Navigation Transformer
├── configs/                         # Robot & scene configuration classes
│   ├── robots/
│   │   ├── dingo_config.py
│   │   └── lekiwi_config.py
│   ├── scenes/
│   └── tasks/
├── wheeled_robots/                  # Robot controller implementations
│   └── controllers/
│       ├── base_controller.py
│       └── differential_controller.py
├── utils_tasks/                     # Shared evaluation utilities
│   ├── basic_utils.py               # PlanningInput/Output, metrics, drawing
│   ├── client_utils.py              # HTTP client helpers (navigator_reset, etc.)
│   ├── tracking_utils.py            # MPC_Controller
│   └── visualization_utils.py       # VisualizationManager (BEV, heatmap)
├── research_tools/                  # USD diagnostics & fix scripts (this fork)
├── development_logs/                # Engineering journals
├── evaluation_outputs/              # Videos, CSVs from benchmark runs
├── eval_pointgoal_wheeled.py        # Main evaluation entry points
├── eval_imagegoal_wheeled.py
├── eval_nogoal_wheeled.py
├── eval_startgoal_wheeled.py
├── teleop_pointgoal_wheeled.py      # Manual teleoperation scripts
└── requirements.txt