A deep-dive into the framework's purpose, how it relates to the official NavDP paper, what tasks it supports, and what this fork adds on top.
NavDP (Navigation Diffusion Policy) is a research paper by Wenzhe Cai et al. showing that a diffusion-based policy trained entirely in simulation can achieve robust real-world mapless navigation. Its key idea is to use privileged information during training (e.g., ground-truth depth, full scene geometry) that is unavailable at inference time, which narrows the sim-to-real gap.
This fork is an academic research sandbox that extends the official evaluation framework in three primary directions:
The benchmark supports three classes of navigation challenges, each with separate evaluation scripts and goal-conditioning inputs:
| Task | Goal Conditioning | Eval Script | Teleop Script |
|---|---|---|---|
| NoGoal Exploration | None (open-ended) | eval_nogoal_wheeled.py | teleop_nogoal_wheeled.py |
| PointGoal Navigation | (x, y) world coordinate | eval_pointgoal_wheeled.py | teleop_pointgoal_wheeled.py |
| ImageGoal Navigation | Reference RGB image | eval_imagegoal_wheeled.py | teleop_imagegoal_wheeled.py |
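
The script names in the table follow a regular `<mode>_<task>_wheeled.py` pattern. As a small illustration, the helper below rebuilds a script name from a task and a mode. The function `script_for` and the constants `TASKS` and `MODES` are hypothetical names introduced here, not part of the repository, and the sketch deliberately says nothing about command-line arguments.

```python
# Hypothetical helper: maps a task and mode to the script names listed in the table.
TASKS = ("nogoal", "pointgoal", "imagegoal")
MODES = ("eval", "teleop")


def script_for(task: str, mode: str = "eval") -> str:
    """Return the script filename for a task/mode pair, e.g. eval_pointgoal_wheeled.py."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    return f"{mode}_{task}_wheeled.py"


for task in TASKS:
    print(task, "->", script_for(task), "/", script_for(task, "teleop"))
```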
Start/goal pairs for each task are stored as .npy files inside the scene directories. Both PointGoal and ImageGoal
episodes are pre-generated per scene so that evaluations are reproducible across methods:
assets/scenes/cluttered_easy/easy_0/
├── cluttered-0.usd/ # Scene geometry
├── imagegoal_start_goal_pairs.npy # ImageGoal episodes
└── pointgoal_start_goal_pairs.npy # PointGoal episodes
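
To inspect the pre-generated episodes, the `.npy` files can be read with `numpy.load`. The sketch below is a minimal example: `load_episode_pairs` is a hypothetical helper, and the array layout used in the demo (one row per episode holding start x, start y, goal x, goal y) is an assumption made only so the example runs on its own. Check the actual shape and dtype of the files in your checkout before relying on it.

```python
import os
import tempfile

import numpy as np


def load_episode_pairs(scene_dir: str, task: str = "pointgoal") -> np.ndarray:
    """Load <task>_start_goal_pairs.npy from a scene directory."""
    path = os.path.join(scene_dir, f"{task}_start_goal_pairs.npy")
    return np.load(path)


# Stand-in data so the example is self-contained. The 4-column layout is an
# assumption for illustration, not the documented on-disk format.
demo_dir = tempfile.mkdtemp()
demo_pairs = np.array([[0.0, 0.0, 3.0, 4.0],
                       [1.0, 1.0, 1.0, 5.0]])
np.save(os.path.join(demo_dir, "pointgoal_start_goal_pairs.npy"), demo_pairs)

pairs = load_episode_pairs(demo_dir, "pointgoal")
print("episodes:", pairs.shape[0], "array shape:", pairs.shape)
```

With a real checkout, `scene_dir` would point at a directory such as `assets/scenes/cluttered_easy/easy_0/`.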
The benchmark supports four scene categories from InternScene-N1, spanning a spectrum from synthetic clutter to photorealistic indoor environments:
| Scene Type | Description | Difficulty | Asset Directory |
|---|---|---|---|
| Cluttered Easy | Random obstacles in open space | Low | assets/scenes/cluttered_easy/ |
| Cluttered Hard | Dense clutter, tight corridors | High | assets/scenes/cluttered_hard/ |
| InternScenes Home | Realistic home layouts (15+ scenes) | Medium | assets/scenes/internscenes_home/ |
| InternScenes Commercial | Office / commercial spaces | Medium–High | assets/scenes/internscenes_commercial/ |
One of the key contributions of this fork is the multi-perspective visualization pipeline implemented in
utils_tasks/visualization_utils.py. The VisualizationManager class provides:
- … history_size) in gray
- … LINE_AA and Gaussian blur post-processing
- … [-1.2, 0.2] for consistent comparison across episodes
Evaluation runs write video frames via imageio to evaluation_outputs/videos/.
Raw metrics (success rate, path length, episode duration) are logged as CSV to evaluation_outputs/benchmark_runs/.
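
A logged CSV can be summarized with the standard library alone. The sketch below is an illustration under assumptions: the column names `success`, `path_length`, and `duration` are hypothetical stand-ins for whatever headers the benchmark actually writes, and `summarize_runs` is not a function from the repository.

```python
import csv
import io


def summarize_runs(csv_file) -> dict:
    """Aggregate per-episode rows into success rate and mean path length."""
    rows = list(csv.DictReader(csv_file))
    n = len(rows)
    if n == 0:
        return {"episodes": 0, "success_rate": 0.0, "mean_path_length": 0.0}
    successes = sum(int(row["success"]) for row in rows)
    total_length = sum(float(row["path_length"]) for row in rows)
    return {
        "episodes": n,
        "success_rate": successes / n,
        "mean_path_length": total_length / n,
    }


# In-memory stand-in for a file under evaluation_outputs/benchmark_runs/.
# The header names are assumptions for illustration.
sample = io.StringIO(
    "episode,success,path_length,duration\n"
    "0,1,4.0,12.5\n"
    "1,0,6.0,30.0\n"
    "2,1,5.0,15.0\n"
    "3,1,5.0,14.0\n"
)
summary = summarize_runs(sample)
print(summary)
```

For a real run, pass an open file handle (`open(path, newline="")`) instead of the `StringIO` object.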
| Property | Dingo (Default) | LeKiwi (Custom) |
|---|---|---|
| Drive Type | Differential (2-wheel) | Omni-directional (3-wheel) |
| Wheel Radius | 0.0591 m | 0.0325 m |
| Wheel Base | 0.22616 m | 0.18 m |
| Actuated Joints | left_wheel, right_wheel | 3× ST3215 Servo Motors + 6-DOF arm |
| USD Source | assets/robots/dingo.usd | assets/robots/lekiwi/lekiwi_final.usd |
| Config File | configs/robots/dingo_config.py | configs/robots/lekiwi_config.py |
| Controller | DifferentialController | DifferentialController (2-of-3 wheels) |
| Collision Model | Built-in USD collisions | Procedurally injected (Cyl + Sphere) |
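
To show how the wheel radius and wheel base in the table enter a differential-drive controller, the sketch below applies the standard differential-drive inverse kinematics to the Dingo values. It is a generic textbook formulation, not the repository's `DifferentialController` code, and it assumes that "Wheel Base" means the distance between the left and right wheels. The function name `diff_drive_wheel_speeds` is hypothetical.

```python
def diff_drive_wheel_speeds(v: float, w: float, wheel_radius: float, wheel_base: float):
    """Convert a body command to wheel angular velocities.

    v: forward speed in m/s, w: yaw rate in rad/s (positive = counter-clockwise).
    Returns (left, right) wheel angular velocities in rad/s.
    """
    left = (v - w * wheel_base / 2.0) / wheel_radius
    right = (v + w * wheel_base / 2.0) / wheel_radius
    return left, right


# Dingo parameters from the table above.
DINGO = {"wheel_radius": 0.0591, "wheel_base": 0.22616}

# Straight line at 0.5 m/s: both wheels turn at the same rate.
left, right = diff_drive_wheel_speeds(0.5, 0.0, **DINGO)
print(f"straight: left={left:.3f} rad/s, right={right:.3f} rad/s")

# Turning in place at 1 rad/s: the wheels spin in opposite directions.
spin_left, spin_right = diff_drive_wheel_speeds(0.0, 1.0, **DINGO)
print(f"spin: left={spin_left:.3f} rad/s, right={spin_right:.3f} rad/s")
```

A smaller wheel radius, as on the LeKiwi, requires proportionally higher wheel angular velocity for the same forward speed.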