Project Overview

What is NavDP Research Sandbox?

A deep-dive into the framework's purpose, how it relates to the official NavDP paper, what tasks it supports, and what this fork adds on top.

Background & Motivation

NavDP (Navigation Diffusion Policy) is a diffusion-based navigation policy introduced by Wenzhe Cai et al. Their paper shows that a policy trained entirely in simulation can achieve robust, real-world mapless navigation. The key idea is to guide training with privileged information (e.g., ground-truth depth, full scene geometry) that is unavailable at inference time, which narrows the sim-to-real gap.

This fork is an academic research sandbox that extends the official evaluation framework in three primary directions:

Citing the Original Work

This sandbox is built upon: "NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance" by Wenzhe Cai, Jiaqi Peng, Yuqiang Yang, Yujian Zhang, Meng Wei, Hanqing Wang, Yilun Chen, Tai Wang and Jiangmiao Pang. arXiv:2505.08712 · Official Repository

Navigation Tasks

The benchmark supports three classes of navigation challenges, each with separate evaluation scripts and goal-conditioning inputs:

Task                  Goal Conditioning        Eval Script                Teleop Script
NoGoal Exploration    None (open-ended)        eval_nogoal_wheeled.py     teleop_nogoal_wheeled.py
PointGoal Navigation  (x, y) world coordinate  eval_pointgoal_wheeled.py  teleop_pointgoal_wheeled.py
ImageGoal Navigation  Reference RGB image      eval_imagegoal_wheeled.py  teleop_imagegoal_wheeled.py

Episode Format

Start/goal pairs for each task are stored as .npy files inside the scene directories. Both PointGoal and ImageGoal episodes are pre-generated per scene so that evaluations are reproducible across methods:

assets/scenes/cluttered_easy/easy_0/
├── cluttered-0.usd/                     # Scene geometry
├── imagegoal_start_goal_pairs.npy       # ImageGoal episodes
└── pointgoal_start_goal_pairs.npy       # PointGoal episodes
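
The episode files can be inspected with NumPy. The following is a minimal sketch, not the repository's own loader: the helper names `episode_file` and `load_episodes` are hypothetical, and the array layout (shape, dtype, column meaning) is not documented in this overview, so inspect it before indexing into it.

```python
from pathlib import Path

import numpy as np

# Tasks that have pre-generated episode files, per the listing above.
EPISODE_TASKS = ("pointgoal", "imagegoal")


def episode_file(scene_dir, task):
    """Return the path of the pre-generated episode file for a task."""
    if task not in EPISODE_TASKS:
        raise ValueError(f"no pre-generated episodes for task: {task!r}")
    return Path(scene_dir) / f"{task}_start_goal_pairs.npy"


def load_episodes(scene_dir, task, allow_pickle=False):
    """Load the start/goal pairs for one scene directory.

    allow_pickle is an assumption: it is only needed if the file stores
    Python objects instead of a plain numeric array, and it should only
    be enabled for files you trust.
    """
    return np.load(episode_file(scene_dir, task), allow_pickle=allow_pickle)
```

Calling `load_episodes("assets/scenes/cluttered_easy/easy_0", "pointgoal")` and printing the result's `shape` and `dtype` is a quick way to learn how start and goal poses are packed before writing any evaluation code against them.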

Scene Categories

The benchmark supports four scene categories from InternScene-N1, spanning a spectrum from synthetic clutter to photorealistic indoor environments:

Scene Type               Description                          Difficulty   Episodes (path)
Cluttered Easy           Random obstacles in open space       Low          assets/scenes/cluttered_easy/
Cluttered Hard           Dense clutter, tight corridors       High         assets/scenes/cluttered_hard/
InternScenes Home        Realistic home layouts (15+ scenes)  Medium       assets/scenes/internscenes_home/
InternScenes Commercial  Office / commercial spaces           Medium–High  assets/scenes/internscenes_commercial/

Visualization System

One of the key contributions of this fork is the multi-perspective visualization pipeline implemented in utils_tasks/visualization_utils.py. The VisualizationManager class provides:

Bird's Eye View (BEV)

Multi-Trajectory Value Heatmap

Output Format

Evaluation runs write video frames via imageio to evaluation_outputs/videos/. Raw metrics (success rate, path length, episode duration) are logged as CSV to evaluation_outputs/benchmark_runs/.
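
As an illustration of how the logged metrics might be aggregated, here is a small sketch using only the standard library. The column names (`episode`, `success`, `path_length`, `episode_duration`) and the helper names are assumptions made for this example; the actual CSV header produced by the evaluation scripts may differ. Video writing via imageio is left out.

```python
import csv
from pathlib import Path

# Hypothetical column names; check the header of a real benchmark CSV.
FIELDS = ["episode", "success", "path_length", "episode_duration"]


def write_metrics(csv_path, rows):
    """Write one row per episode, creating parent directories if needed."""
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def summarize(csv_path):
    """Return the success rate and per-episode means over a metrics CSV."""
    with Path(csv_path).open(newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError("no episodes logged")
    n = len(rows)
    return {
        "episodes": n,
        "success_rate": sum(int(r["success"]) for r in rows) / n,
        "mean_path_length": sum(float(r["path_length"]) for r in rows) / n,
        "mean_episode_duration": sum(float(r["episode_duration"]) for r in rows) / n,
    }
```

Keeping the raw per-episode rows and computing summaries afterwards makes it straightforward to compare methods on the same pre-generated episodes.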

Robot Comparison

Property         Dingo (Default)                 LeKiwi (Custom)
Drive Type       Differential (2-wheel)          Omni-directional (3-wheel)
Wheel Radius     0.0591 m                        0.0325 m
Wheel Base       0.22616 m                       0.18 m
Actuated Joints  left_wheel, right_wheel         3× ST3215 Servo Motors + 6-DOF arm
USD Source       assets/robots/dingo.usd         assets/robots/lekiwi/lekiwi_final.usd
Config File      configs/robots/dingo_config.py  configs/robots/lekiwi_config.py
Controller       DifferentialController          DifferentialController (2-of-3 wheels)
Collision Model  Built-in USD collisions         Procedurally injected (Cyl + Sphere)
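
To show what the wheel radius and wheel base mean for control, the sketch below applies the standard differential-drive kinematics to the values in the table. It is not the repository's DifferentialController: the names `DiffDriveParams` and `wheel_speeds` are hypothetical, and it assumes that "Wheel Base" denotes the left-to-right wheel separation. For LeKiwi, treating two of the three omni wheels as a differential pair is an approximation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffDriveParams:
    wheel_radius: float  # metres
    wheel_base: float    # metres; assumed to be the left-right wheel separation


# Values copied from the comparison table.
DINGO = DiffDriveParams(wheel_radius=0.0591, wheel_base=0.22616)
LEKIWI = DiffDriveParams(wheel_radius=0.0325, wheel_base=0.18)


def wheel_speeds(params, linear, angular):
    """Map a body command (m/s, rad/s) to (left, right) wheel speeds in rad/s.

    Positive angular velocity turns the robot counter-clockwise, so the
    right wheel spins faster than the left.
    """
    half_track = params.wheel_base / 2.0
    left = (linear - angular * half_track) / params.wheel_radius
    right = (linear + angular * half_track) / params.wheel_radius
    return left, right
```

For example, `wheel_speeds(DINGO, 0.5, 0.0)` returns equal left and right speeds of about 8.46 rad/s. The same 0.5 m/s command on LeKiwi's smaller wheels requires a higher wheel speed, which is one reason the two robots need separate config files.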

Repository Structure

NavDP/
├── assets/                          # Scene USDs, robot USDs, episode .npy files
│   ├── robots/
│   │   ├── dingo.usd
│   │   └── lekiwi/
│   │       ├── lekiwi.usd           # Original export (broken articulation)
│   │       ├── lekiwi_floating.usd  # Fixed articulation root
│   │       └── lekiwi_final.usd     # Final: articulation + collisions
│   └── scenes/                      # Cluttered / Home / Commercial
├── baselines/                       # Four navigation policy servers
│   ├── navdp/                       # Primary diffusion policy
│   ├── logoplanner/                 # Language-guided planner + real-world host
│   ├── nomad/                       # Topological diffusion model
│   └── vint/                        # Visual Navigation Transformer
├── configs/                         # Robot & scene configuration classes
│   ├── robots/
│   │   ├── dingo_config.py
│   │   └── lekiwi_config.py
│   ├── scenes/
│   └── tasks/
├── wheeled_robots/                  # Robot controller implementations
│   └── controllers/
│       ├── base_controller.py
│       └── differential_controller.py
├── utils_tasks/                     # Shared evaluation utilities
│   ├── basic_utils.py               # PlanningInput/Output, metrics, drawing
│   ├── client_utils.py              # HTTP client helpers (navigator_reset, etc.)
│   ├── tracking_utils.py            # MPC_Controller
│   └── visualization_utils.py       # VisualizationManager (BEV, heatmap)
├── research_tools/                  # USD diagnostics & fix scripts (this fork)
├── development_logs/                # Engineering journals
├── evaluation_outputs/              # Videos, CSVs from benchmark runs
├── eval_pointgoal_wheeled.py        # Main evaluation entry points
├── eval_imagegoal_wheeled.py
├── eval_nogoal_wheeled.py
├── eval_startgoal_wheeled.py
├── teleop_pointgoal_wheeled.py      # Manual teleoperation scripts
└── requirements.txt