How can lightweight drones without depth cameras navigate using monocular images? Check out our paper at ISER 2023! MonoNav: MAV Navigation via Monocular Depth Estimation and Reconstruction arXiv: website: Work led by Nate Simon
27,642 views • 2 years ago • via X (Twitter)
9 comments

In this work, we ask the following question: using only a monocular camera, optical odometry, and offboard computation, can we create metrically accurate maps that enable the use of conventional path planning to achieve robust autonomy in unknown environments?

The answer is YES - surprisingly, a monocular system using state-of-the-art depth estimation techniques can perform local 3D reconstruction with sufficient quality to enable fast (0.5 m/s) MAV navigation in unexplored, constrained, indoor environments.

We present MonoNav: a monocular navigation stack that leverages pre-trained transformer-based models for monocular depth estimation (ZoeDepth) in combination with off-the-shelf fusion (Open3D) and conventional planning techniques (motion primitives).
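To illustrate why metric depth matters for this kind of stack, here is a minimal sketch of the step that turns a depth image into 3D points using a pinhole camera model. It is not MonoNav's code: the function name `backproject`, the intrinsics `fx, fy, cx, cy`, and the `max_depth` cutoff are assumptions for illustration, and the fusion of many such frames (done with Open3D in the thread) is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def backproject(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    max_depth: float = 10.0,  # hypothetical cutoff for unreliable far depths
) -> List[Point]:
    """Convert a metric depth image (meters) into camera-frame 3D points.

    Uses the standard pinhole model: a pixel (u, v) with depth z maps to
    x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            # Skip invalid (non-positive) or too-distant depth values.
            if z <= 0.0 or z > max_depth:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Tiny 2x2 "depth image" with one invalid pixel (0.0).
    depth_image = [[2.0, 2.0], [0.0, 4.0]]
    for p in backproject(depth_image, fx=1.0, fy=1.0, cx=0.5, cy=0.5):
        print(p)
```

Because each point's x and y scale linearly with z, any error in the estimated depth scale stretches or shrinks the whole map, which is why a metric depth estimate is needed before a planner can reason about clearances in meters.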

MonoNav is able to reconstruct and navigate in constrained indoor environments. In another example, we see MonoNav navigating a hallway corner at 0.5 m/s.

We compare MonoNav to NoMaD, a state-of-the-art method in monocular navigation. NoMaD uses a transformer encoder and a diffusion policy to directly output action candidates from a series of RGB images (and an optional goal image). Website:

We find NoMaD works well when a clear maneuver is required, e.g., turning to avoid a wall. However, the action candidates are not always diverse and occasionally suggest turning into the wall. In another case, the action candidates are insufficiently evasive to avoid a crash.

In 15 side-by-side experiments in diverse conditions, we find that MonoNav significantly reduces the collision rate (by a factor of 4). This increase in safety comes at the cost of conservatism: a 22% reduction in goal completion.

This performance occurs because MonoNav reasons explicitly about the environment scale, and can self-arrest and land if collision appears imminent. For more information, check out: Video: Paper: Website:
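The idea of planning with motion primitives against a metric map, and stopping when nothing is safe, can be sketched roughly as follows. This is a simplified 2D illustration, not the authors' planner: the constant yaw-rate arcs, the `margin` of 0.3 m, the 2 s horizon, and all function names are assumptions. Only the 0.5 m/s speed comes from the thread.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point2 = Tuple[float, float]


def rollout(yaw_rate: float, speed: float = 0.5,
            horizon: float = 2.0, steps: int = 10) -> List[Point2]:
    """Integrate a constant-speed, constant-yaw-rate arc from the origin.

    The vehicle starts at (0, 0) facing +x; positive yaw_rate turns left.
    """
    x = y = heading = 0.0
    dt = horizon / steps
    path: List[Point2] = []
    for _ in range(steps):
        heading += yaw_rate * dt
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        path.append((x, y))
    return path


def min_clearance(path: Sequence[Point2], obstacles: Sequence[Point2]) -> float:
    """Smallest distance (meters) between any path point and any obstacle."""
    if not obstacles:
        return math.inf
    return min(math.dist(p, o) for p in path for o in obstacles)


def choose_primitive(
    obstacles: Sequence[Point2],
    goal: Point2,
    yaw_rates: Sequence[float] = (-0.6, -0.3, 0.0, 0.3, 0.6),  # hypothetical set
    margin: float = 0.3,  # hypothetical safety margin in meters
) -> Optional[float]:
    """Pick the collision-free primitive whose endpoint is closest to the goal.

    Returns None when every primitive violates the margin, which a caller
    could treat as the signal to stop and land.
    """
    best: Optional[float] = None
    best_cost = math.inf
    for w in yaw_rates:
        path = rollout(w)
        if min_clearance(path, obstacles) < margin:
            continue  # reject primitives that pass too close to the map
        cost = math.dist(path[-1], goal)
        if cost < best_cost:
            best, best_cost = w, cost
    return best


if __name__ == "__main__":
    print(choose_primitive([], goal=(5.0, 0.0)))            # open space
    print(choose_primitive([(0.5, 0.0)], goal=(5.0, 0.0)))  # obstacle dead ahead
```

The clearance check only makes sense if obstacle positions are in meters, so a margin like this depends on the reconstruction having correct scale. Returning `None` rather than the "least bad" primitive is the conservative choice, and it is the kind of behavior that trades goal completion for fewer collisions.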

@Nate___Simon This is pretty amazing work! Thinking very simply, we can still perceive depth with just one eye, so one would think you don't need stereo vision cameras for depth perception.
