How can lightweight drones without depth cameras navigate using monocular images? Check out our paper at ISER 2023! MonoNav: MAV Navigation via Monocular Depth Estimation and Reconstruction arXiv: website: Work led by Nate Simon

27,642 views • 2 years ago • via X (Twitter)

9 comments

Anirudha Majumdar • 2 years ago

In this work, we ask the following question: using only a monocular camera, optical odometry, and offboard computation, can we create metrically accurate maps that enable the use of conventional path planning to achieve robust autonomy in unknown environments?

Anirudha Majumdar • 2 years ago

The answer is YES - surprisingly, a monocular system using state-of-the-art depth estimation techniques can perform local 3D reconstruction with sufficient quality to enable fast (0.5 m/s) MAV navigation in unexplored, constrained, indoor environments.

Anirudha Majumdar • 2 years ago

We present MonoNav: a monocular navigation stack that leverages pre-trained transformer-based models for monocular depth estimation (ZoeDepth) in combination with off-the-shelf fusion (Open3D) and conventional planning techniques (motion primitives).
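To illustrate why metric depth matters for this kind of pipeline, here is a minimal sketch of the standard pinhole back-projection that turns a depth image into 3D points in the camera frame. It is not MonoNav's implementation (the thread says fusion is done with Open3D); the function name `backproject` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, chosen only for this example.

```python
def backproject(depth, fx, fy, cx, cy):
    """Convert a metric depth image (rows of depths in metres) into
    3D points in the camera frame using the pinhole model.

    fx, fy, cx, cy are assumed camera intrinsics in pixels (hypothetical values
    in the usage below; a real camera needs calibration).
    """
    points = []
    for v, row in enumerate(depth):        # v: pixel row
        for u, z in enumerate(row):        # u: pixel column, z: depth in metres
            if z <= 0.0:                   # skip invalid or missing depth
                continue
            x = (u - cx) * z / fx          # right of the optical axis
            y = (v - cy) * z / fy          # below the optical axis
            points.append((x, y, z))
    return points


# Toy 2x2 depth image with one invalid pixel.
cloud = backproject([[2.0, 0.0], [2.0, 2.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Because `x` and `y` scale linearly with `z`, any error in the estimated depth scale carries straight into the reconstructed geometry, which is why a metric depth estimate is needed before conventional planning can be applied.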

Anirudha Majumdar • 2 years ago

MonoNav is able to reconstruct and navigate in constrained indoor environments. In another example, we see MonoNav navigating a hallway corner at 0.5 m/s.

Anirudha Majumdar • 2 years ago

We compare MonoNav to NoMaD, a state-of-the-art method in monocular navigation. NoMaD uses a transformer encoder and diffusion policy to directly output action candidates from a series of RGB images (and an optional goal image). Website:

Anirudha Majumdar • 2 years ago

We find NoMaD works well when a clear maneuver is required, e.g., turning to avoid a wall. However, the action candidates are not always diverse and occasionally suggest turning into the wall. In another case, the action candidates are insufficiently evasive to avoid a crash.

Anirudha Majumdar • 2 years ago

In 15 side-by-side experiments in diverse conditions, we find that MonoNav significantly reduces the collision rate (by a factor of 4). This increase in safety comes at the cost of conservatism: a 22% reduction in goal completion.

Anirudha Majumdar • 2 years ago

This performance occurs because MonoNav reasons explicitly about the environment scale, and can self-arrest and land if collision appears imminent. For more information, check out: Video: Paper: Website:
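The idea of checking candidate motions against a metric map, and stopping when none is safe, can be sketched roughly as follows. This is a simplified 2D illustration under stated assumptions, not MonoNav's planner: the function names, the constant-yaw-rate arcs, the 2 s horizon, the 0.3 m clearance threshold, and the point-obstacle map are all hypothetical. Only the 0.5 m/s speed comes from the thread.

```python
import math


def rollout(speed, yaw_rate, horizon=2.0, steps=20):
    """Integrate a constant-speed, constant-yaw-rate arc from the origin.
    Frame: x forward, y left, units in metres and radians."""
    x = y = heading = 0.0
    dt = horizon / steps
    path = []
    for _ in range(steps):
        heading += yaw_rate * dt
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        path.append((x, y))
    return path


def clearance(path, obstacles):
    """Smallest distance from any path point to any obstacle point."""
    return min((math.dist(p, o) for p in path for o in obstacles),
               default=math.inf)


def choose_primitive(obstacles, goal, speed=0.5,
                     yaw_rates=(-0.6, -0.3, 0.0, 0.3, 0.6),
                     min_clearance=0.3):
    """Return the yaw rate of the safe primitive ending closest to the goal,
    or None if every primitive comes too close to an obstacle (stop and land)."""
    best, best_cost = None, math.inf
    for yaw_rate in yaw_rates:
        path = rollout(speed, yaw_rate)
        if clearance(path, obstacles) < min_clearance:
            continue                      # reject primitives that pass too close
        cost = math.dist(path[-1], goal)  # distance from arc endpoint to goal
        if cost < best_cost:
            best, best_cost = yaw_rate, cost
    return best


# An obstacle slightly left of straight ahead pushes the choice to a right turn.
command = choose_primitive([(1.0, 0.2)], goal=(5.0, 0.0))
```

The clearance check is only meaningful when the obstacle positions are in metres, which is the role the metric reconstruction plays. Returning `None` stands in for the stop-and-land behavior described above.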

Avik Sarkar • 2 years ago

@Nate___Simon This is pretty amazing work! Thinking very simply, we can still perceive depth with just one eye, so one would think you don't need stereo vision cameras for depth perception.
