How can lightweight drones without depth cameras navigate using monocular images? Check out our paper at ISER 2023! MonoNav: MAV Navigation via Monocular Depth Estimation and Reconstruction arXiv: website: Work led by Nate Simon

Anirudha Majumdar

5,475 subscribers

27,642 views • 2 years ago • via X (Twitter)

Science & Technology

9 Comments

Anirudha Majumdar • 2 years ago

In this work, we ask the following question: using only a monocular camera, optical odometry, and offboard computation, can we create metrically accurate maps that enable the use of conventional path planning to achieve robust autonomy in unknown environments?

Anirudha Majumdar • 2 years ago

The answer is YES - surprisingly, a monocular system using state-of-the-art depth estimation techniques can perform local 3D reconstruction with sufficient quality to enable fast (0.5 m/s) MAV navigation in unexplored, constrained, indoor environments.

Anirudha Majumdar • 2 years ago

We present MonoNav: a monocular navigation stack that leverages pre-trained transformer-based models for monocular depth estimation (ZoeDepth) in combination with off-the-shelf fusion (Open3D) and conventional planning techniques (motion primitives).
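Before fusion, each metric depth image has to be turned into 3D geometry. As a minimal sketch, assuming a standard pinhole camera with hypothetical intrinsics `fx, fy, cx, cy` and a depth map already in meters (for example, the output of a metric depth model such as ZoeDepth), each pixel can be back-projected into the camera frame as below. This is an illustration of the general step, not MonoNav's code; in practice a library such as Open3D performs this inside its TSDF integration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def backproject(depth: List[List[float]], fx: float, fy: float,
                cx: float, cy: float) -> List[Point]:
    """Back-project a metric depth map (meters, row-major) into camera-frame points.

    Assumes a pinhole model: u = fx * X / Z + cx, v = fy * Y / Z + cy.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):      # v: pixel row index
        for u, z in enumerate(row):      # u: pixel column index
            if z <= 0.0:
                continue                 # no valid depth for this pixel
            x = (u - cx) * z / fx        # invert u = fx * X / Z + cx
            y = (v - cy) * z / fy        # invert v = fy * Y / Z + cy
            points.append((x, y, z))
    return points
```

For example, `backproject([[1.0, 1.0], [1.0, 0.0]], 1.0, 1.0, 0.5, 0.5)` returns three points; the zero-depth pixel is dropped.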

Anirudha Majumdar • 2 years ago

MonoNav is able to reconstruct and navigate in constrained indoor environments. In another example, we see MonoNav navigating a hallway corner at 0.5 m/s.

Anirudha Majumdar • 2 years ago

We compare MonoNav to NoMaD, a state-of-the-art method in monocular navigation. NoMaD uses a transformer encoder and a diffusion policy to directly output action candidates from a series of RGB images (and an optional goal image). Website:

Anirudha Majumdar • 2 years ago

We find NoMaD works well when a clear maneuver is required, e.g., turning to avoid a wall. However, the action candidates are not always diverse and occasionally suggest turning into the wall. In another case, the action candidates are insufficiently evasive to avoid a crash.

Anirudha Majumdar • 2 years ago

In 15 side-by-side experiments in diverse conditions, we find that MonoNav significantly reduces the collision rate (by a factor of 4). This gain in safety comes at the cost of increased conservatism: a 22% reduction in goal completion.

Anirudha Majumdar • 2 years ago

This performance arises because MonoNav reasons explicitly about the environment's scale and can self-arrest and land if a collision appears imminent. For more information, check out: Video: Paper: Website:
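Self-arrest falls out naturally from planning over an explicit map. A minimal sketch, assuming a hypothetical voxel occupancy set built from the fused reconstruction and a small library of precomputed motion primitives (each a list of 3D waypoints in meters): discard every primitive that passes within a safety margin of an occupied voxel, pick the survivor whose endpoint is closest to the goal, and signal stop-and-land when none survive. The function names, voxel size, and margin are illustrative assumptions, not MonoNav's implementation.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Vec3 = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def to_voxel(p: Vec3, voxel_size: float) -> Voxel:
    """Map a point in meters to the integer index of the voxel containing it."""
    return (math.floor(p[0] / voxel_size),
            math.floor(p[1] / voxel_size),
            math.floor(p[2] / voxel_size))


def is_collision_free(waypoints: List[Vec3], occupied: Set[Voxel],
                      voxel_size: float, margin_voxels: int) -> bool:
    """Reject a primitive if any waypoint lies within `margin_voxels` of an occupied voxel."""
    r = margin_voxels
    for p in waypoints:
        cx, cy, cz = to_voxel(p, voxel_size)
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                for dz in range(-r, r + 1):
                    if (cx + dx, cy + dy, cz + dz) in occupied:
                        return False
    return True


def select_primitive(primitives: Dict[str, List[Vec3]], occupied: Set[Voxel],
                     goal: Vec3, voxel_size: float = 0.1,
                     margin_voxels: int = 1) -> Optional[str]:
    """Return the name of the safe primitive whose endpoint is closest to the goal.

    Returns None when every primitive is blocked; the caller should then
    self-arrest (stop and land) instead of flying into the obstacle.
    """
    best_name: Optional[str] = None
    best_dist = math.inf
    for name, waypoints in primitives.items():
        if not is_collision_free(waypoints, occupied, voxel_size, margin_voxels):
            continue
        d = math.dist(waypoints[-1], goal)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name
```

With an obstacle directly ahead, the straight primitive is rejected and a turning one is chosen; if obstacles block every primitive, `select_primitive` returns `None`. The margin trades goal progress for safety, mirroring the conservatism noted in the results.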

Avik Sarkar • 2 years ago

@Nate___Simon This is pretty amazing work! Thinking very simply, we can still perceive depth with just one eye, so one would think you don't need stereo vision cameras for depth perception.

Similar Videos

Monocular depth estimation is “impossible” because one image can’t measure depth geometrically. Our iDisc #CVPR2023 can group pixels w/o supervision and learn depth inductive bias on groups. We get LiDAR-like (but denser) depth from single images! More:

Fisher Yu

52,234 views • 3 years ago
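The geometric reason one image cannot fix depth is scale ambiguity. Under a standard pinhole model (intrinsics $f_x, f_y, c_x, c_y$, assumed here for illustration), scaling every scene point $(X, Y, Z)$ by any $s > 0$ leaves its pixel coordinates unchanged:

```latex
u = f_x \frac{X}{Z} + c_x, \qquad v = f_y \frac{Y}{Z} + c_y,
\qquad
f_x \frac{sX}{sZ} + c_x = f_x \frac{X}{Z} + c_x = u, \qquad
f_y \frac{sY}{sZ} + c_y = f_y \frac{Y}{Z} + c_y = v \quad (s > 0).
```

Any metric depth recovered from a single image must therefore come from learned priors (such as typical object sizes) or an external metric cue, not from the projection geometry alone.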

TikTok presents Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Paper page: Demo: Depth Anything is trained on 1.5M labeled images and 62M+ unlabeled images jointly, providing the most capable Monocular Depth Estimation (MDE) foundation models with the following features: • zero-shot relative depth estimation, better than MiDaS v3.1 (BEiTL-512) • zero-shot metric depth estimation, better than ZoeDepth • optimal in-domain fine-tuning and evaluation on NYUv2 and KITTI

AK

600,018 views • 2 years ago

After a year of team work, we're thrilled to introduce Depth Anything 3 (DA3)! 🚀 Aiming for human-like spatial perception, DA3 extends monocular depth estimation to any-view scenarios, including single images, multi-view images, and video. In pursuit of minimal modeling, DA3 reveals two key insights: 💎 A plain transformer (e.g., vanilla DINO) is enough. No specialized architecture. ✨ A single depth-ray representation is enough. No complex 3D tasks. Three series of models have been released: the main DA3 series, a monocular metric estimation series, and a monocular depth estimation series. The core team members, aside from me: Haotong Lin, Sili Chen, Jun Hao Liew, Donny Y. Chen. 👇(1/n) #DepthAnything3

Bingyi Kang

514,165 views • 6 months ago

Multi-view Reconstruction via SfM-guided Monocular Depth Estimation Contributions: • We propose a novel approach to inject SfM priors into diffusion-based depth estimation, enabling highly accurate and multi-view consistent depth predictions for each viewpoint. • Based on the proposed depth estimator, we design a new multi-view 3D geometry reconstruction framework and process some synthetic datasets to facilitate training. • We evaluate our method on diverse real-world scene data, including objects, indoor environments, streetscapes, and aerial scenes, demonstrating the superior performance and generalization capability of our approach.

MrNeRF

25,651 views • 1 year ago

I'm excited to share our new work Align3R that estimates camera poses and consistent depth maps from a monocular video of a dynamic scene. Project page: Code: Paper:

Yuan Liu

56,547 views • 1 year ago

"DVD: Dynamic Video Depth" TL;DR: Recovers temporally consistent depth from monocular videos using diffusion priors + geometric constraints, handling dynamic scenes and motion robustly.

Alexandre Morgand

11,804 views • 2 months ago

Live monocular 3D reconstruction from a camera feed, with multi-frame stitching and relative positioning (frame buffer of 4)... running on an M3 Mac CPU at 24 fps... visualised using Rerun. Next up: real-time semantic navigation in 3D space...

ud

22,080 views • 5 months ago

Introducing MegaSaM! 🎥 Accurate, fast, & robust structure + camera estimation from casual monocular videos of dynamic scenes! MegaSaM outputs camera parameters and consistent video depth, scaling to long videos with unconstrained camera paths and complex scene dynamics!

Zhengqi Li

56,923 views • 1 year ago

DVD: Deterministic Video Depth Estimation with Generative Priors. Paper:

AK

62,042 views • 2 months ago

Check out our CVPR 2023 Award Candidate paper, DynIBaR! DynIBaR takes monocular videos of dynamic scenes and renders novel views in space and time. It addresses limitations of prior dynamic NeRF methods, rendering much higher quality views.

Zhengqi Li

86,193 views • 3 years ago

Want to use Depth Anything, but need metric depth rather than relative depth? Thrilled to introduce Prompt Depth Anything, a new paradigm for accurate metric depth estimation with up to 4K resolution. 👉Key Message: Depth foundation models like DA have already internalized rich geometric knowledge of the 3D world but lack a proper way to elicit it. Inspired by the success of prompting in LLMs, we propose prompting Depth Anything with metric cues to produce metric depth. This method proves very effective when a widely available low-cost LiDAR (e.g., the iPhone's LiDAR) is used as the prompt. We believe the prompt can generalize to other forms as long as scale information is provided. Prompt Depth Anything offers 1⃣A series of models for iPhone LiDARs. 2⃣4D reconstruction from monocular videos (captured with iPhone). 3⃣Improved generalization ability for robot manipulation, e.g. training on cans but generalizing to glasses. 4⃣More detailed depth annotations for the ScanNet++ dataset. The first author is our excellent intern Haotong Lin. Paper: Huggingface: Project Page: Code:

Bingyi Kang

67,550 views • 1 year ago
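Prompt Depth Anything feeds the metric cue into the network itself. A much simpler baseline for the same relative-to-metric problem, shown here only for contrast and not as the method described above, fits one global scale `s` and shift `t` by least squares so that `s * rel + t` matches a few sparse metric samples (for example, from a low-cost LiDAR). It assumes the relative prediction is an affine function of metric depth, which is an assumption: many relative models output inverse depth, in which case the fit should be done in that space. The function names are hypothetical.

```python
from typing import List, Tuple


def fit_scale_shift(rel: List[float], metric: List[float]) -> Tuple[float, float]:
    """Least-squares fit of metric ≈ s * rel + t over paired sparse samples."""
    n = len(rel)
    if n != len(metric) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_r = sum(rel) / n
    mean_m = sum(metric) / n
    var_r = sum((r - mean_r) ** 2 for r in rel)
    if var_r == 0.0:
        raise ValueError("relative depths are constant; scale is unobservable")
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(rel, metric))
    s = cov / var_r                 # ordinary least-squares slope
    t = mean_m - s * mean_r         # intercept through the sample means
    return s, t


def to_metric(rel_map: List[List[float]], s: float, t: float) -> List[List[float]]:
    """Apply the fitted affine correction to every pixel of a relative depth map."""
    return [[s * r + t for r in row] for row in rel_map]
```

For example, `fit_scale_shift([1.0, 2.0, 3.0], [2.5, 4.5, 6.5])` recovers `s = 2.0` and `t = 0.5`. A single global fit cannot correct spatially varying errors, which is one motivation for conditioning the network on the metric cue instead.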

InfiniDepth: fine-grained depth estimation at any resolution. It models depth as a neural implicit field instead of a grid, tops Depth Anything, and recovers sharp details with a lightweight 15M decoder.

Wildminder

59,112 views • 5 months ago

Introducing Marigold 🌼 - a universal monocular depth estimator, delivering incredibly sharp predictions in the wild! Based on Stable Diffusion, it is trained with synthetic depth data only and excels in zero-shot adaptation to real-world imagery. Check it out: 🌐 Website: 🤗 Hugging Face Space: 📄 Paper: 👾 Code: The team: Bingxin Ke (KeBingxin), yours truly (Anton Obukhov), Shengyu Huang (Shengyu Huang), Nando Metzger (Nando Metzger), Rodrigo Caye Daudt (Rodrigo Daudt), and Konrad Schindler. #ComputerVision #PRS #ETHZurich

Anton Obukhov

489,743 views • 2 years ago

Most multi-view reconstruction models need full supervision. We show they can self-improve without any ground truth labels. Introducing SelfEvo: Self-Improving 4D Perception via Self-Distillation. Up to +36.5% in video depth, +20.1% in camera estimation, zero annotation.

Qianqian Wang

24,156 views • 1 month ago

Capturing out-of-focus photos is sooo frustrating! 😩 Check out our work on defocus control with dual-camera 📸 #CVPR2023 Our method can refocus and adjust the depth of field AFTER you capture the photograph! 🤩 Paper: Project:

Jia-Bin Huang

41,330 views • 3 years ago

Good. ComfyUI just got native MoGe support - 3D geometry from monocular images.

Wildminder

16,196 views • 25 days ago

AnyDepth: Depth Estimation Made Easy

AK

14,765 views • 4 months ago

Check out our #ICRA2025 paper! Robust, precise, and terrain-aware: our RL controller significantly improves over baselines for whole-body 6-DoF tracking.💥 Project website: Work led by: Tifanny Portela, Andrei Cramariuc, Mayank Mittal and Marco Hutter

Robotic Systems Lab

16,139 views • 1 year ago

[SIGGRAPH '25] Monocular Online Reconstruction with Enhanced Detail Preservation Abstract (excerpt): Our approach addresses two key challenges in monocular online reconstruction: 1. Distributing Gaussians without relying on depth maps. 2. Ensuring both local and global consistency in the reconstructed maps. To achieve this, we introduce two key modules: - Hierarchical Gaussian Management Module: For effective Gaussian distribution. - Global Consistency Optimization Module: For maintaining alignment and coherence at all scales. In addition, we present the Multi-level Occupancy Hash Voxels (MOHV), a structure that regularizes Gaussians to capture details across multiple levels of granularity. MOHV ensures accurate reconstruction of both fine and coarse geometries and textures, preserving intricate details while maintaining overall structural integrity. Compared to state-of-the-art RGB-only and even RGB-D methods, our framework achieves superior reconstruction quality with high computational efficiency.

MrNeRF

23,616 views • 1 year ago