Depth Anything V2

AK · 506,764 subscribers
114,537 views • 2 years ago • via X (Twitter)

This work presents Depth Anything V2. Without pursuing fancy techniques, we aim to reveal crucial findings to pave the way towards building a powerful monocular depth estimation model. Notably, compared with V1, this version produces much finer and more robust depth predictions through three key practices: 1) replacing all labeled real images with synthetic images, 2) scaling up the capacity of our teacher model, and 3) teaching student models via the bridge of large-scale pseudo-labeled real images. Compared with the latest models built on Stable Diffusion, our models are significantly more efficient (more than 10x faster) and more accurate. We offer models of different scales (ranging from 25M to 1.3B params) to support extensive scenarios. Benefiting from their strong generalization capability, we fine-tune them with metric depth labels to obtain our metric depth models. In addition to our models, considering the limited diversity and frequent noise in current test sets, we construct a versatile evaluation benchmark with precise annotations and diverse scenes to facilitate future research.
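Practices 1 and 3 above describe a data flow: a teacher learns only from precisely labeled synthetic images, it then pseudo-labels a large pool of unlabeled real images, and student models learn only from those pseudo-labels. The sketch below mimics that flow with deliberately tiny, hypothetical stand-ins (each "image" is one number and each "model" is a least-squares line). It shows the shape of the pipeline only; it ignores model capacity (practice 2) and is not the authors' networks, losses, or data.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def predict(model, xs):
    """Apply a fitted (a, b) line to every input."""
    a, b = model
    return [a * x + b for x in xs]


rng = random.Random(0)

# 1) Teacher: trained only on synthetic samples with exact (noise-free) labels.
synthetic_x = [rng.uniform(0.0, 10.0) for _ in range(200)]
synthetic_y = [2.0 * x + 1.0 for x in synthetic_x]
teacher = fit_line(synthetic_x, synthetic_y)

# 2) Pseudo-label a larger pool of unlabeled "real" samples with the teacher.
real_x = [rng.uniform(0.0, 10.0) for _ in range(1000)]
pseudo_y = predict(teacher, real_x)

# 3) Student: trained only on the pseudo-labeled real samples, never on synthetic ones.
student = fit_line(real_x, pseudo_y)
print("teacher:", teacher, "student:", student)
```

In the real setting the students are separate networks of the various sizes the abstract lists, so the pseudo-labeled real images are the "bridge" through which they inherit the teacher's predictions while seeing real-image diversity.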

AK · 2 years ago

paper page:

AK · 2 years ago

daily papers:

Tremeschin 🔱 · 2 years ago

Still waiting for the transformers pipeline instead of having to git clone the Hugging Face repo into my project 😅 Results are awesome for DAv2!!

Amir Laylaz · 2 years ago

@oleg__chomp 👀👀

JohnYue122333 · 2 years ago

great

Sivan R. Hokayma · 2 years ago

Holy shit this is awesome. Are the RGB values directly proportional to the distance from the camera, or are they relative to other elements within the scene? I.e., will anything 1 ft away from the camera always be the same shade of red across different scenes?
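On the question above: relative-depth models in the Depth Anything family are generally described as trained to be invariant to an unknown per-image scale and shift, so a given color in the visualization is not tied to a fixed distance across scenes (the fine-tuned metric depth models mentioned in the abstract are the other route). A common workaround, sketched here under the assumption that the relative output behaves like affine-invariant inverse depth, is to fit a scale s and shift t against a few pixels whose metric depth is known (for example from a LiDAR or a tape measure) and then invert: Z = 1 / (s·r + t). The function names and sample values are hypothetical.

```python
def fit_scale_shift(pred, target):
    """Closed-form least squares for (s, t) minimizing sum((s * p + t - y) ** 2)."""
    n = len(pred)
    if n < 2 or n != len(target):
        raise ValueError("need at least two paired samples")
    mean_p = sum(pred) / n
    mean_y = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0.0:
        raise ValueError("anchor predictions must not all be equal")
    cov = sum((p - mean_p) * (y - mean_y) for p, y in zip(pred, target))
    s = cov / var_p
    return s, mean_y - s * mean_p


def relative_to_metric(rel_values, anchor_rel, anchor_depth_m):
    """Convert relative (inverse-depth-like) values to metric depth in meters.

    Assumes 1 / depth ~= s * rel + t with one unknown (s, t) per image.
    """
    s, t = fit_scale_shift(anchor_rel, [1.0 / d for d in anchor_depth_m])
    depths = []
    for r in rel_values:
        disparity = s * r + t
        # Non-positive disparity means the fit places this pixel at or beyond infinity.
        depths.append(1.0 / disparity if disparity > 0.0 else float("inf"))
    return depths


# Hypothetical anchors: three pixels measured at 1 m, 2 m and 4 m.
anchor_rel = [1.8, 0.8, 0.3]
anchor_depth_m = [1.0, 2.0, 4.0]
print(relative_to_metric([1.8, 0.8, 0.3, 0.2], anchor_rel, anchor_depth_m))
```

The fitted (s, t) is specific to one image, which is exactly why the same shade can mean different distances in different scenes; the converted values are only as trustworthy as the anchors.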

Similar Videos

Took it further with Omma AI (Spline): now it works on video, and everything runs in a web browser. Drop in a video, Depth Anything v2 calculates the depth map frame by frame, and Three.js renders a fully lit 3D mesh in real time. Depth estimation on video in the browser + automatic baking pass + dynamic lighting reacting to the geometry #vibecoding #threejs #webgl

Joseph Azar

147,048 views • 1 month ago
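The pipeline described above (a depth map per frame, then a lit 3D mesh in Three.js) comes down to turning a depth grid into vertices and triangles. A minimal sketch, assuming metric depth along the camera's z-axis and pinhole intrinsics fx, fy, cx, cy (all hypothetical inputs; a relative depth map would need converting first): each pixel is back-projected to a vertex, and each 2×2 block of pixels becomes two triangles, which is the kind of position and index data a mesh geometry takes.

```python
def depth_to_mesh(depth, fx, fy, cx, cy):
    """Back-project an H x W depth grid into vertices plus triangle indices.

    depth[v][u] is metric depth along the camera z-axis at pixel (u, v).
    Uses the image convention (x right, y down); a y-up renderer such as
    Three.js would typically negate y (and z, since its camera looks down -z).
    """
    h, w = len(depth), len(depth[0])
    vertices = []
    for v in range(h):
        for u in range(w):
            z = depth[v][u]
            vertices.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    faces = []
    for v in range(h - 1):
        for u in range(w - 1):
            i = v * w + u  # top-left corner of this 2x2 pixel block
            faces.append((i, i + w, i + 1))          # top-left, bottom-left, top-right
            faces.append((i + 1, i + w, i + w + 1))  # top-right, bottom-left, bottom-right
    return vertices, faces


verts, tris = depth_to_mesh([[2.0] * 3 for _ in range(3)], fx=1.0, fy=1.0, cx=1.0, cy=1.0)
print(len(verts), len(tris))  # 9 8
```

Large depth jumps at object boundaries turn into long stretched triangles; filtering faces whose corners differ too much in depth is one way to clean that up and is not shown here.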

How can lightweight drones without depth cameras navigate using monocular images? Check out our paper at ISER 2023! MonoNav: MAV Navigation via Monocular Depth Estimation and Reconstruction arXiv: website: Work led by Nate Simon

Anirudha Majumdar

27,642 views • 2 years ago

Want to use Depth Anything, but need metric depth rather than relative depth? Thrilled to introduce Prompt Depth Anything, a new paradigm for accurate metric depth estimation at up to 4K resolution. 👉Key message: depth foundation models like DA have already internalized rich geometric knowledge of the 3D world but lack a proper way to elicit it. Inspired by the success of prompting in LLMs, we propose prompting Depth Anything with metric cues to produce metric depth. This method proves very effective when a widely available low-cost LiDAR (e.g., the iPhone's LiDAR) is used as the prompt. We believe the prompt can generalize to other forms as long as scale information is provided. Prompt Depth Anything offers 1⃣ a series of models for iPhone LiDARs, 2⃣ 4D reconstruction from monocular videos (captured with an iPhone), 3⃣ improved generalization for robot manipulation (e.g., training on cans but generalizing to glasses), and 4⃣ more detailed depth annotations for the ScanNet++ dataset. The first author is our excellent intern Haotong Lin. Paper: Huggingface: Project Page: Code:

Bingyi Kang

67,550 views • 1 year ago

I prompted Omma to build a #threejs app with real dynamic lighting on a 3D mesh - generated from a single image, using Depth Anything v2 + Transformers from Hugging Face ⚡ Depth estimation in the browser 🫧 3D mesh from the depth map 💡 Dynamic lighting reacting to the geometry No manual code. Pure AI prompting. The future of 3D on the web feels wide open.

Joseph Azar

14,041 views • 1 month ago

Monocular depth estimation is “impossible” because one image can’t measure depth geometrically. Our iDisc #CVPR2023 can group pixels w/o supervision and learn depth inductive bias on groups. We get LiDAR-like (but denser) depth from single images! More:

Fisher Yu

52,234 views • 3 years ago

After a year of team work, we're thrilled to introduce Depth Anything 3 (DA3)! 🚀 Aiming for human-like spatial perception, DA3 extends monocular depth estimation to any-view scenarios, including single images, multi-view images, and video. In pursuit of minimal modeling, DA3 reveals two key insights: 💎 A plain transformer (e.g., vanilla DINO) is enough. No specialized architecture. ✨ A single depth-ray representation is enough. No complex 3D tasks. Three series of models have been released: the main DA3 series, a monocular metric estimation series, and a monocular depth estimation series. The core team members, aside from me: Haotong Lin, Sili Chen, Jun Hao Liew, Donny Y. Chen. 👇(1/n) #DepthAnything3

Bingyi Kang

514,467 views • 7 months ago

Bob Myers, on the difficulty of building a team with 3 max contracts: "The truth is, depth may be more important than it's ever been...We gotta be honest of can this model work. Depth is key, and you only have a certain amount of resources to spend."

PHLY Sixers

77,269 views • 1 month ago

Excited to share our work InfiniDepth (CVPR 2026) — casting monocular depth estimation as a neural implicit field, which enables: 🔍 Arbitrary-Resolution 📐 Accurate Metric Depth 📷 Large-View Novel View Synthesis Feel free to try our code:

Sida Peng

38,707 views • 3 months ago

Thrilled to introduce Video Depth Anything to support Depth Estimation for super-long videos (over 5 minutes). 👉It enjoys all the benefits of #DepthAnything: high-quality, fast, robust, etc. Proj Page:

Bingyi Kang

25,464 views • 1 year ago

Here’s a sneak peek using Rerun and Gradio for data annotation. It uses Video Depth Anything and Segment Anything 2 under the hood to generate segmentation masks and depth maps/point clouds. More to share next week.

Pablo Vela

36,719 views • 1 year ago

Multi-view Reconstruction via SfM-guided Monocular Depth Estimation Contributions: • We propose a novel approach to inject SfM priors into diffusion-based depth estimation, enabling highly accurate and multi-view consistent depth predictions for each viewpoint. • Based on the proposed depth estimator, we design a new multi-view 3D geometry reconstruction framework and process some synthetic datasets to facilitate training. • We evaluate our method on diverse real-world scene data, including objects, indoor environments, streetscapes, and aerial scenes, demonstrating the superior performance and generalization capability of our approach.

MrNeRF

25,651 views • 1 year ago

Infinite depth trick! 🌑 This simple Depth Fade node is revolutionary for UEFN maps. Use it on a plane to create realistic shadows and faux depth on doors and anything else that needs that effect! 🛠️🎨 #UEFN #Fortnite

Sitka Creative 📱

118,443 views • 4 months ago
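The Depth Fade trick mentioned above is, in Unreal-style materials, commonly computed from how far the opaque scene behind a surface lies from that surface: fade = clamp((scene_depth - pixel_depth) / fade_distance, 0, 1), multiplied into opacity. The Python sketch below only reproduces that arithmetic under that assumption; it is not UEFN's node implementation, and the parameter names are hypothetical.

```python
def depth_fade(scene_depth, pixel_depth, fade_distance, opacity=1.0):
    """Opacity that ramps from 0 (touching other geometry) up to `opacity`.

    scene_depth: distance from the camera to the opaque scene behind the pixel.
    pixel_depth: distance from the camera to the translucent surface itself.
    """
    if fade_distance <= 0.0:
        raise ValueError("fade_distance must be positive")
    t = (scene_depth - pixel_depth) / fade_distance
    # Clamp to [0, 1] so surfaces far from any geometry keep full opacity.
    return opacity * min(max(t, 0.0), 1.0)


print(depth_fade(110.0, 100.0, 20.0))  # 0.5: halfway through the fade band
```

On a plane placed against a door or wall, this softens the hard edge where the plane meets the geometry behind it, which is what gives the faux-depth look.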

Learning Temporally Consistent Video Depth from Video Diffusion Priors This work addresses the challenge of video depth estimation, which expects not only per-frame accuracy but, more importantly, cross-frame consistency. Instead of directly

AK

81,634 views • 2 years ago
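The post above stresses cross-frame consistency, not just per-frame accuracy. As a point of reference only (a naive baseline, not the diffusion-prior method the post describes), the sketch below smooths per-frame depth with a per-pixel exponential moving average and scores flicker as the mean absolute change between consecutive frames. It assumes the frames already share a common scale; the smoothing factor `alpha` is a hypothetical choice, and smaller values trade less flicker for more lag on real motion.

```python
def ema_smooth(frames, alpha=0.3):
    """Per-pixel exponential moving average over a list of flattened depth frames."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed, state = [], None
    for frame in frames:
        if state is None:
            state = list(frame)  # the first frame passes through unchanged
        else:
            state = [alpha * d + (1.0 - alpha) * s for d, s in zip(frame, state)]
        smoothed.append(state)
    return smoothed


def mean_abs_change(frames):
    """Crude flicker score: average |depth_t - depth_(t-1)| over pixels and frame pairs."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    diffs = [abs(b - a) for prev, cur in zip(frames, frames[1:]) for a, b in zip(prev, cur)]
    return sum(diffs) / len(diffs)


# Hypothetical two-pixel clip whose per-frame estimates jitter around a static scene.
raw = [[1.0, 3.0], [1.2, 2.8], [1.0, 3.0], [1.2, 2.8]]
print(mean_abs_change(raw), mean_abs_change(ema_smooth(raw)))
```

The smoothed sequence flickers less, but it also reacts slowly when depth genuinely changes, which is why per-frame filtering alone does not solve video depth.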

This is precisely the 'aha!' moment we aim for. We believe DeFi's next evolution isn't just incremental, but a fundamental shift towards computational governance and true economic sovereignty. Thank you for recognizing the depth of what we're building. Welcome to Ark. @0x03ee

Joy Boy | ARK

10,355 views • 1 year ago

AnyDepth: Depth Estimation Made Easy

AK

14,765 views • 5 months ago

how [you can learn] to animate anything (in-depth guide) ->

MOTHER IS APE

19,905 views • 4 months ago

DVD: Deterministic Video Depth Estimation with Generative Priors paper:

AK

62,042 views • 3 months ago

In collaboration with Intel, our Depth Fusion showcases the power of our LDM3D diffusion model in generating 360° views from text prompts provided by the user. The LDM3D diffusion model generates a 2D RGB image and its corresponding relative depth map, providing a complete RGBD representation of the text prompt. The LDM3D model is a specialized version of the Stable Diffusion v1.4 model that has been modified to fit both image and depth map data. The model was then fine-tuned on a subset of the LAION-400M dataset, a large-scale image-caption dataset. The depth maps used to fine-tune our model were generated by the DPT-BEiT-Large-512 depth estimation model, which provides highly accurate relative depth estimates for each pixel. We take the generated 2D RGB image and depth map and use them to compute a 360° projection using TouchDesigner. TouchDesigner is a versatile platform for creating immersive and interactive multimedia experiences. Our application harnesses TouchDesigner to bring the generated 360° views to life, providing users with a unique and engaging way to experience their text prompts, whether it's a description of a tranquil forest, a noisy cityscape or a futuristic sci-fi world. Our Depth Fusion can bring these concepts to life in vivid and immersive detail. - Scottie Fox, VP Engineering Blockade Labs ScottieFox #AI #VR #3D #gamedev #stablediffusion

Blockade Labs

11,439 views • 3 years ago
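The Depth Fusion post above turns an RGB image plus a depth map into a 360° scene around the viewer. Assuming both are equirectangular panoramas and that depth is a distance along each view ray (assumptions; the post does not say how the TouchDesigner projection is set up), a minimal sketch maps each pixel center to longitude and latitude, converts that to a unit direction, and scales it by depth. Because LDM3D's depth is relative, the resulting points are only defined up to scale. All names are hypothetical; the axes are x right, y up, z toward the panorama's center column.

```python
import math


def equirect_direction(u, v, width, height):
    """Unit view direction for continuous pixel coordinates (u, v) of an equirectangular image.

    u spans longitude [-pi, pi) left to right; v spans latitude [pi/2, -pi/2] top to bottom.
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    return (math.cos(lat) * math.sin(lon), math.sin(lat), math.cos(lat) * math.cos(lon))


def panorama_to_points(depth, colors):
    """Place every pixel at its depth along its view ray, keeping its RGB color.

    depth[v][u] is a distance along the ray (relative units for a relative depth map);
    colors[v][u] is an (r, g, b) tuple.
    """
    height, width = len(depth), len(depth[0])
    points = []
    for v in range(height):
        for u in range(width):
            # Sample at the pixel center, hence the +0.5 offsets.
            dx, dy, dz = equirect_direction(u + 0.5, v + 0.5, width, height)
            d = depth[v][u]
            points.append(((d * dx, d * dy, d * dz), colors[v][u]))
    return points
```

For example, a 2×4 panorama with depth 1.0 everywhere yields eight colored points on the unit sphere; multiplying the depth map by a constant simply scales that sphere.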

Ever wondered what the real women behind ancient Greek statues looked like? I brought them back to life using: - Depth Anything (depth maps) - Omma (Code and 3D implementation) - Three.js

Joseph Azar

74,281 views • 1 month ago