Monocular depth estimation is “impossible” because one image can’t measure depth geometrically. Our iDisc #CVPR2023 can group pixels w/o supervision and learn depth inductive bias on groups. We get LiDAR-like (but denser) depth from single images! More:

Fisher Yu

2,887 subscribers

52,267 views • 3 years ago • via X (Twitter)

News & Politics • Science & Technology • Education • #CVPR2023
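
One common reading of the "impossible" claim in the post is the scale ambiguity of perspective projection: a scene scaled up and pushed proportionally farther away produces the same image. The sketch below illustrates that with an idealized pinhole camera. The `project` helper, the focal length, and the sample points are assumptions made for illustration, not part of iDisc.

```python
import math


def project(point, focal=1.0):
    """Project a 3D point (x, y, z) onto the image plane of an ideal pinhole camera."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal * x / z, focal * y / z)


# A point two units away, and the same point scaled three times farther from the camera.
near = (0.5, 0.25, 2.0)
far = tuple(3.0 * c for c in near)

u_near, v_near = project(near)
u_far, v_far = project(far)

# Both points land on the same pixel, so the image alone cannot tell them apart.
same_pixel = math.isclose(u_near, u_far) and math.isclose(v_near, v_far)
print(same_pixel)
```

A learned model resolves this ambiguity only through priors about typical scenes, which is why the output is a plausible depth rather than a geometric measurement.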

10 comments

AntiFisherYu001 • 2 years ago

You are a murderer

Adrian • 3 years ago

@ValueAnalyst1 @a_meta4 check it out

MinotaurOnLucy • 3 years ago

As an outsider, I have the following questions: What is the current state-of-the-art performance for out-of-distribution cases? If it is not good, will we see a foundation model that achieves good out-of-distribution performance in the near term, or is that more of a mid-term prospect?

julien michot • 3 years ago

Why "impossible"? Here you get the most plausible depths.

Nikolaos Sarafianos • 3 years ago

This is great work! Did you test it on humans at various distances from the camera by any chance?

Fisher Yu • 3 years ago

We tested the method on street scenes that include people. However, it would indeed be interesting to see whether the method can estimate depth in human-centric scenes.

ζ Pedram ζ • 3 years ago

Neat idea, github?

Fisher Yu • 3 years ago

github link: The full code will be released before CVPR 2023.

test bot • 3 years ago

@Scobleizer 69

𝘃𝗿𝗹𝗹𝗿𝘃 • 3 years ago

@Scobleizer Great work!

Related videos

After a year of team work, we're thrilled to introduce Depth Anything 3 (DA3)! 🚀 Aiming for human-like spatial perception, DA3 extends monocular depth estimation to any-view scenarios, including single images, multi-view images, and video. In pursuit of minimal modeling, DA3 reveals two key insights: 💎 A plain transformer (e.g., vanilla DINO) is enough. No specialized architecture. ✨ A single depth-ray representation is enough. No complex 3D tasks. Three series of models have been released: the main DA3 series, a monocular metric estimation series, and a monocular depth estimation series. The core team members, aside from me: Haotong Lin, Sili Chen, Jun Hao Liew, Donny Y. Chen. 👇(1/n) #DepthAnything3

Bingyi Kang

515,128 views • 8 months ago

Want to use Depth Anything, but need metric depth rather than relative depth? Thrilled to introduce Prompt Depth Anything, a new paradigm for accurate metric depth estimation with up to 4K resolution. 👉Key Message: Depth foundation models like DA have already internalized rich geometric knowledge of the 3D world but lack a proper way to elicit it. Inspired by the success of prompting in LLMs, we propose prompting Depth Anything with metric cues to produce metric depth. This method proves to be very effective when using a low-cost lidar (e.g., iPhone's LiDAR), which is widely available, as prompts. We believe the prompt can generalize to other forms as long as scale information is provided. Prompt Depth Anything offers 1⃣A series of models for iPhone lidars. 2⃣4D reconstruction from monocular videos (captured with iPhone). 3⃣Improved generalization ability for robot manipulation, e.g. Training on cans but generalizing on glasses. 4⃣More detailed depth annotations for the ScanNet++ dataset. The first author is our excellent intern Haotong Lin. Paper: Huggingface: Project Page: Code:

Bingyi Kang

67,604 views • 1 year ago

Depth is starting to feel less like an image… and more like geometry. InfiniDepth (open source) pushes exactly in that direction🤩 Depth is no longer just a map. It’s becoming a geometry representation. Single image → depth → 3DGS🥳 It’s not replacing real-world capture, but it’s definitely changing how image-to-3D works.

KIRI Engine - 3D Scanner App

26,744 views • 4 months ago

The robot’s “eyes” just received a big upgrade. LingBot-Depth 2.0, a depth-completion model with half the depth error, just dropped. 12/16 benchmarks topped. Glass, mirrors, and transparent objects are easy for us humans but hard for robots, because they do not behave like ordinary surfaces in a camera pipeline. A robot that misreads a balcony window or a table edge will plan inside a false world. Huge implication. LingBot-Depth 2.0 takes an RGB image plus a broken depth map from a sensor and outputs a cleaner depth map and a usable 3D point cloud.
Numbers on LingBot-Depth 2.0:
- Excels on glass, mirrors & transparent objects, where traditional depth cameras fail
- Training data: 3M → 150M (50x scale-up)
- 12 out of 16 first-place rankings on depth completion benchmarks
- RMSE cut in half: 0.132 → 0.062 on the hardest indoor scenes
LingBot-Vision is trained on boundaries, because object edges carry the geometry robots need. No human boundary labels are used, which makes this approach easier to scale. The open-sourced LingBot-Vision is the general vision backbone, and LingBot-Depth 2.0 is the depth model built on it.

Rohan Paul

25,312 views • 25 days ago
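
For readers checking the RMSE figures quoted in the post above, here is a minimal sketch of how RMSE is typically computed for depth completion and of what "cut in half" means as a relative reduction. The function names, the flat-list input, and the convention that a ground-truth value of 0 marks a missing pixel are assumptions for illustration; the post does not describe the benchmark's evaluation protocol.

```python
import math


def rmse(pred, target):
    """Root-mean-square error over pixels that have valid ground truth.

    Assumption: target values <= 0 mark missing ground truth and are skipped.
    """
    squared = [(p - t) ** 2 for p, t in zip(pred, target) if t > 0]
    if not squared:
        raise ValueError("no valid ground-truth pixels")
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(before, after):
    """Fraction by which an error metric dropped, e.g. 0.5 means it halved."""
    return (before - after) / before


# The two RMSE values quoted in the post.
print(f"{relative_reduction(0.132, 0.062):.1%}")  # prints 53.0%
```

On the quoted numbers the reduction is about 53%, slightly more than a halving.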

Meet MapAnything – a transformer that directly regresses factored metric 3D scene geometry (from images, calibration, poses, or depth) in an end-to-end way. No pipelines, no extra stages. Just 3D geometry & cameras, straight from any type of input, delivering new state-of-the-art results 🚀 One universal model enables SoTA for: 🔥 Mono Depth Estimation 🔥 Multi-View SfM 🔥 Multi-View Stereo 🔥 Depth Completion 🔥 Registration … and many more possibilities! – plus everything is metric 🎯 We release code for data processing, training, benchmarking & ablations – everything Apache 2.0! Details & Links 👇

Nikhil Keetha

122,952 views • 10 months ago

In collaboration with Intel, our Depth Fusion showcases the power of our LDM3D diffusion model in generating 360° views from text prompts provided by the user. The LDM3D diffusion model generates a 2D RGB image and its corresponding relative depth map, providing a complete RGBD representation of the text prompt. The LDM3D model is a specialized version of the Stable Diffusion v1.4 model that has been modified to fit both image and depth map data. The model was then fine-tuned on a subset of the LAION-400M dataset, a large-scale image-caption dataset. The depth maps used to fine-tune our model were generated by the DPT-BEiT-Large-512 depth estimation model, which provides highly accurate relative depth estimates for each pixel. We take the generated 2D RGB image and depth map and use them to compute a 360° projection using TouchDesigner. TouchDesigner is a versatile platform that allows for the creation of immersive and interactive multimedia experiences. Our application harnesses the power of TouchDesigner to bring the generated 360° views to life, providing users with a unique and engaging way to experience their text prompts. Whether it’s a description of a tranquil forest, a noisy cityscape, or a futuristic sci-fi world, our Depth Fusion can bring these concepts to life in vivid, immersive detail. - Scottie Fox, VP Engineering, Blockade Labs #AI #VR #3D #gamedev #stablediffusion

Blockade Labs

11,439 views • 3 years ago

Depth is our center of gravity. Complete immersion is the heart of our philosophy on investing - dedicating ourselves to our Founders’ businesses and growing with them means we can build something truly impactful. Because possibility grows the deeper you go.

Lightspeed

170,493 views • 3 years ago

Grief is not something you get over, but something you learn to live with. It's messy, unpredictable, and sometimes overwhelming, but it also teaches us the depth of love and the strength we never knew we had. Take it one day at a time.

drew

161,947 views • 1 year ago

🎙️ Wanderers Way | Episode One: Steven Schumacher (Part One) Head Coach Steven Schumacher is our first guest on the podcast - as we get to know our new Gaffer in more depth 🤩 Available on YouTube 👇 - as well as Spotify, Amazon Music, Apple Music and Wanderers TV. #bwfc

Bolton Wanderers

44,630 views • 1 year ago

“You can’t rely on 3 or 4 players to win you games” “The drop off from our best players is drastic, I don’t get the hype around our depth” “We didn’t even rest these players and played a really good team, we played Brentford and struggled” Pys on Saturday night 👇🏻

Chelsea Unfiltered Podcast

133,351 views • 10 months ago

📽️ The IMF World Economic Outlook is arguably the single most important report in the economic calendar. So what did we learn from the latest one? As it happens, quite a lot... About tariffs (and why they haven't led to disaster). AI. And more. Here's my in-depth primer👇

Ed Conway

19,206 views • 9 months ago

🚀 Just created something completely new! I rewrote Apple's Sharp to generate 3D Gaussian Splats from a single 360° equirectangular image made with FLUX 2 (Flex). One equirectangular image → full spherical 3DGS world!
✨ New web interface: drag & drop your pano, select quality, download PLY
⚡ 2-3 sec on an older laptop
Quality tiers:
▫️ Low: 500K splats
▫️ Medium: 2M splats
▫️ High: 4.7M splats
▫️ Ultra: 8.4M splats
The pipeline:
- 360° depth estimation from equirectangular input
- Spherical projection maps pixels → 3D points
- Each point becomes a Gaussian with depth-scaled radius
- Export to standard PLY for any 3DGS viewer (Unity)
No multi-view capture. No COLMAP. No NeRF training. Just upload a pano, pick quality, get your splats. #3DGS #GaussianSplatting #ComputerVision #AI #madewithunity #Splats
Sharp Monocular View Synthesis in Less Than a Second

Daniel Skaale

73,120 views • 7 months ago
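
The spherical-projection step in Daniel Skaale's pipeline above can be sketched roughly as follows. This is not the author's code: the axis convention (y up, z forward), the pixel-center offset, the reading of depth as distance along the viewing ray, and the `splat_radius` heuristic are all assumptions chosen for illustration.

```python
import math


def equirect_pixel_to_point(u, v, depth, width, height):
    """Map pixel (u, v) of an equirectangular image to a 3D point.

    Assumed convention: longitude runs from -pi to pi left to right,
    latitude from +pi/2 to -pi/2 top to bottom, y is up, z is forward,
    and depth is the distance from the camera along the viewing ray.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = depth * math.cos(lat) * math.sin(lon)
    y = depth * math.sin(lat)
    z = depth * math.cos(lat) * math.cos(lon)
    return (x, y, z)


def splat_radius(depth, width, scale=1.0):
    """Hypothetical depth-scaled radius: the arc one pixel column spans at this depth."""
    return scale * depth * (2.0 * math.pi / width)


# One sample pixel from a 1024x512 panorama with an estimated depth of 2.0.
point = equirect_pixel_to_point(100, 300, 2.0, 1024, 512)
radius = splat_radius(2.0, 1024)
print(point, radius)
```

Scaling the radius with depth keeps distant splats from leaving gaps, since each pixel covers a larger surface area the farther away that surface is.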

Now I like RTD2 more than your average fan atm but c’mon. Here’s a clip from the “Farting alien” episodes, which single-handedly has more depth, character and thematic meaning than almost all of RTD2. RTD1 might not be to your tastes but it isn’t bad, not like RTD2 is.

Dani Kennedy - OPEN FOR COMMISSIONS!🎨

97,123 views • 1 month ago

Here’s our Adelaide section from last week’s Marketplace. We flagged Rasmussen’s return, and note mutual interest in a Matt Kenyon reunion. More importantly, though, we go in-depth on why these signings are happening; how the 5-player rule is impacting the 36ers’ decisions.

Olgun Uluc

17,262 views • 2 months ago

.gary washburn says #Celtics can learn from the Knicks and Spurs' depth: "You're not gonna be able to get by with Jordan Walsh, Luka Garza... Are you trusting them as one of your rotation guys in a conference finals to get it done? The answer is probably no." NEW BIG 3 NBA POD⤵️

Celtics on CLNS

19,726 views • 1 month ago

We just tested what might be the most accurate motion transfer workflow we've seen so far. It's a little more involved than a standard workflow, but the results are pretty impressive. Here's the basic workflow:
1. Start with your reference footage. Create character and environment references.
2. Convert the reference footage into a depth map (with Depth-Anything tool on Fal:
3. (Optional) Layer on additional guidance like colorized character masks (with Meta's Segment Anything: and or skeleton poses (with
4. Feed the depth map into Seedance 2.0 Omni as your motion reference along with your character and environment references.
5. Generate your final video.
After testing several variations, the depth map workflow consistently produced the cleanest and most accurate motion transfer. If you're trying to recreate specific performances or complex movement, this is one of the closest AI workflows we've seen to achieving production-quality results.

Curious Refuge

29,574 views • 10 days ago

my filmmaking takeaway from DISCLOSURE DAY was blast a space with light and let actors perform. so much of Spielberg's efficiency comes from deep depth of field. you can get 4 shots like this out of one when you're not worried about focus. now we light less, shoot shallow, and cut it up.

patrick.

261,132 views • 1 month ago

New paper from Apple: Sharp Monocular View Synthesis in Less than a Second. Mescheder et al. @ Apple just released a very impressive paper (congrats! 🎉🥳). You give it an image and it generates a really great-looking 3D Gaussian representation. Uses Depth Pro. It's really good. The model is about 3GB, and takes ~5-10s on my M1 Max per image. Single feed-forward pass network. The video is using "Metal Splatter" to view the .ply from ml-sharp on Apple Vision Pro. Some really wow moments today trying it on different scenes.

Tim Davison ᯅ

132,902 views • 7 months ago

🔥 Chamath & Friedberg Say Elon’s Optimus Robots Could Colonize Mars & ‘Unlock an Extraordinary Supply of Minerals’ on Earth “Our mining is really limited by human exposure from pressure and heat. If we can mine slightly below … our maximum depth today, it would unlock an extraordinary supply of minerals that we can't access today.”

Chief Nerd

139,105 views • 9 months ago

Today, we are open-sourcing Hunyuan World 1.1 (WorldMirror), a universal feed-forward 3D reconstruction model. 🚀🚀🚀 While our previously released Hunyuan World 1.0 (open-sourced, lite version deployable on consumer GPUs) focused on generating 3D worlds from text or single-view images, Hunyuan World 1.1 significantly expands the input scope by unlocking video-to-3D and multi-view-to-3D world creation. Highlights: 🔹Any Input, Maximized Flexibility and Fidelity: Flexibly integrates diverse geometric priors (camera poses, intrinsics, depth maps) to resolve structural ambiguities and ensure geometrically consistent 3D outputs. 🔹Any Output, SOTA Results：This elegant architecture simultaneously generates multiple 3D representations: dense point clouds, multi-view depth maps, camera parameters, surface normals, and 3D Gaussian Splattings. 🔹Single-GPU & Fast Inference: As an all-in-one, feed-forward model, Hunyuan World 1.1 runs on a single GPU and delivers all 3D attributes in a single forward pass, within seconds. 🌐Project Page: 🔗Github： 🤗Hugging Face： ✨Demo: 📄Technical Report:

Tencent Hy

168,530 views • 9 months ago