📢📢📢 RoMo: Robust Motion Segmentation Improves Structure from Motion TL;DR: boost your SfM pipeline on dynamic scenes. We use epipolar cues + SAMv2 features to find robust masks for moving objects in a zero-shot manner. 🧵👇

Andrea Tagliasacchi @CVPR

16,789 subscribers

18,603 views • 1 year ago • via X (Twitter)

Andrea Tagliasacchi 🇨🇦 • 1 year ago

Let's look at some results. An optimization process finds the moving components of the video, disentangling camera ego motion from scene motion.

Our masks are robust to both slow and fast camera movements and can find multiple moving objects, even when they are in the background (look at the pedestrian 🧐).

Why care about motion masks? We show that good motion masks improve SfM performance: COLMAP combined with our masks is state of the art on synthetic benchmarks. We also collect a real evaluation dataset with ground-truth camera poses, captured with a robotic arm, to evaluate our method on real casual captures.
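One simple way a motion mask can feed into an SfM pipeline is to keep feature points off moving pixels. The sketch below is illustrative only and is not the authors' integration: `filter_static_keypoints` and `to_colmap_mask` are hypothetical helpers, and the mask convention (True = moving) is an assumption. As we understand the COLMAP FAQ, its feature extractor accepts a mask directory via `--ImageReader.mask_path`, with each mask named after its image plus a `.png` extension, and extracts no features where the mask is black (0), so moving pixels are written as 0.

```python
import numpy as np


def filter_static_keypoints(keypoints, motion_mask):
    """Keep only keypoints that land on static pixels.

    keypoints:   (N, 2) float array of (x, y) pixel coordinates.
    motion_mask: (H, W) bool array, True where the pixel is moving (assumed convention).
    Returns the surviving keypoints and the boolean keep-mask.
    """
    h, w = motion_mask.shape
    cols = np.clip(np.round(keypoints[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.round(keypoints[:, 1]).astype(int), 0, h - 1)
    keep = ~motion_mask[rows, cols]
    return keypoints[keep], keep


def to_colmap_mask(motion_mask):
    """Convert a motion mask to an 8-bit image where moving pixels are black (0)
    and static pixels are white (255), so feature extraction skips moving regions."""
    return np.where(motion_mask, 0, 255).astype(np.uint8)
```

Masking at extraction time (rather than filtering matches afterwards) keeps dynamic points out of every later stage, including RANSAC verification and bundle adjustment.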

How does it work? (three steps) 1) We find the fundamental matrix between adjacent frames in the video with RANSAC. 2) We then identify parts of the frame that have a very low or a very high epipolar error and use them as weak supervision signals to find the moving objects.
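Steps 1 and 2 can be sketched in NumPy as below. This is a minimal illustration under our own assumptions, not the authors' code: it estimates F with a textbook normalized 8-point solver inside RANSAC, scores every correspondence with the Sampson epipolar error, and turns the lowest and highest errors into weak static/moving labels. All function names and the quantile thresholds `low_q` and `high_q` are hypothetical; the thread does not say how "very low" and "very high" are defined, and it operates on frame regions rather than sparse matches.

```python
import numpy as np


def _normalize(pts):
    """Hartley normalization: center points, scale mean distance to sqrt(2)."""
    mean = pts.mean(axis=0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    T = np.array([[s, 0, -s * mean[0]], [0, s, -s * mean[1]], [0, 0, 1]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return (T @ homog.T).T, T


def eight_point(x1, x2):
    """Normalized 8-point estimate of F with x2^T F x1 = 0."""
    p1, T1 = _normalize(x1)
    p2, T2 = _normalize(x2)
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1)),
    ])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt  # enforce rank 2
    F = T2.T @ F @ T1                        # undo normalization
    return F / np.linalg.norm(F)


def sampson_error(F, x1, x2):
    """First-order approximation of the squared geometric epipolar error."""
    h1 = np.column_stack([x1, np.ones(len(x1))])
    h2 = np.column_stack([x2, np.ones(len(x2))])
    Fx1 = h1 @ F.T   # row i is F @ x1_i
    Ftx2 = h2 @ F    # row i is F^T @ x2_i
    num = np.sum(h2 * Fx1, axis=1) ** 2
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return num / (den + 1e-12)


def ransac_fundamental(x1, x2, thresh=1.0, iters=500, seed=0):
    """RANSAC over 8-point samples; `thresh` is in pixels, compared to the
    Sampson error (a squared distance) as thresh**2."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(x1), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(x1), 8, replace=False)
        inliers = sampson_error(eight_point(x1[idx], x2[idx]), x1, x2) < thresh ** 2
        if inliers.sum() > best.sum():
            best = inliers
    return eight_point(x1[best], x2[best]), best  # refit on all inliers


def weak_labels(err, low_q=0.3, high_q=0.95):
    """1 = confidently moving, 0 = confidently static, -1 = unlabeled."""
    lo, hi = np.quantile(err, [low_q, high_q])
    labels = np.full(len(err), -1)
    labels[err <= lo] = 0
    labels[err >= hi] = 1
    return labels
```

Points on independently moving objects violate the epipolar constraint of the dominant (camera) motion, so they end up with large Sampson error, while static points stay near zero; everything in between is left unlabeled.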

3) Finally, we train a tiny MLP that classifies SAMv2 features as moving or static, given the weak supervisory signal from the high- and low-error masks. These features help complete the motion masks across the video effectively!
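Step 3 can be sketched as a one-hidden-layer MLP trained with binary cross-entropy on the weakly labeled samples only, then applied to every sample to fill in the mask. This is a hypothetical NumPy stand-in, not the authors' model: `train_tiny_mlp`, the hidden size, learning rate, and epoch count are assumptions, and plain feature vectors stand in for SAMv2 features. Labels follow the convention 1 = moving, 0 = static, -1 = unlabeled.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))


def train_tiny_mlp(feats, labels, hidden=32, lr=0.5, epochs=500, seed=0):
    """Full-batch gradient descent on mean BCE, using only samples with labels >= 0.
    Returns a function mapping features to P(moving)."""
    rng = np.random.default_rng(seed)
    mask = labels >= 0
    X, y = feats[mask], labels[mask].astype(float)
    d = X.shape[1]
    W1 = rng.normal(0, 1 / np.sqrt(d), (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1 / np.sqrt(hidden), hidden)
    b2 = 0.0
    for _ in range(epochs):
        h = np.maximum(X @ W1 + b1, 0)   # ReLU hidden layer
        p = _sigmoid(h @ W2 + b2)
        g = (p - y) / len(y)             # dLoss/dlogit for mean BCE
        gW2, gb2 = h.T @ g, g.sum()
        gh = np.outer(g, W2) * (h > 0)   # backprop through ReLU
        gW1, gb1 = X.T @ gh, gh.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2

    def predict(f):
        return _sigmoid(np.maximum(f @ W1 + b1, 0) @ W2 + b2)

    return predict
```

Usage: pass the per-pixel (or per-patch) features together with the weak labels from the epipolar step, then threshold `predict(all_feats) > 0.5` to obtain a dense motion mask, including in regions the epipolar cue left unlabeled.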

And just like that... we get good-quality masks, without human annotation or synthetic supervision! Find more results on our website →

This work was led by @lily_goli and @sabour_sara. In collaboration with Mark Matthews, @marcusabrubaker, Dmitry Lagun, @fleet_dj and @srbhsxn at Google DeepMind, and @_AlecJacobson at the University of Toronto.