📢📢📢 RoMo: Robust Motion Segmentation Improves Structure from Motion TL;DR: boost your SfM pipeline on dynamic scenes. We use epipolar cues + SAMv2 features to find robust masks for moving objects in a zero-shot manner. 🧵👇

18,603 views • 1 year ago • via X (Twitter)

7 Comments

Andrea Tagliasacchi 🇨🇦 • 1 year ago

Let's look at some results. An optimization process finds the moving components of the video, disentangling camera ego motion from scene motion.

Andrea Tagliasacchi 🇨🇦 • 1 year ago

Our masks are robust to slow/fast camera movements, and can find multiple moving objects, even when they are in the background (look at the pedestrian🧐)

Andrea Tagliasacchi 🇨🇦 • 1 year ago

Why care about motion masks? We show that good motion masks improve SfM performance, making COLMAP+our masks the SOTA on synthetic benchmarks. We also collect a real evaluation dataset with GT camera pose using a robotic arm, to evaluate our method in real casual captures.

Andrea Tagliasacchi 🇨🇦 • 1 year ago

How does it work? (three steps) 1) We find the Fundamental matrix between adjacent frames in the video with RANSAC. 2) We then identify parts of the frame that have a very low or a very high epipolar error, as weak supervision signals to find the moving objects.
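To make step 2 concrete, here is a minimal NumPy sketch of scoring correspondences by epipolar error and splitting them into confident "static" and confident "moving" sets. It is an illustration under assumptions, not the authors' code: the Sampson distance is one common choice of epipolar error (the thread does not say which one is used), the function names `sampson_error` and `weak_labels` and both thresholds are hypothetical, and the fundamental matrix `F` is taken as given. In practice `F` would come from step 1, for example from OpenCV's `cv2.findFundamentalMat` with the `cv2.FM_RANSAC` flag.

```python
import numpy as np

def sampson_error(F, pts1, pts2):
    """Sampson epipolar error for N correspondences; pts1, pts2 are (N, 2)."""
    ones = np.ones((pts1.shape[0], 1))
    x1 = np.hstack([pts1, ones])          # homogeneous points in frame 1
    x2 = np.hstack([pts2, ones])          # homogeneous points in frame 2
    Fx1 = x1 @ F.T                        # row i is F @ x1_i
    Ftx2 = x2 @ F                         # row i is F.T @ x2_i
    num = np.sum(x2 * Fx1, axis=1) ** 2   # (x2^T F x1)^2
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return num / np.maximum(den, 1e-12)   # guard against degenerate pairs

def weak_labels(err, low_thresh, high_thresh):
    """Hypothetical thresholds: very low error -> static, very high -> moving."""
    likely_static = err < low_thresh
    likely_moving = err > high_thresh
    return likely_static, likely_moving

# Toy case: camera translating along x, so static points keep their y coordinate.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
pts1 = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]])
pts2 = np.array([[0.5, 0.0], [1.5, 2.0], [3.5, 3.0]])  # last point also moved in y

err = sampson_error(F, pts1, pts2)
likely_static, likely_moving = weak_labels(err, low_thresh=0.01, high_thresh=1.0)
```

Points whose error falls between the two thresholds get no label at all, which is what makes the supervision "weak": only the confident extremes are passed on to the next step.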

Andrea Tagliasacchi 🇨🇦 • 1 year ago

3) Finally, we train a tiny MLP that classifies SAMv2 features as moving or static given the weak supervisory signal from high and low error masks. These features help complete the motion masks over the video effectively!
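As a rough sketch of step 3, the snippet below trains a one-hidden-layer MLP on per-pixel feature vectors using only a small weakly labelled subset, then classifies every feature as moving or static. Everything specific here is an assumption for illustration: the random 8-dimensional vectors stand in for SAMv2 features, and the layer width, learning rate, loss (binary cross-entropy), and the names `train_tiny_mlp` and `predict_moving` are hypothetical, since the thread does not describe the actual architecture or training setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def train_tiny_mlp(feats, labels, hidden=16, lr=0.5, steps=500, seed=0):
    """Full-batch gradient descent on binary cross-entropy (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    d = feats.shape[1]
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, 1))
    b2 = np.zeros(1)
    y = labels.reshape(-1, 1).astype(float)
    for _ in range(steps):
        h = np.tanh(feats @ W1 + b1)              # hidden activations
        p = sigmoid(h @ W2 + b2)                  # predicted P(moving)
        g_logit = (p - y) / len(y)                # dLoss/dlogit for mean BCE
        g_h = (g_logit @ W2.T) * (1.0 - h ** 2)   # backprop through tanh
        W2 -= lr * (h.T @ g_logit)
        b2 -= lr * g_logit.sum(axis=0)
        W1 -= lr * (feats.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return W1, b1, W2, b2

def predict_moving(params, feats):
    W1, b1, W2, b2 = params
    h = np.tanh(feats @ W1 + b1)
    return sigmoid(h @ W2 + b2)[:, 0]

# Synthetic stand-in for SAMv2 features: two clusters (static vs. moving).
rng = np.random.default_rng(1)
static_feats = rng.normal(-1.0, 0.5, (200, 8))
moving_feats = rng.normal(+1.0, 0.5, (200, 8))
feats = np.vstack([static_feats, moving_feats])
truth = np.r_[np.zeros(200), np.ones(200)]

# Only a few features carry weak labels (the very low / very high error regions).
labelled = np.r_[0:20, 200:220]
params = train_tiny_mlp(feats[labelled], truth[labelled])

# The classifier then fills in labels for every feature, labelled or not.
pred = predict_moving(params, feats) > 0.5
accuracy = (pred == truth.astype(bool)).mean()
```

The point of the toy setup is the same as in the thread: sparse labels from the epipolar error only need to cover part of each object, and the feature space does the work of extending them to the rest of the mask.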

Andrea Tagliasacchi 🇨🇦 • 1 year ago

And just like that... we get good-quality masks, without human annotation or synthetic supervision! Find more results on our website →

Andrea Tagliasacchi 🇨🇦 • 1 year ago

This work was led by @lily_goli and @sabour_sara. In collaboration with Mark Matthews, @marcusabrubaker, Dmitry Lagun, @fleet_dj and @srbhsxn at Google DeepMind, and @_AlecJacobson at the University of Toronto.
