Introducing “FlowMap”, the first self-supervised, differentiable structure-from-motion method that is competitive with conventional SfM such as Colmap! IMO this supplies a major missing piece for internet-scale training of 3D deep learning methods. 1/n

128,543 views • 2 years ago • via X (Twitter)

12 comments

Vincent Sitzmann • 2 years ago

Work led by our amazing @omcamsmith and @DavidCharatan with the support of the brilliant @_atewari ! 2/n

Vincent Sitzmann • 2 years ago

Structure-from-Motion is the only area of computer vision where a non-deep-learning method, Colmap, remains the state of the art. This has slowed us down: Colmap is used to generate pseudo-ground truth for 3D vision, instead of us finding a self-supervised way! 3/n

Vincent Sitzmann • 2 years ago

FlowMap is a major step towards solving that problem: it is a fully differentiable, self-supervised structure-from-motion method! From only off-the-shelf point tracks / optical flow, FlowMap performs SfM that outperforms Colmap’s on Gaussian Splatting Novel View Synthesis! 4/n

Vincent Sitzmann • 2 years ago

Here are some point clouds reconstructed from FlowMap on popular scenes - it really works very robustly!! 5/n

Vincent Sitzmann • 2 years ago

There are two unique aspects to FlowMap: (1) Depth is the *only* free variable - poses and intrinsics are inferred feed-forward! (2) FlowMap is differentiable with respect to the depth estimator - this enables us to train one fully self-supervised just on video! 6/n

Vincent Sitzmann • 2 years ago

In the limit of having a *perfect* depth estimator and perfect correspondence (for instance from large-scale training), FlowMap solves the Structure-from-Motion problem - poses, intrinsics, and fused multi-view pointcloud - in a single feed-forward pass! 7/n

Vincent Sitzmann • 2 years ago

We have already released the code, which we’ve spent time organizing for ease of use. It includes the scripts for baselines, figures, and tables, so it will be a breeze for you to reproduce & build on top of it! 8/n

Vincent Sitzmann • 2 years ago

FlowMap minimizes a “camera-induced correspondence loss.” When a camera moves through a static scene, that motion induces correspondences on the image sensor according to the scene’s geometry, the camera motion, and the camera intrinsics, which we supervise with point tracks 9/n
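
To make the idea of a camera-induced correspondence concrete, here is a minimal NumPy sketch, not FlowMap's actual code. It assumes a pinhole camera with intrinsics `K`, depth measured along the optical axis, and a relative pose `(R, t)` that maps frame-A camera coordinates to frame-B camera coordinates. The function names and the L1 penalty are illustrative choices, not taken from the thread.

```python
import numpy as np

def induced_correspondence(uv, depth, K, R, t):
    """Where do frame-A pixels land in frame B, given depth, pose and intrinsics?

    uv: (N, 2) pixel coordinates in frame A, depth: (N,) z-depths.
    Assumed convention: X_B = R @ X_A + t.
    """
    homogeneous = np.hstack([uv, np.ones((uv.shape[0], 1))])
    rays = homogeneous @ np.linalg.inv(K).T      # rays with z = 1
    points_a = rays * depth[:, None]             # 3D points in frame A
    points_b = points_a @ R.T + t                # same points in frame B
    projected = points_b @ K.T                   # perspective projection
    return projected[:, :2] / projected[:, 2:3]

def correspondence_loss(uv, depth, K, R, t, tracked_uv):
    """Mean absolute error between induced and tracked positions (assumed L1)."""
    predicted = induced_correspondence(uv, depth, K, R, t)
    return float(np.mean(np.abs(predicted - tracked_uv)))

# Toy example: the camera translates 0.1 units to the right (hypothetical values).
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
uv = np.array([[50.0, 50.0], [60.0, 50.0]])
tracked = np.array([[40.0, 50.0], [55.0, 50.0]])
R, t = np.eye(3), np.array([-0.1, 0.0, 0.0])
print(correspondence_loss(uv, np.array([1.0, 2.0]), K, R, t, tracked))
print(correspondence_loss(uv, np.array([2.0, 2.0]), K, R, t, tracked))
```

In this toy setup the first depth guess is consistent with the tracks and the second is not, so the loss is larger for the second call. That difference is the signal that gradient descent on depth can follow.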

Vincent Sitzmann • 2 years ago

However, solving for depth, poses and intrinsics as free variables via gradient descent does not work well (see the paper for why!). Instead, we reparameterize both poses and intrinsics in terms of depth and optical flow, *leaving only depth as a free variable*! 10/n
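
As a rough illustration of what "pose as a function of depth and flow" can look like, the sketch below recovers a relative pose in closed form by aligning two unprojected point clouds with orthogonal Procrustes (the Kabsch algorithm). The thread does not say which solver FlowMap uses, so treat this as one standard option rather than the paper's method. The helper names, the known intrinsics `K`, and the `depth_b` input (depth sampled at the flow targets) are assumptions for the example.

```python
import numpy as np

def unproject(uv, depth, K):
    """Lift pixels (N, 2) with z-depths (N,) to 3D camera-space points."""
    homogeneous = np.hstack([uv, np.ones((uv.shape[0], 1))])
    return (homogeneous @ np.linalg.inv(K).T) * depth[:, None]

def procrustes_pose(points_a, points_b):
    """Least-squares rigid (R, t) with points_b ~= points_a @ R.T + t."""
    centroid_a, centroid_b = points_a.mean(axis=0), points_b.mean(axis=0)
    covariance = (points_a - centroid_a).T @ (points_b - centroid_b)
    U, _, Vt = np.linalg.svd(covariance)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    sign = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T
    t = centroid_b - R @ centroid_a
    return R, t

def pose_from_depth_and_flow(uv_a, depth_a, flow, depth_b, K):
    """Pose from frame-A depth, flow A->B, and frame-B depth at the flow targets."""
    points_a = unproject(uv_a, depth_a, K)
    points_b = unproject(uv_a + flow, depth_b, K)
    return procrustes_pose(points_a, points_b)
```

Because every step here (unprojection, SVD, matrix products) is differentiable almost everywhere, a gradient on the resulting pose can flow back into the depth values, which is the property the reparameterization relies on.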

Vincent Sitzmann • 2 years ago

Even so, an unnecessary degree of freedom remains: Two identical image patches can have *different* depths! To fix this, we re-parameterize depth via a small monocular depth predictor. 11/n
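
A toy sketch of why predicting depth from image content removes that degree of freedom: if depth is the output of one shared function of the local patch, identical patches must receive identical depths. The "predictor" below is a single hypothetical 3x3 linear filter followed by a softplus, standing in for a real monocular depth network. It is not FlowMap's architecture.

```python
import numpy as np

def predict_depth(image, weights, bias):
    """Toy depth predictor: one shared 3x3 filter plus softplus (always positive).

    image: (H, W) grayscale array, weights: (3, 3), bias: float.
    """
    height, width = image.shape
    padded = np.pad(image, 1, mode="edge")
    depth = np.empty((height, width))
    for y in range(height):
        for x in range(width):
            patch = padded[y:y + 3, x:x + 3]          # 3x3 window centred on (y, x)
            activation = np.sum(patch * weights) + bias
            depth[y, x] = np.log1p(np.exp(activation))  # softplus
    return depth

# An image tiled from one 4x4 pattern contains repeated, identical patches.
pattern = np.arange(16.0).reshape(4, 4) / 16.0
image = np.tile(pattern, (2, 2))
weights = np.linspace(-1.0, 1.0, 9).reshape(3, 3)
depth = predict_depth(image, weights, 0.5)
print(np.isclose(depth[1, 1], depth[5, 5]))
```

With free per-pixel depth variables, `depth[1, 1]` and `depth[5, 5]` could drift apart during optimization. With a shared predictor they are tied together through the weights.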

Vincent Sitzmann • 2 years ago

We can use FlowMap itself to supervise & pre-train the depth estimator! Pre-training leads to better results & faster convergence, but is not strictly necessary: it works even without any pre-training! The key is “patch-match” regularization: similar RGB patch → similar depth. 12/n

Vincent Sitzmann • 2 years ago

FlowMap allows us to train *any* 3D computer vision model self-supervised, just on video of static scenes. There are countless cool follow-up directions, from feed-forward SfM to dynamics to multi-view stereo - we can't wait to see what the community will do with it! n/n
