Introducing “Diffusion with Forward Models”, a model that can generate diverse, real 3D scenes from a single image, trained with images w/o any 3D data! 1/n

88,712 views • 2 years ago • via X (Twitter)

16 comments

Vincent Sitzmann • 2 years ago

Work done with @_atewari, Tianwei Yin, @GCazenavette, & @eigenstate, collaborating with Fredo Durand, Bill Freeman, Josh Tenenbaum, at my Scene Representation Group @MIT_CSAIL. Ayush and I have been working on this for more than a year - he did amazing work here!! 2/n

Vincent Sitzmann • 2 years ago

Conventional, non-probabilistic models such as pixelNeRF that reconstruct a 3D scene from a single image generate blurry results for any parts of the scene that were not observed in the input image. 3/n
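
One way to see where the blur comes from, as a minimal sketch and not the paper's code: if an unobserved pixel is equally likely to be dark (0.0) or bright (1.0), the single prediction that minimizes mean squared error is the average, roughly 0.5, a gray value that never actually occurs. The toy setup and the name `mse_optimal_prediction` are assumptions made for illustration.

```python
import numpy as np

def mse_optimal_prediction(samples: np.ndarray) -> float:
    """Return the constant that minimizes mean squared error to the samples (their mean)."""
    return float(np.mean(samples))

rng = np.random.default_rng(0)
# Hypothetical unobserved pixel: dark (0.0) or bright (1.0) with equal probability.
plausible_values = rng.integers(0, 2, size=10_000).astype(float)

blurry = mse_optimal_prediction(plausible_values)  # close to 0.5: a gray "blur"
sampled = rng.choice(plausible_values, size=5)     # each draw is a sharp 0.0 or 1.0
print(blurry, sampled)
```

A deterministic regressor is pushed toward `blurry`, while a model that samples returns values like those in `sampled`.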

Vincent Sitzmann • 2 years ago

As a diffusion model, our model instead parameterizes the *distribution* of 3D scenes that are consistent with a single image, and can thus sample plausible 3D scenes in the form of radiance fields! 4/n

Vincent Sitzmann • 2 years ago

Recent diffusion models for novel view synthesis (GeNVS, SparseFusion, etc.) learn to sample from the distribution of *novel views* given context images. However, that is not what we are generally interested in. We want to directly sample from the distribution of 3D scenes! 5/n

Vincent Sitzmann • 2 years ago

This is difficult, b/c we never observe ground-truth 3D scenes - we only observe 2D images! We propose a new diffusion model that can nevertheless learn to directly generate 3D scenes, by integrating the differentiable renderer into each denoising step. 6/n
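
A toy sketch of where the renderer sits in such a denoising step, under heavy assumptions: the "scene" is a 3-vector, the "renderer" is a hypothetical linear projection `A` that drops one coordinate, the "denoiser" is a single linear map `W`, and there is one fixed noise level. None of these names or choices come from the paper; the point is only that the denoiser outputs a scene while the loss is computed on rendered observations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lossy forward model ("renderer"): drops the last scene coordinate.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

def render(scenes: np.ndarray) -> np.ndarray:
    """Map a batch of 3-D 'scenes' (B, 3) to 2-D observations (B, 2)."""
    return scenes @ A.T

# Scenes exist only to simulate data; training below uses observations alone.
observations = render(rng.normal(size=(512, 3)))

alpha_bar = 0.5        # single fixed noise level, a simplification
W = np.zeros((3, 2))   # linear "denoiser": noisy observation -> scene estimate
lr = 0.1
losses = []

for _ in range(200):
    noise = rng.normal(size=observations.shape)
    noisy_obs = np.sqrt(alpha_bar) * observations + np.sqrt(1 - alpha_bar) * noise

    scene_est = noisy_obs @ W.T         # denoiser predicts a scene, not an image
    rendered = render(scene_est)        # renderer sits inside the denoising step
    residual = rendered - observations  # loss is measured in observation space
    losses.append(float(np.mean(np.sum(residual**2, axis=1))))

    # Gradient of the loss w.r.t. W, backpropagated through the renderer.
    grad_scene = 2.0 / len(observations) * residual @ A
    W -= lr * grad_scene.T @ noisy_obs

print(losses[0], losses[-1])
```

In this toy the dropped coordinate never receives a gradient, so the row `W[2]` stays at zero; the sketch shows the placement of the renderer, not how unobserved regions get supervised.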

Vincent Sitzmann • 2 years ago

This enables us to solve a truly long-standing problem that I've attempted again and again over the years: Given just a single image, we can directly sample hundreds of 3D scenes consistent with that image - no post-processing (=Score Distillation) necessary!! 7/n

Vincent Sitzmann • 2 years ago

This works on *real-world* scenes in RealEstate10k and Co3D, and significantly outperforms score-distillation based approaches! This is the first time that any 3D generative model trained with images can sample from the distribution of such complex 3D scenes! 8/n

Vincent Sitzmann • 2 years ago

The samples are *truly* diverse. Note that each sample here is a full radiance field, from which you could - at any point - extract the point cloud. And they vary widely in the unobserved regions! 9/n

Vincent Sitzmann • 2 years ago

It turns out that there is a whole class of problems, often referred to as “Stochastic Inverse Problems”, where we are interested in modeling signals observed only through lossy forward models. 10/n
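
To make "lossy forward model" concrete, here is a small sketch under assumed names: a hypothetical `forward` function that discards depth, so two different point sets produce the same observation. This is what makes the inverse problem stochastic, since the observation alone cannot tell the two scenes apart.

```python
import numpy as np

def forward(scene: np.ndarray) -> np.ndarray:
    """Hypothetical lossy forward model: orthographic projection that discards depth."""
    return scene[:, :2]

# Two different point sets that differ only in depth (third column).
scene_a = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 2.0]])
scene_b = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 0.5]])

same_observation = np.array_equal(forward(scene_a), forward(scene_b))
same_scene = np.array_equal(scene_a, scene_b)
print(same_observation, same_scene)  # prints: True False
```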

Vincent Sitzmann • 2 years ago

In the paper, we prototype two more applications to make this point: sampling from the distributions over plausible motions of an image, trained end-to-end from video, and probabilistic GAN inversion! 11/n

Vincent Sitzmann • 2 years ago

However, there is a whole wealth of problems across science and engineering that require probabilistic inversion of a known forward model! 12/n

Vincent Sitzmann • 2 years ago

To wrap up - we think that this is a significant step forward not only for generative modeling, but also for self-supervised training of 3D foundation models. Generating plausible 3D scenes means that our model receives plausible gradients for unobserved regions! 13/n

Vincent Sitzmann • 2 years ago

We'd also like to highlight concurrent work by our friends at Oxford VGG, Viewset Diffusion, which has some related ideas and looks great! 14/n

Vincent Sitzmann • 2 years ago

More to come, stay tuned! 15/n

Vincent Sitzmann • 2 years ago

You can watch me talk about the paper here:

Vincent Sitzmann • 2 years ago

Code is out now:
