4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos

Abstract: We propose 4DGT, a 4D Gaussian-based Transformer model for dynamic scene reconstruction, trained entirely on real-world monocular posed videos. Using 4D Gaussian as an inductive bias, 4DGT unifies static and dynamic components, enabling the modeling of complex, time-varying...
34,782 views • 1 year ago • via X (Twitter)
11 comments

Paper: not yet

Project: "4DGT takes a series of monocular frames with poses as input. During training, we subsample the temporal frames at different granularity and use all images for supervision. In stage one, we train 4DGT to predict pixel-aligned Gaussians at coarse resolution. In stage two, we prune a majority of non-activated Gaussians based on the histograms of per-patch activation channels and densify the Gaussian prediction by increasing the input token samples in both space and time. At inference time, we run the 4DGT network trained after stage two, which supports dense video frames input at high resolution."
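For readers wondering what "prune based on the histograms of per-patch activation channels" could look like in practice, here is a rough Python sketch. It is not the authors' code: the function name `select_active_channels`, the opacity `threshold`, the `keep_ratio`, and the array layout are all assumptions made for illustration. The idea shown is simply to count how often each per-patch channel produces a visibly active Gaussian across samples, then keep only the most frequently active channels.

```python
import numpy as np

def select_active_channels(opacity, threshold=0.05, keep_ratio=0.2):
    """Pick the per-patch channels that are most often 'activated'.

    opacity: array of shape (num_samples, num_patches, channels) holding a
             per-Gaussian activation value in [0, 1] (assumed layout).
    threshold, keep_ratio: hypothetical hyperparameters, not from the source.
    """
    # A Gaussian counts as activated when its value exceeds the threshold.
    active = opacity > threshold
    # Histogram over channels: number of activations per channel index.
    counts = active.sum(axis=(0, 1))
    # Keep the top fraction of channels, at least one.
    num_keep = max(1, int(round(keep_ratio * counts.size)))
    order = np.argsort(-counts, kind="stable")
    keep = np.sort(order[:num_keep])
    return keep, counts

# Toy usage: 4 samples, 6 patches, 5 channels; only channels 1 and 3 are active.
rng = np.random.default_rng(0)
opacity = rng.uniform(0.0, 0.01, size=(4, 6, 5))
opacity[..., 1] = 0.9
opacity[..., 3] = 0.8
keep, counts = select_active_channels(opacity, keep_ratio=0.4)
print(keep, counts)
```

The freed budget from the dropped channels is what would let a second stage feed more input tokens in space and time, as the description says; how 4DGT actually does that is not covered by this sketch.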

Paper:

Wow this looks really sick

Yeah, and the clip is super long.

people don’t realize how huge this is

long long videos, yeah!

The fact that I can't get my head off this for the past few days... For me, it is that much needed tool that unlocks a lot.

Awesome. Looking forward to trying this out!

I'm crafting an email newsletter that turns my daily updates into a captivating weekly digest, complete with exclusive content. Although it's not live yet, you can sign up now! If you're curious, visit my website and join the subscriber list today!

the future is beaming into reality gaaah this is so exciting

Pretty good for monocular footage. The videos are also very long!
