4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos

Abstract: We propose 4DGT, a 4D Gaussian-based Transformer model for dynamic scene reconstruction, trained entirely on real-world monocular posed videos. Using 4D Gaussians as an inductive bias, 4DGT unifies static and dynamic components, enabling the modeling of complex, time-varying...
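The abstract only states that 4D Gaussians let a single representation cover both static and dynamic content. The Python sketch below shows one common way to achieve that. It assumes a hypothetical parametrization: a 3D center with linear velocity, a temporal center, and a Gaussian temporal falloff on opacity. This is not necessarily what 4DGT uses. Under this model, static content is just the limit of zero velocity and infinite temporal extent.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Gaussian4D:
    """Hypothetical 4D Gaussian: a 3D Gaussian with a temporal extent.

    Only the time-varying parts are modeled here (center and opacity);
    covariance, color, etc. are omitted for brevity.
    """
    mean: tuple[float, float, float]      # 3D center at time t_center
    velocity: tuple[float, float, float]  # assumed linear motion per unit time
    t_center: float                       # time of peak visibility
    temporal_std: float                   # temporal extent; math.inf => static
    opacity: float                        # peak opacity in [0, 1]

    def position_at(self, t: float) -> tuple[float, float, float]:
        dt = t - self.t_center
        return tuple(m + v * dt for m, v in zip(self.mean, self.velocity))

    def opacity_at(self, t: float) -> float:
        if math.isinf(self.temporal_std):
            return self.opacity  # static content: same opacity at every time
        z = (t - self.t_center) / self.temporal_std
        return self.opacity * math.exp(-0.5 * z * z)


# A static wall and a ball moving along +x, both in the same representation.
wall = Gaussian4D((0.0, 0.0, 5.0), (0.0, 0.0, 0.0), 0.0, math.inf, 0.9)
ball = Gaussian4D((0.0, 1.0, 3.0), (1.0, 0.0, 0.0), 2.0, 0.5, 0.8)
```

Because the static case is a special value of the same parameters, a network can predict one uniform set of outputs per Gaussian and let training decide which ones move.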

34,782 views • 1 year ago • via X (Twitter)

11 comments

MrNeRF · 1 year ago

Paper: not yet available. Project page: "4DGT takes a series of monocular frames with poses as input. During training, we subsample the temporal frames at different granularities and use all images for supervision. In stage one, we train 4DGT to predict pixel-aligned Gaussians at coarse resolution. In stage two, we prune a majority of non-activated Gaussians based on the histograms of per-patch activation channels and densify the Gaussian prediction by increasing the input token samples in both space and time. At inference time, we run the 4DGT network trained after stage two, which supports dense video frames input at high resolution."
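Two steps of the quoted recipe can be illustrated with a small sketch. Everything here is an assumption for illustration. `subsample_frames` picks a clip at a randomly chosen temporal stride to mimic "different granularities". `channel_histogram` and `channels_to_keep` read "histograms of per-patch activation channels" as follows: for each per-patch Gaussian slot, count how many patches give it an opacity above a threshold, then prune slots that are rarely active. The project page does not spell out the actual criterion, and all function names are hypothetical.

```python
from __future__ import annotations

import random


def subsample_frames(num_frames: int, window: int, strides: list[int],
                     rng: random.Random) -> list[int]:
    """Return `window` frame indices spaced by one randomly chosen stride.

    Hypothetical stand-in for multi-granularity temporal subsampling:
    stride 1 gives a short, dense clip; a large stride gives a sparse
    clip spanning more of the video.
    """
    valid = [s for s in strides if (window - 1) * s < num_frames]
    if not valid:
        raise ValueError("video is too short for the requested window")
    stride = rng.choice(valid)
    span = (window - 1) * stride
    start = rng.randrange(num_frames - span)  # keeps last index < num_frames
    return [start + i * stride for i in range(window)]


def channel_histogram(patch_opacity: list[list[float]],
                      threshold: float) -> list[int]:
    """Count, per Gaussian slot (channel), the patches where it is active.

    Assumption: "activated" means predicted opacity above `threshold`.
    """
    counts = [0] * len(patch_opacity[0])
    for patch in patch_opacity:
        for channel, opacity in enumerate(patch):
            if opacity > threshold:
                counts[channel] += 1
    return counts


def channels_to_keep(counts: list[int], num_patches: int,
                     min_fraction: float) -> list[int]:
    """Keep slots active in at least `min_fraction` of patches; prune the rest."""
    return [c for c, n in enumerate(counts) if n >= min_fraction * num_patches]


# Example: 300-frame video, 8-frame clips at stride 1, 4, or 16.
rng = random.Random(0)
clip = subsample_frames(num_frames=300, window=8, strides=[1, 4, 16], rng=rng)

# Three patches, four Gaussian slots each (made-up opacities).
opacity = [[0.90, 0.01, 0.70, 0.00],
           [0.80, 0.02, 0.00, 0.00],
           [0.95, 0.00, 0.60, 0.03]]
counts = channel_histogram(opacity, threshold=0.05)              # [3, 0, 2, 0]
keep = channels_to_keep(counts, len(opacity), min_fraction=0.5)  # [0, 2]
```

Pruning whole slots rather than individual Gaussians keeps the output layout identical for every patch. That is one plausible reason to work from a histogram over channels, but it remains an interpretation.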

MrNeRF · 1 year ago

Paper:

Pablo Vela · 1 year ago

Wow, this looks really sick

MrNeRF · 1 year ago

Yeah, and the clip is super long.

Micky Abir · 1 year ago

people don’t realize how huge this is

MrNeRF · 1 year ago

long long videos, yeah!

TessyVFXR · 1 year ago

I haven't been able to get this out of my head for the past few days... For me, it's the much-needed tool that unlocks a lot.

James | 🤖 · 1 year ago

Awesome. Looking forward to trying this out!

MrNeRF · 1 year ago

I'm crafting an email newsletter that turns my daily updates into a captivating weekly digest, complete with exclusive content. Although it's not live yet, you can sign up now! If you're curious, visit my website and join the subscriber list today!

Mars (parody) · 1 year ago

the future is beaming into reality gaaah this is so exciting

MrNeRF · 1 year ago

Pretty good for monocular footage. The videos are also very long!

Related videos

[SIGGRAPH 2025] Photoreal Scene Reconstruction from an Egocentric Device

Contributions:

1. We address the importance of employing visual-inertial bundle adjustment (VIBA) that accounts for the rolling-shutter behavior of the RGB camera. This provides a continuous camera trajectory to model pixel movement in neural reconstruction. Our experiments demonstrate that using VIBA consistently improves the novel view quality in Gaussian Splatting by +1 dB in PSNR.
2. We introduce a rasterization-based image formulation pipeline that addresses common artifacts in physical image formation, including rolling shutter, lens shading, exposure, and gain compensation. Our approach is distinct in that we represent image poses as posed pixel arrays sampled from a continuous trajectory, rather than assigning a single camera pose per image, and preserve the merit of Gaussian rasterization. Unlike existing methods that require ray-tracing Gaussians, e.g., [Moenne-Loccoz et al. 2024], our formulation is applicable to general-purpose rasterization-based Gaussian splatting. When applied to 3D Gaussian Splatting (3DGS) [Kerbl et al. 2023], our approach can further enhance reconstruction quality by +1 dB. We outperform existing baselines and demonstrate a substantial quality improvement in handling complex scenes observed by egocentric devices.
3. To reduce the effect of blur from rapid head motion in darker indoor scenes, we propose a strategy of deliberately underexposing input videos during capture, inspired by HDR+ [Hasinoff et al. 2016]. We demonstrate that we can reconstruct high-quality, noise-free scene radiance from noisy, dim input videos, and further render sharp, blur-free videos at a higher dynamic range.
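Contributions 1 and 2 rest on giving each image row its own pose instead of one pose per frame. The sketch below makes two assumptions. First, it uses a hypothetical rolling-shutter model in which row r is captured at t_start + r · readout_time / height. Second, it stands in for the continuous VIBA trajectory with two bracketing keyframes, using slerp for rotation and linear interpolation for translation. Function names are illustrative, not from the paper.

```python
from __future__ import annotations

import math


def slerp(q0, q1, u):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # q and -q are the same rotation; take the shorter arc
        q1 = tuple(-b for b in q1)
        dot = -dot
    if dot > 0.9995:  # nearly identical: normalized lerp avoids dividing by ~0
        q = tuple(a + u * (b - a) for a, b in zip(q0, q1))
        norm = math.sqrt(sum(c * c for c in q))
        return tuple(c / norm for c in q)
    theta = math.acos(dot)
    s0 = math.sin((1.0 - u) * theta) / math.sin(theta)
    s1 = math.sin(u * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


def pose_at(t, key0, key1):
    """Interpolate (rotation, translation) at time t between two keyframes.

    Each keyframe is (time, unit quaternion, translation). Hypothetical
    stand-in for querying a continuous VIBA trajectory.
    """
    (t0, q0, p0), (t1, q1, p1) = key0, key1
    u = (t - t0) / (t1 - t0)
    q = slerp(q0, q1, u)
    p = tuple(a + u * (b - a) for a, b in zip(p0, p1))
    return q, p


def per_row_poses(t_start, readout_time, height, key0, key1):
    """Assign each image row its own pose under a simple rolling-shutter model.

    Assumption: row r is captured at t_start + r * readout_time / height.
    """
    line_delay = readout_time / height
    return [pose_at(t_start + r * line_delay, key0, key1) for r in range(height)]


# Example: camera yaws 45 degrees about z and moves 0.1 m along x in 0.1 s.
half = math.radians(45.0) / 2.0
key0 = (0.0, (1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
key1 = (0.1, (math.cos(half), 0.0, 0.0, math.sin(half)), (0.1, 0.0, 0.0))
rows = per_row_poses(t_start=0.0, readout_time=0.08, height=4, key0=key0, key1=key1)
```

A rasterizer can then project each row's pixels with that row's pose. This is the sense in which a frame becomes a "posed pixel array" rather than a single posed image.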

MrNeRF

15,244 views • 1 year ago