4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos

Abstract: We propose 4DGT, a 4D Gaussian-based Transformer model for dynamic scene reconstruction, trained entirely on real-world monocular posed videos. Using 4D Gaussian as an inductive bias, 4DGT unifies static and dynamic components, enabling the modeling of complex, time-varying environments with varying object lifespans. We introduce a novel density control strategy in training, which allows our 4DGT to handle longer space-time input while maintaining efficient rendering at runtime. Our model processes 64 consecutive posed frames in a rolling-window fashion, predicting consistent 4D Gaussians in the scene. Unlike optimization-based methods, 4DGT performs purely feed-forward inference, reducing reconstruction time from hours to seconds and scaling effectively to long video sequences. Trained only on large-scale monocular posed video datasets, 4DGT can significantly outperform prior Gaussian-based networks in real-world videos and achieve on-par accuracy with optimization-based methods on cross-domain videos.
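The abstract mentions processing 64 consecutive posed frames in a rolling-window fashion. The following is only a rough sketch of how a long video could be split into such windows; it is not the authors' code. The function name `rolling_windows`, the `stride` parameter, and the choice to shift the last window backwards so it stays full are all assumptions made for illustration; the post does not say whether windows overlap.

```python
from typing import Iterator, Tuple

WINDOW = 64  # frames per window, the only number taken from the abstract


def rolling_windows(
    num_frames: int, window: int = WINDOW, stride: int = WINDOW
) -> Iterator[Tuple[int, int]]:
    """Yield half-open [begin, end) frame ranges that cover the whole video.

    Assumption: the final window is shifted backwards so that it still holds
    `window` frames whenever the video is long enough.
    """
    if num_frames <= 0 or window <= 0 or stride <= 0:
        raise ValueError("num_frames, window and stride must be positive")
    start = 0
    while True:
        end = min(start + window, num_frames)
        begin = max(0, end - window)  # keep the window full if possible
        yield begin, end
        if end >= num_frames:
            break
        start += stride


if __name__ == "__main__":
    # A 150-frame clip becomes three windows; the last one overlaps the second.
    for begin, end in rolling_windows(150):
        print(begin, end)
```

Each yielded range would then be passed, together with its camera poses, to whatever feed-forward model predicts the Gaussians for that window.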

MrNeRF

15,896 subscribers

34,782 views • 1 year ago • via X (Twitter)

11 comments

MrNeRF • 1 year ago

Paper: not yet Project: "4DGT takes a series of monocular frames with poses as input. During training, we subsample the temporal frames at different granularity and use all images for supervision. In stage one, we train 4DGT to predict pixel-aligned Gaussians at coarse resolution. In stage two, we prune a majority of non-activated Gaussians based on the histograms of per-patch activation channels and densify the Gaussian prediction by increasing the input token samples in both space and time. At inference time, we run the 4DGT network trained after stage two, which supports dense video frames input at high resolution."
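The quoted description says that stage two prunes most non-activated Gaussians based on histograms of per-patch activations. The sketch below shows one simple way a histogram could be turned into a keep/prune decision for a single patch. It is purely illustrative and not the authors' method: the function name `keep_mask_from_histogram`, the `keep_fraction` and `bins` parameters, and the assumption that activation scores lie in [0, 1] are all hypothetical.

```python
from typing import List


def keep_mask_from_histogram(
    scores: List[float], keep_fraction: float, bins: int = 10
) -> List[bool]:
    """Mark the most strongly activated entries of one patch for keeping.

    Assumption: `scores` are activation values in [0, 1]. A histogram is built
    over them, and bins are accepted from the highest downwards until at least
    `keep_fraction` of the entries are covered.
    """
    if not scores:
        return []
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if bins <= 0:
        raise ValueError("bins must be positive")

    def bin_of(score: float) -> int:
        # Clamp so out-of-range scores fall into the first or last bin.
        return min(max(int(score * bins), 0), bins - 1)

    counts = [0] * bins
    for score in scores:
        counts[bin_of(score)] += 1

    target = keep_fraction * len(scores)
    kept = 0
    cutoff = bins - 1
    for b in range(bins - 1, -1, -1):
        kept += counts[b]
        cutoff = b
        if kept >= target:
            break
    return [bin_of(score) >= cutoff for score in scores]


if __name__ == "__main__":
    patch_scores = [0.05, 0.12, 0.18, 0.25, 0.91, 0.97]
    print(keep_mask_from_histogram(patch_scores, keep_fraction=0.3))
```

Because whole bins are accepted at once, the mask can keep slightly more than `keep_fraction` of the entries; a finer histogram tightens that bound.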

MrNeRF • 1 year ago

Paper:

Pablo Vela • 1 year ago

Wow this looks really sick

MrNeRF • 1 year ago

Yeah, and the clip is super long.

Micky Abir • 1 year ago

people don’t realize how huge this is

MrNeRF • 1 year ago

long long videos, yeah!

TessyVFXR • 1 year ago

The fact that I can't get my head off this for the past few days... For me, it is that much needed tool that unlocks a lot.

James | 🤖 • 1 year ago

Awesome. Looking forward to trying this out!

MrNeRF • 1 year ago

I'm crafting an email newsletter that turns my daily updates into a captivating weekly digest, complete with exclusive content. Although it's not live yet, you can sign up now! If you're curious, visit my website and join the subscriber list today!

Mars (parody) • 1 year ago

the future is beaming into reality gaaah this is so exciting

MrNeRF • 1 year ago

Pretty good for monocular footage. The videos are also very long!

Related videos

EnvGS: Modeling View-Dependent Appearance with Environment Gaussian

Contributions:
• We propose a novel scene representation for accurately modeling complex near-field and high-frequency reflections in real-world environments.
• We developed a real-time ray-tracing renderer for 2DGS, enabling joint optimization of our representation for accurate scene reconstruction while achieving real-time rendering speeds.
• Extensive experiments show that EnvGS significantly outperforms previous methods. To the best of our knowledge, EnvGS is the first method to achieve real-time photorealistic specular reflections synthesis in real-world scenes.

MrNeRF

44,650 views • 1 year ago

NeoVerse Enhancing 4D World Model with in-the-wild Monocular Videos

AK

23,437 views • 5 months ago

Can we scale 4D pretraining to learn general space-time representations that reconstruct an object from a few views at any time to any view at any other time?

Introducing 4D-LRM: a Large Space-Time Reconstruction Model that ...
🔹 Predicts 4D Gaussian primitives directly from multi-view tokens (no motion vectors, no HexPlane);
🔹 Uses a clean, minimal Transformer backbone;
🔹 Generalizes with fast, high-quality feedforward rendering at any view and infinite frame rate.

Check out more interactive demos and scaling behaviors on our homepage/paper.
👉Website:
👉Paper:

Martin Ziqiao Ma

21,787 views • 1 year ago

1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering

Contributions:
• We delve into the temporal redundancy of 4D Gaussian Splatting and explain the main reason for the storage pressure and suboptimal rendering speed.
• We introduce 4DGS-1K, a compact and memory-efficient framework to address these issues. It consists of two key components: a spatial-temporal variation score-based pruning strategy and a temporal filter.
• Extensive experiments demonstrate that 4DGS-1K not only achieves a substantial storage reduction of approximately 41× but also accelerates rasterization to 1000+ FPS while maintaining high-quality reconstruction.

MrNeRF

12,200 views • 1 year ago

Gaussian Garments: Reconstructing Simulation-Ready Clothing with Photorealistic Appearance from Multi-View Video

Contribution quote from the paper: In summary, our main contributions are
• a comprehensive pipeline for reconstructing the shape, appearance, and behavior of real-world garments using Gaussian splatting,
• an algorithm for registering garment meshes to multi-view videos with an optimization procedure based on Gaussian splatting, and
• a Gaussian Garment representation that combines triangle meshes with Gaussian textures to capture photorealistic appearance and can be used as a fully controllable 3D asset.

MrNeRF

27,277 views • 1 year ago

RT-Splatting: Joint Reflection-Transmission Modeling with Gaussian Splatting

Contributions:
• We introduce a unified surface-volume Gaussian scene representation for jointly modeling sharp specular reflections and clear transmission in real-world scenes containing thin semi-transparent surfaces.
• We propose Specular-Aware Gradient Gating to suppress misleading gradients from complex specular regions, substantially reducing floaters in the transmission branch.
• Extensive experiments demonstrate that RT-Splatting significantly outperforms prior methods while maintaining real-time rendering and enabling flexible scene editing.

MrNeRF

27,917 views • 1 month ago

Feature4X Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields

AK

13,892 views • 1 year ago

SceNeRFlow: Time-Consistent Reconstruction of General Dynamic Scenes abs: paper page: Existing methods for the 4D reconstruction of general, non-rigidly deforming objects focus on novel-view synthesis and neglect correspondences. However, time consistency enables advanced downstream tasks like 3D editing, motion analysis, or virtual-asset creation. We propose SceNeRFlow to reconstruct a general, non-rigid scene in a time-consistent manner. Our dynamic-NeRF method takes multi-view RGB videos and background images from static cameras with known camera parameters as input. It then reconstructs the deformations of an estimated canonical model of the geometry and appearance in an online fashion. Since this canonical model is time-invariant, we obtain correspondences even for long-term, long-range motions. We employ neural scene representations to parametrize the components of our method. Like prior dynamic-NeRF methods, we use a backwards deformation model. We find non-trivial adaptations of this model necessary to handle larger motions: We decompose the deformations into a strongly regularized coarse component and a weakly regularized fine component, where the coarse component also extends the deformation field into the space surrounding the object, which enables tracking over time. We show experimentally that, unlike prior work that only handles small motion, our method enables the reconstruction of studio-scale motions.

AK

76,380 views • 2 years ago

Robot Learning needs 4D world models!

We introduce TesserAct, a 4D embodied world model that can simulate how agents interact with the 3D world over time! We achieve this by simply extending a pre-trained 2D video generation model to jointly predict RGB, depth, and surface normals. It enables:
1️⃣ Much better policy learning in the wild
2️⃣ Temporal + spatial coherence in 4D dynamic prediction
3️⃣ Novel view synthesis for embodied scenes

Code:
Paper Link:
Project page:

Chuang Gan

43,265 views • 1 year ago

Fast View Synthesis of Casual Videos paper page: Novel view synthesis from an in-the-wild video is difficult due to challenges like scene dynamics and lack of parallax. While existing methods have shown promising results with implicit neural radiance fields, they are slow to train and render. This paper revisits explicit video representations to synthesize high-quality novel views from a monocular video efficiently. We treat static and dynamic video content separately. Specifically, we build a global static scene model using an extended plane-based scene representation to synthesize temporally coherent novel video. Our plane-based scene representation is augmented with spherical harmonics and displacement maps to capture view-dependent effects and model non-planar complex surface geometry. We opt to represent the dynamic content as per-frame point clouds for efficiency. While such representations are inconsistency-prone, minor temporal inconsistencies are perceptually masked due to motion. We develop a method to quickly estimate such a hybrid video representation and render novel views in real time. Our experiments show that our method can render high-quality novel views from an in-the-wild video with comparable quality to state-of-the-art methods while being 100x faster in training and enabling real-time rendering.

AK

20,651 views • 2 years ago

OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering

Contributions:
• We propose an occlusion-aware scene division strategy that considers the scene layout and camera co-visibilities. The resulting regions barely contain occlusions, and the corresponding training cameras have a higher average contribution, leading to improved reconstruction results.
• We present a region-based rendering technique that accelerates 3D Gaussian splatting in large scenes. It eliminates much of the time-consuming processing of invisible 3D Gaussians, boosting rendering speeds without noticeable quality degradation.
• We conduct extensive experiments on several large-scene datasets and demonstrate that OccluGaussian achieves superior rendering quality and faster rendering speed compared to previous state-of-the-art methods.

MrNeRF

10,718 views • 1 year ago

GauS-SLAM: Dense RGB-D SLAM with Gaussian Surfels

• We propose a 2D Gaussian-based incremental reconstruction strategy and a Surface-aware Depth Rendering mechanism. This approach effectively mitigates geometry distortions and improves tracking accuracy.
• Our dense SLAM system features a front-end/back-end architecture and incorporates a local map design, ensuring tracking accuracy and efficiency.
• We conduct extensive experiments demonstrating the superiority of our approach in both tracking accuracy and reconstruction quality compared to SOTA methods.

MrNeRF

11,004 views • 1 year ago

This seemingly obvious prediction didn't take long to become reality.

MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors

Contributions:
• The first real-time SLAM system using the two-view 3D reconstruction prior MASt3R [20] as a foundation.
• Efficient techniques for pointmap matching, tracking and local fusion, graph construction and loop closure, and second-order global optimization.
• A state-of-the-art dense SLAM system capable of handling generic, time-varying camera models.

Abstract: We present a real-time monocular dense SLAM system, designed from the ground up using MASt3R, a two-view 3D reconstruction and matching prior. Equipped with this strong prior, our system remains robust on in-the-wild video sequences, making no assumptions on a fixed or parametric camera model beyond a unique camera center. Key features include:
- Efficient methods for pointmap matching, camera tracking, and local fusion
- Graph construction and loop closure
- Second-order global optimization

With known calibration, a simple modification achieves state-of-the-art performance across various benchmarks. Altogether, we propose a plug-and-play monocular SLAM system capable of producing globally-consistent poses and dense geometry while operating at 15 FPS.

MrNeRF

29,935 views • 1 year ago

Multi-view Reconstruction via SfM-guided Monocular Depth Estimation

Contributions:
• We propose a novel approach to inject SfM priors into diffusion-based depth estimation, enabling highly accurate and multi-view consistent depth predictions for each viewpoint.
• Based on the proposed depth estimator, we design a new multi-view 3D geometry reconstruction framework and process some synthetic datasets to facilitate training.
• We evaluate our method on diverse real-world scene data, including objects, indoor environments, streetscapes, and aerial scenes, demonstrating the superior performance and generalization capability of our approach.

MrNeRF

25,651 views • 1 year ago

[SIGGRAPH 2025] Photoreal Scene Reconstruction from an Egocentric Device

Contributions:
1. We address the importance of employing visual-inertial bundle adjustment (VIBA) that accounts for the rolling-shutter behavior of the RGB camera. This provides a continuous camera trajectory to model pixel movement in neural reconstruction. Our experiments demonstrate that using VIBA consistently improves the novel view quality in Gaussian Splatting by +1 dB in PSNR.
2. We introduce a rasterization-based image formulation pipeline that addresses common artifacts in physical image formation, including rolling shutter, lens shading, exposure, and gain compensation. Our approach is distinct in that we represent image poses as posed pixel arrays sampled from a continuous trajectory, rather than assigning a single camera pose per image, and preserve the merit of Gaussian rasterization. Unlike existing methods that require ray-tracing Gaussians, e.g., [Moenne-Loccoz et al. 2024], our formulation is applicable to general-purpose rasterization-based Gaussian splatting. When applied to 3D Gaussian Splatting (3DGS) [Kerbl et al. 2023], our approach can further enhance reconstruction quality by +1 dB. We outperform existing baselines and demonstrate a substantial quality improvement in handling complex scenes observed by egocentric devices.
3. To reduce the effect of blur from rapid head motion in darker indoor scenes, we propose a strategy of deliberately underexposing input videos during capture, inspired by HDR+ [Hasinoff et al. 2016]. We demonstrate that we can reconstruct high-quality, noise-free scene radiance from noisy, dim input videos, and further render sharp, blur-free videos at a higher dynamic range.

MrNeRF

15,244 views • 1 year ago

[SIGGRAPH Asia '24 (TOG)] Representing Long Volumetric Video with Temporal Gaussian Hierarchy

Contributions:
• We introduce a novel, efficient, and expressive Temporal Gaussian Hierarchy representation for long volumetric video. To our knowledge, our method is the first approach capable of handling minutes of volumetric video data.
• We propose a Compact Appearance Model and a new rasterization implementation to facilitate real-time, high-quality dynamic view synthesis while maintaining a compact size.
• We propose a system to efficiently model long volumetric videos for the first time and demonstrate state-of-the-art dynamic view synthesis quality on the Neural3DV [Li et al. 2022], ENeRF-Outdoor [Lin et al. 2022], and MobileStage [Xu et al. 2024b] datasets, while also achieving the best rendering speed with reduced training cost and memory usage.

MrNeRF

79,379 views • 1 year ago

[SIGGRAPH Asia '25] Editable Physically-based Reflections in Raytraced Gaussian Radiance Fields

Contributions:
• A reconstruction method for radiance fields, with distinct optimization for diffuse and specular components, using path tracing for the latter.
• An efficient and accurate training method that reconstructs the diffuse and specular components of the scene in a single representation.
• An efficient ray tracer for Gaussian primitives, fast enough to enable treatment of multiple bounces with minimal computational overhead.

MrNeRF

15,810 views • 8 months ago

[SIGGRAPH '25] Monocular Online Reconstruction with Enhanced Detail Preservation

Abstract (excerpt): Our approach addresses two key challenges in monocular online reconstruction:
1. Distributing Gaussians without relying on depth maps.
2. Ensuring both local and global consistency in the reconstructed maps.

To achieve this, we introduce two key modules:
- Hierarchical Gaussian Management Module: For effective Gaussian distribution.
- Global Consistency Optimization Module: For maintaining alignment and coherence at all scales.

In addition, we present the Multi-level Occupancy Hash Voxels (MOHV), a structure that regularizes Gaussians to capture details across multiple levels of granularity. MOHV ensures accurate reconstruction of both fine and coarse geometries and textures, preserving intricate details while maintaining overall structural integrity. Compared to state-of-the-art RGB-only and even RGB-D methods, our framework achieves superior reconstruction quality with high computational efficiency.

MrNeRF

23,616 views • 1 year ago