MrNeRF

@janusch_patas • 16,807 subscribers

Founder and CEO of https://t.co/5MjtfpwEU3 | Your guide to radiance fields | Host of the podcast @ViewDependent | FTP: 279 | discord: https://t.co/lrl64WGvlD

Shorts

Two weeks ago I fixed one of my teeth with algorithms I wrote a couple of years ago! I got hooked on 3D scanning when I started working for a software shop in Zurich that developed 3D computational geometry algorithms for dental scanning to produce crowns (and more). Back then, a typical reconstruction pipeline looked like this: scan the patient’s teeth using an intraoral scanner, reconstruct the surface mesh, design the restoration digitally, and finally mill the crown out of ceramic. We were working mostly with point clouds and meshes, but it wasn’t just math; it was craftsmanship translated into a digital process. Every micron mattered. You could literally see how a good algorithm meant a better fit in someone’s mouth. Gaussian Splatting isn’t about surface reconstruction; it’s about appearance reconstruction. It doesn’t care about explicit topology; it captures how light interacts with the scene. In a sense, it’s the opposite philosophy of the dental world: instead of modeling what the object is, it models how the object looks. 3D Gaussian Splatting enables applications like training self-driving cars, teaching robots to understand their environment, creating virtual worlds, or monitoring real sites. It represents scenes as millions of small Gaussians rendered in real time without the need for meshes or textures. Coming from a world where precision geometry was everything, this shift felt natural. It’s still about reconstruction, but with a different goal: not manufacturing a perfect object, but reproducing how the world actually looks. Two weeks ago I got my first dental crown, made with the same software, reconstruction algorithms, and Swiss precision I once helped develop. I haven’t worked there in two years, but sitting in that chair and seeing the process from the other side was a proud moment. It reminded me why I love this field.

290,140 views

Release: LichtFeld Studio v0.5.3 is out! With 316 commits merged into master, this release is a huge step forward for LichtFeld Studio.
What's new in v0.5.3
• Vulkan viewer/rendering migration: New Vulkan viewport pipeline, pass graph, VkSplat renderer, Vulkan point-cloud renderer, 3DGUT/VkSplat support, improved alpha/depth composition, tighter CUDA/Vulkan interoperability, and device matching on multi-GPU systems.
• RAD + LOD workflow: Added RAD file export/import, RAD LOD viewer, Spark-style GPU LOD selection, GPU-driven page prefetching, a bounded VRAM pool, out-of-core PLY-to-RAD LOD conversion, and RAD import/export speedups of approximately 3–5×.
• HiGS / macro-tile inference: Added a macro-tile inference path for the Vulkan viewer, including macro sorting, batched rasterization, composition, and capacity management.
• Asset Manager: Added and significantly enhanced the Asset Manager with thumbnails, SH information, faster synchronization, import-from-URL support, docked mode, data-loading popup integration, and general UI cleanup.
• Viewport export: Integrated viewport export directly into the application as a toolbar/overlay tool, added fast render_view_u8-style readback paths, fixed high-resolution clipping issues, improved orthographic export parity, resolved 32K image/video export problems, and added post-export GPU resource cleanup.
• Selection and tooling: Added and reworked selection toolbar controls, the Select menu, ring selection, color eyedropper, distance-from-center selection, faster point-cloud and zoomed-out selection paths, Vulkan measurement tool fixes, and drag-and-drop scene graph improvements.
• UI/RmlUi platform work: Major RmlUi redesign efforts, hot reloading for RML/RCSS/Python UI files, reactive UI/store integration, viewport toolbar flyouts, improved histogram interactions, input settings enhancements, custom TRS gizmos, and numerous panel, tooltip, and localization fixes.
• Windowing and UX: Added borderless window support, title bar drag/maximize/restore behavior, work-area-aware maximize functionality, resize responsiveness and performance improvements, and DPI/UI scaling fixes.
• Training and data features: Added adaptive depth loss and depth gradients for the EWA rasterizer, mask loading/application fixes, a new combined Ignore+Segment mask mode, --add-splat, --freeze, improved checkpoint and training state handling, and training speed and VRAM optimizations.
• COLMAP/equirectangular support: Added SPHERICAL/equirectangular camera model support and canonical EQUIRECTANGULAR handling, along with fixes for undistortion and camera export.
This release will be available to all supporters as a Windows binary in about an hour. At the same time, LichtFeld Studio remains committed to being free and open source under GPLv3 and can also be built directly from source. Please consider supporting the ongoing development of LichtFeld Studio through a donation via the portal or the supporters page. Thank you to everyone who supports this project financially, contributes code, reports bugs, provides datasets, helps with the website, and contributes in countless other ways. A special thank you to our foundational sponsor Core11 and our Gold Sponsor Volinga, whose support has helped make the current state of the software possible. Thank you as well to every donor and to all of our new Bronze Sponsors.
Looking ahead to v0.6
For the next major release, work will focus primarily on stability and user experience. This includes improved cleanup workflows and the ability to modify training parameters while training is in progress. I would also like to introduce a native .licht project format that allows users to save and restore their complete editor state. You can find links to our main sponsors below. Please also visit our website to discover all our Bronze Sponsors. Hint: We do not yet have a Silver or Platinum Sponsor 😉

25,496 views

EDGS: Eliminating Densification for Efficient Convergence of 3DGS Contributions: • We show that initial triangulation based on 2D correspondences can replace the incremental refinement process, fundamentally changing how 3DGS models allocate resources. • Our method reduces the path each Gaussian must travel in parameter space. Careful initialization not only accelerates convergence but also guides optimization toward a convergence point corresponding to lower reconstruction error and thus higher reconstruction quality. • Our approach outperforms both speed-optimized and quality-focused state-of-the-art models while using only half the splats of standard 3DGS. By improving initialization rather than altering the optimization process, this method is compatible with other 3DGS acceleration techniques, making it a flexible enhancement to existing models.

124,101 views
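
To make "initial triangulation based on 2D correspondences" concrete, here is a minimal sketch of classic two-view linear (DLT) triangulation in NumPy. It is not the EDGS code: the camera matrices, the test point, and the function names `triangulate_point` and `project` are all assumptions chosen for illustration, and the post does not say which triangulation variant the paper uses.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: matched 2D pixel coordinates (u, v) in each image.
    """
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Least-squares solution: right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point to pixel coordinates with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical two-camera setup (illustration only).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # camera centre at x = +1

X_true = np.array([0.3, -0.2, 4.0])
X_est = triangulate_point(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_est)
```

In an initialization scheme of this kind, each triangulated point would seed one Gaussian, so the optimizer starts close to the surface instead of growing splats through densification.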

Human Hair Reconstruction with Strand-Aligned 3D Gaussians Contributions (cited): – We propose a new 3D line lifting scheme that uses a modified 3DGS reconstruction technique to lift 2D orientation maps into a 3D field while also providing refinement of the camera parameters; – We introduce a dual representation of hair strand polylines and 3D Gaussians to achieve differentiable rasterization of hair strands and leverage photometric constraints for strand-based hair reconstruction; – Based on these components, we propose a coarse-to-fine optimization method for prior-guided hair reconstruction that leverages both latent and explicit representations of the hairstyle.

106,525 views
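
One way to picture a "dual representation of hair strand polylines and 3D Gaussians" is to attach one elongated Gaussian to every polyline segment. The sketch below does that in NumPy; the function name `strand_to_gaussians`, the `thickness` parameter, and the choice of half the segment length as the standard deviation along the strand are assumptions for illustration, not the paper's parameterization.

```python
import numpy as np

def strand_to_gaussians(polyline, thickness=1e-3):
    """Map each segment of a strand polyline to a segment-aligned 3D Gaussian.

    polyline: (M, 3) array of strand vertices.
    Returns means (M-1, 3) and covariances (M-1, 3, 3).
    """
    starts, ends = polyline[:-1], polyline[1:]
    means = 0.5 * (starts + ends)              # Gaussian centre = segment midpoint
    seg = ends - starts
    length = np.linalg.norm(seg, axis=1)
    dirs = seg / length[:, None]               # unit direction of each segment
    covs = []
    for d, l in zip(dirs, length):
        along = np.outer(d, d)                 # projector onto the strand direction
        # Large variance along the segment, small variance across it.
        covs.append((0.5 * l) ** 2 * along + thickness ** 2 * (np.eye(3) - along))
    return means, np.array(covs)

# Hypothetical three-vertex strand.
strand = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 2.0], [0.0, 2.0, 2.0]])
means, covs = strand_to_gaussians(strand)
print(means)
```

Because the means and covariances are differentiable functions of the polyline vertices, a photometric loss on the rendered Gaussians can in principle push gradients back to the strand geometry.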

First fully ML-framework-free 3D Gaussian Splatting implementation in LichtFeld Studio. I’ve completed the migration of the full training pipeline to a custom CUDA-based tensor library. No PyTorch, no LibTorch, no autograd. Every gradient is implemented by hand, either through CUDA kernels or minimal abstractions on top. This makes it the first full training setup for 3D Gaussian Splatting with zero dependencies on existing ML frameworks. It’s not just about independence, it's about control! We now manage every byte of GPU memory, which opens the door to tighter optimization and finer performance tuning. The framework footprint is minimal, without pulling in gigabytes of ML runtime code that was never designed for real-time or graphics-driven applications. A few modules, such as the metrics and 3DGUT interfaces, are still being ported, and some operations are temporarily naïve, so performance is not yet on par with master. But this refactor lays the groundwork for: - A fully self-contained binary - Fine-grained memory optimization - Easier experimentation without the weight of an ML stack We’re getting close.

50,539 views
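
To illustrate what "every gradient is implemented by hand" means in practice, here is a small NumPy sketch of a manual forward and backward pass for a sigmoid opacity activation with a squared-error loss, checked against finite differences. It is not LichtFeld Studio code (which the post describes as CUDA-based); the function names and the toy loss are assumptions.

```python
import numpy as np

def forward(raw_opacity, target):
    """Sigmoid activation followed by a squared-error loss."""
    opacity = 1.0 / (1.0 + np.exp(-raw_opacity))
    loss = 0.5 * np.sum((opacity - target) ** 2)
    return loss, opacity

def backward(opacity, target):
    """Hand-derived gradient of the loss w.r.t. raw_opacity (chain rule, no autograd)."""
    d_opacity = opacity - target                   # dL/d(opacity)
    return d_opacity * opacity * (1.0 - opacity)   # times d(sigmoid)/d(raw)

def numerical_grad(raw_opacity, target, eps=1e-6):
    """Central finite differences, used only to verify the manual gradient."""
    grad = np.zeros_like(raw_opacity)
    for i in range(raw_opacity.size):
        step = np.zeros_like(raw_opacity)
        step[i] = eps
        loss_plus, _ = forward(raw_opacity + step, target)
        loss_minus, _ = forward(raw_opacity - step, target)
        grad[i] = (loss_plus - loss_minus) / (2 * eps)
    return grad

raw = np.array([-1.5, 0.0, 2.0])
target = np.array([0.1, 0.9, 0.5])
loss, opacity = forward(raw, target)
analytic = backward(opacity, target)
numeric = numerical_grad(raw, target)
print(np.max(np.abs(analytic - numeric)))
```

A gradient check like this is a common safeguard when dropping autograd, since a hand-written backward pass has no framework to catch a sign or indexing mistake.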

Is Google taking initial steps to enhance Street View? For some reason, Street View seems stuck in technology that feels outdated. I wonder if we'll see such improvements on the product side. Also, note how much better it performs in all aspects compared to Zip-NeRF in their presented material. It offers more details and fewer artifacts. Great work! "LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering" Contributions: • We propose a novel LOD representation for 3DGS which, unlike previous methods [27, 28, 17], does not recompute the list of used Gaussians at each frame. This allows for acceleration and compaction, enabling the rendering of large-scale scenes even on mobile devices. • We design a strategy to automatically select optimal hyperparameters for splitting LODs, whereas most other methods require manual tuning of hyperparameters for each 3D scene. • To further accelerate rendering, we split the scene into chunks and pre-compute sets of active Gaussians per chunk. • Finally, we introduce a novel opacity interpolation scheme to produce visually pleasing rendering and eliminate artifacts when transitioning between chunks.

62,564 views
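
The chunking and opacity-interpolation ideas can be sketched roughly as follows. This is a heavily simplified toy, not the LODGE algorithm: the two-level LOD, the distance rule in `precompute_active_sets`, and the linear cross-fade in `blended_opacity` are all assumptions made for illustration.

```python
import numpy as np

def precompute_active_sets(positions, lod_level, chunk_centers, near_radius):
    """Per chunk, store which Gaussians to draw: fine ones (level 0) near the
    chunk centre and coarse ones (level 1) farther away. Done once, offline."""
    active = []
    for c in chunk_centers:
        dist = np.linalg.norm(positions - c, axis=1)
        mask = ((lod_level == 0) & (dist <= near_radius)) | \
               ((lod_level == 1) & (dist > near_radius))
        active.append(np.flatnonzero(mask))
    return active

def blended_opacity(camera, chunk_centers, active, opacity):
    """Cross-fade between the active sets of the two chunks nearest the camera,
    so Gaussians that exist in only one set fade in or out instead of popping."""
    d = np.linalg.norm(chunk_centers - camera, axis=1)
    a, b = np.argsort(d)[:2]
    w_a = d[b] / (d[a] + d[b])      # 1 at chunk a's centre, 0.5 halfway to chunk b
    out = np.zeros_like(opacity)
    out[active[a]] += w_a * opacity[active[a]]
    out[active[b]] += (1.0 - w_a) * opacity[active[b]]
    return out

# Toy scene: a fine and a coarse Gaussian at each of two locations.
positions = np.array([[0.0, 0, 0], [0.0, 0, 0], [10.0, 0, 0], [10.0, 0, 0]])
lod_level = np.array([0, 1, 0, 1])
chunk_centers = np.array([[0.0, 0, 0], [10.0, 0, 0]])
active = precompute_active_sets(positions, lod_level, chunk_centers, near_radius=5.0)
faded = blended_opacity(np.array([2.5, 0.0, 0.0]), chunk_centers, active, np.ones(4))
print(active, faded)
```

The point of the precomputation is that the per-frame work reduces to a chunk lookup plus a blend, rather than rebuilding the list of used Gaussians every frame.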

My C++ 3DGS implementation has transitioned to the gsplat backend and is now licensed under Apache 2.0. - Supports MCMC densification by default. - Includes a fused bilateral grid implementation. - A basic viewer, contributed by the community, is available with more features in development. - Runs in headless mode. Exciting plans are underway for the project's evolution over the summer. Check it out!

59,904 views

You should also check out the project page for the interactive demos; they are truly impressive!

60,476 views

[SIGGRAPH ASIA '25] Detail-Enhanced Gaussian Splatting for Large-Scale Volumetric Capture Contributions: - A two-stage approach to performance capture, combining a scene-scale capture rig and a single-actor facial capture rig. - A novel high-quality scene-scale volumetric performance capture rig, incorporating both static and dynamic cameras to track the performance of multiple actors. - A reconstruction pipeline for dynamic performance capture, featuring stable calibration of moving cameras and 4DGS with improved dynamic range and color fidelity. - A detail enhancement Diffusion Model, which supports 4K, RGB, and Alpha, with improved temporal stability.

42,456 views

Triangle Splatting for Real-Time Radiance Field Rendering Contributions: (i) We propose Triangle Splatting, a novel approach that directly optimizes unstructured triangles, bridging traditional computer graphics and radiance fields. (ii) We introduce a differentiable window function for soft triangle boundaries, enabling effective gradient flow. (iii) We demonstrate qualitatively and quantitatively that Triangle Splatting outperforms concurrent methods in terms of visual quality and rendering speed, and achieves superior perceptual quality compared to the state-of-the-art Zip-NeRF on indoor scenes. (iv) The optimized triangles are directly compatible with standard mesh-based renderers, enabling seamless integration into traditional graphics pipelines.

51,407 views
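
A "differentiable window function for soft triangle boundaries" could look like the 2D sketch below: take a signed distance to the triangle (negative inside), normalize it by its value at the incenter, clamp at zero, and raise it to a sharpness exponent. Whether this matches the paper's exact definition is an assumption; the names `window`, `signed_distance`, and `sigma` are illustrative.

```python
import numpy as np

def edge_normals_and_offsets(tri):
    """Outward unit normals n_i and offsets d_i such that n_i . p + d_i is the
    signed distance from p to edge line i (negative on the interior side)."""
    centroid = tri.mean(axis=0)
    normals, offsets = [], []
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        e = b - a
        n = np.array([e[1], -e[0]]) / np.linalg.norm(e)
        if n @ (centroid - a) > 0:   # flip so the normal points away from the interior
            n = -n
        normals.append(n)
        offsets.append(-(n @ a))
    return np.array(normals), np.array(offsets)

def signed_distance(tri, p):
    """Max over the three edge distances: negative inside, zero on the boundary."""
    n, d = edge_normals_and_offsets(tri)
    return np.max(n @ p + d)

def incenter(tri):
    a, b, c = tri
    la, lb, lc = np.linalg.norm(b - c), np.linalg.norm(c - a), np.linalg.norm(a - b)
    return (la * a + lb * b + lc * c) / (la + lb + lc)

def window(tri, p, sigma=1.0):
    """1 at the incenter, falling to 0 at the boundary, 0 outside."""
    phi_p = signed_distance(tri, p)
    phi_s = signed_distance(tri, incenter(tri))   # equals minus the inradius
    return max(0.0, phi_p / phi_s) ** sigma

tri = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])   # 3-4-5 triangle, inradius 1
for p in ([1.0, 1.0], [1.0, 0.5], [2.0, 0.0], [5.0, 5.0]):
    print(p, window(tri, np.array(p)))
```

Larger values of `sigma` concentrate the weight near the incenter and smaller values flatten the falloff, so the exponent controls how soft the edge is. The function stays piecewise differentiable in the vertex positions, which is what allows gradients to move the triangle.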

This is completely nuts. Can't wait until the paper is released! "SuperGaussian: Repurposing Video Models for 3D Super Resolution" Project: Paper video ⬇️ 1 I 2

78,221 views

Human3R: Everyone Everywhere All at Once Note: I recorded the video from the interactive demo on their project page (linked in the comment below). Abstract (excerpt): Human3R jointly recovers global multi-person SMPL-X bodies ("everyone"), dense 3D scenes ("everywhere"), and camera trajectories in a single forward pass ("all-at-once"). Our method builds upon the 4D online reconstruction model CUT3R and uses parameter-efficient visual prompt tuning to preserve CUT3R's rich spatiotemporal priors while enabling direct readout of multiple SMPL-X bodies. Human3R is a unified model that eliminates heavy dependencies and iterative refinement. After being trained on the relatively small-scale synthetic dataset BEDLAM for just one day on one GPU, it achieves superior performance with remarkable efficiency: it reconstructs multiple humans in a one-shot manner, along with 3D scenes, in one stage, at real-time speed (15 FPS) with a low memory footprint (8 GB).

35,783 views

Oh wow! I just tested Splatt3R with my own data on my computer, which creates 3D Gaussian Splats at 4 FPS on uncalibrated 512x512 2D images! It's by far the fastest 3D reconstruction method, powered by MASt3R. Check out the video!

67,464 views

Seamless non-repetitive texture painting. This is awesome: "[SIGGRAPH '24] Diffusion Texture Painting" Paper (pdf): Project:

74,478 views

Check out their project page linked in the post below. They have some really cool demos to try out! They’ve also released the demo code

30,951 views

Wonderland: Navigating 3D Scenes from a Single Image Contributions: • First, we introduce a representation for controllable 3D generation by leveraging the generative priors from camera-guided video diffusion models. Unlike image models, video diffusion models are trained on extensive video datasets. This enables them to capture comprehensive spatial relationships within scenes across multiple views and embed a form of "3D awareness" in their latent space, which allows us to maintain 3D consistency in novel view synthesis. • Second, to achieve controllable novel view generation, we empower video models with precise control over specified camera motions. We introduce a novel dual-branch conditioning mechanism that effectively incorporates desired diverse camera trajectories into the video diffusion model. This enables expansion of a single image into a multi-view consistent capture of a 3D scene with precise pose control. • Third, to achieve efficient 3D reconstruction, we directly transform video latents into 3DGS. We propose a novel latent-based large reconstruction model (LaLRM) that lifts video latents to 3D in a feed-forward manner. With this design, during inference, our model directly predicts 3DGS from a single input image, effectively aligning the generation and reconstruction tasks—and bridging image space and 3D space—through the video latent space. Compared with reconstructing scenes from images, the video latent space offers a 256× spatial-temporal reduction while retaining essential and consistent 3D structural details. Such a high degree of compression is crucial, as it allows the LaLRM to handle a wider range of 3D scenes within the reconstruction framework, with the same memory constraints.

52,801 views

Official Launch of the MrNeRF 3DGS Bounty 2: We're offering 🏆 $1600 + $500 bonus for improving initialization & training without densification for 3D Gaussian Splatting! RT & tag friends who might crush this. Details in thread 👇

35,133 views

That's really cool. 4D Gaussian Splatting running in the browser by Gmix. Somehow it went under the radar with the #Sora release.

70,240 views

PackUV: Packed Gaussian UV Maps for 4D Volumetric Video - PackUV — A new volumetric video representation that packs 3D Gaussian attributes into a sequence of UV atlases for efficient streaming and storage, making it readily compatible with existing video coding infrastructure. - PackUV-GS — An efficient method to fit PackUV directly from multiview videos using optical-flow-based keyframing and Gaussian labeling to handle large motions, disocclusions, and temporal consistency. - PackUV-2B — The largest multi-view 4D dataset with 2B frames, large motions, and disocclusions. It provides 360° coverage from 50+ synchronized cameras.

17,035 views
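
The idea of storing per-Gaussian attributes in an image-like atlas can be illustrated with a naive row-major layout. This is only a sketch under stated assumptions: the post does not describe how PackUV actually places Gaussians in its UV atlases, and the function names `pack_to_atlas` and `unpack_from_atlas` are hypothetical.

```python
import numpy as np


def pack_to_atlas(attrs: np.ndarray, width: int) -> np.ndarray:
    """Lay out an (N, C) per-Gaussian attribute array row by row in an (H, width, C) atlas."""
    n, channels = attrs.shape
    height = -(-n // width)  # ceiling division so every Gaussian gets a texel
    atlas = np.zeros((height * width, channels), dtype=attrs.dtype)
    atlas[:n] = attrs  # unused texels at the end stay zero
    return atlas.reshape(height, width, channels)


def unpack_from_atlas(atlas: np.ndarray, count: int) -> np.ndarray:
    """Recover the first `count` Gaussians from an atlas written by pack_to_atlas."""
    height, width, channels = atlas.shape
    return atlas.reshape(height * width, channels)[:count]


# Ten Gaussians with three attributes each (e.g. a position), packed four per row.
attrs = np.arange(30, dtype=np.float32).reshape(10, 3)
atlas = pack_to_atlas(attrs, width=4)
restored = unpack_from_atlas(atlas, count=10)
```

Once attributes live in a regular 2D grid per frame, a sequence of such grids looks like video frames, which is what makes the representation a candidate for standard video codecs. A real system would also need quantization and a layout that stays stable over time, neither of which is shown here.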

"Spann3R: 3D Reconstruction with Spatial Memory" In a nutshell: DUSt3R strikes again! Paper: Project: Method ⬇️

52,560 views

Videos

1 Billion Gaussians (1,035,804,128 exactly) streaming to the viewer at 60fps (vsync) and 5GB VRAM. There is no limit anymore to how much it can do. Some need a billion dollar corp to make similar things happen, others need paid pseudo influencers without skills and knowledge to make your software look bad because someone is shitting his pants. LichtFeld Studio is a product of love and passion. So help support this project by donating to it. 1B training on a single RTX 4090 will fall next. Ply -> rad took 28min which is not optimized yet. There is in general some more room for optimization. The dataset is just a 2x2 tile provided by: 3D scanning data created and provided by Andrii Shramko, TELEPORTOUR. Links in comment to the dataset provider!

32,098 views • 1 month ago
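
For scale, here is a rough estimate of what 1,035,804,128 Gaussians would occupy as raw float32 data. It assumes the per-Gaussian layout of the reference 3DGS PLY export with degree-3 spherical harmonics (62 floats, including three unused normals). The post does not say which layout the dataset used, so treat that layout as an assumption.

```python
# Assumed layout: reference 3DGS PLY export with degree-3 spherical harmonics.
FLOATS_PER_GAUSSIAN = (
    3     # position x, y, z
    + 3   # normals (written by the reference exporter, unused)
    + 3   # SH DC colour term
    + 45  # remaining SH coefficients (15 per colour channel)
    + 1   # opacity
    + 3   # scale
    + 4   # rotation quaternion
)
BYTES_PER_GAUSSIAN = FLOATS_PER_GAUSSIAN * 4  # float32


def ply_payload_gib(num_gaussians: int) -> float:
    """Raw attribute payload in GiB, ignoring the PLY header."""
    return num_gaussians * BYTES_PER_GAUSSIAN / 2**30


size_gib = ply_payload_gib(1_035_804_128)
print(f"{BYTES_PER_GAUSSIAN} bytes per Gaussian, about {size_gib:.0f} GiB in total")
```

Under that assumption the raw payload is roughly 239 GiB, far more than the 24 GB of an RTX 4090, which is why streaming within a fixed VRAM budget matters.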

I'm excited to share the Geo Register Plugin for LichtFeld Studio from the LichtFeld community! This plugin helps bring Gaussian splat scenes into real-world geographic space. It registers a scene to WGS-84 and ECEF coordinates, so you can click any point on the model and get its latitude, longitude and altitude. It supports multiple georeferencing sources, including EXIF GPS data, image position CSVs, RealityScan camera parameters and saved similarity transforms. Once the scene is registered, you can export geo-referenced splat models as LAS, LAZ or 3D Tiles datasets for use in GIS and 3D mapping workflows. Built for anyone working with drone data, photogrammetry, Gaussian splatting, GIS, ArcGIS or CesiumJS. Link in the comment below!

51,610 views • 2 months ago
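
The relationship between ECEF coordinates and WGS-84 latitude, longitude and altitude mentioned in the post above can be sketched with the standard ellipsoid formulas. This is not the plugin's code: the function names are hypothetical, and the simple fixed-point iteration used for the inverse is not robust at the poles.

```python
import math

# WGS-84 ellipsoid constants.
A = 6378137.0            # semi-major axis in metres
F = 1.0 / 298.257223563  # flattening
E2 = F * (2.0 - F)       # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Latitude and longitude in degrees, altitude in metres -> ECEF x, y, z in metres."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)  # prime vertical radius
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - E2) + alt_m) * math.sin(lat)
    return x, y, z


def ecef_to_geodetic(x, y, z, iterations=10):
    """ECEF metres -> latitude and longitude in degrees, altitude in metres."""
    lon = math.atan2(y, x)
    p = math.hypot(x, y)                 # distance from the rotation axis
    lat = math.atan2(z, p * (1.0 - E2))  # initial guess, exact at zero altitude
    for _ in range(iterations):
        n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)
        alt = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1.0 - E2 * n / (n + alt)))
    n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)
    alt = p / math.cos(lat) - n
    return math.degrees(lat), math.degrees(lon), alt


# Round trip for an illustrative sample point.
x, y, z = geodetic_to_ecef(47.3769, 8.5417, 408.0)
lat, lon, alt = ecef_to_geodetic(x, y, z)
```

A registered scene additionally needs the similarity transform (scale, rotation, translation) from scene space into ECEF. Once a clicked point is in ECEF, a conversion like `ecef_to_geodetic` yields the values a user would read off.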

Europe Builds. Others Profit. 3D Gaussian Splatting (3DGS) is the perfect case study. It reflects both Europe’s brilliance and its chronic inability to turn that brilliance into business. Almost everything that made 3DGS possible was born in Europe. From the early breakthroughs in point-based rasterization in Switzerland to the cumulative research from Austria, Greece, and Germany executed in France, Europe built the foundation. No other continent can match that level of scientific collaboration and intellectual strength. The LichtFeld Studio bounty later confirmed it: the biggest performance leaps came straight out of European labs. The science was here. The innovation was here. The talent was here. But the business was not. When 3DGS exploded, my inbox filled with messages from US-based companies, not from Europe. In the United States, Luma AI and Polycam turned the paper into products within weeks. They did not wait for funding programs or EU consortia. They simply built. Then came China, which not only caught up in research but quickly outpaced everyone in commercialization. XGRID, DJI, and many others built thriving businesses around what Europe invented. Today, most 3DGS papers come from Chinese institutions rather than European ones. Meanwhile, the usual giants such as Meta, NVIDIA, Google, Netflix, and Tesla continue to iterate, integrate, and push forward. A thriving ecosystem of startups like World Labs leverages this technology to create new products and markets. The innovation cycle in the United States and China is fast, relentless, and market-driven. Europe, in contrast, remains bureaucratic and slow. We fund excellence and celebrate publications, but we rarely ship, even though some small startups are trying to change the status quo. Our researchers create the breakthroughs; others create the successful products. 
Until Europe finds a way to bridge the gap between laboratories and markets, it will remain the world’s research and development department: brilliant, underpaid, and underleveraged. Research is Europe’s comfort zone. Execution must become its strength. Video: One of my dynamic 3D Gaussian implementations based on the paper "Representing Long Volumetric Video with Temporal Gaussian Hierarchy."

159,359 views • 8 months ago

3.11 billion Gaussians. I think this is a world record, no? The updates are already sluggish and I can't push further with the hardware limitations I have (mainly disk space). The viewer still sits at around 5GB VRAM. This is a proof of concept that it scales and works. With more optimization there should be no limit. However, I won't drive this forward as it is pretty much pointless as of today.

28,123 views • 1 month ago

3D Gaussian Splatting Driving Simulator! You know those YouTube walking tours? I think this could become a similar trend: driving around in a splat, or taking your driver's license test in a 3DGS simulator.

139,246 views • 9 months ago

Everybody is stepping into the LOD game, and so is LichtFeld Studio. Preview of 260M Gaussians streaming into the viewer live. I use the RAD format, which is processed on the GPU within LichtFeld. It can also simply be dumped straight into the spark.js web viewer, although the viewer will die at that number of Gaussians. What other solutions don't tell you is that they need hours to preprocess a 3DGS ply to make it streamable. This was just a ply exported to RAD by LichtFeld Studio's convert tool (took 5 min at that size) and it is immediately ready to stream. In the comments there is a smaller dataset with 103M Gaussians that streams on startup into the viewer. Both datasets were created by Andrii Shramko. Let's see how far I can push this (need bigger datasets).

23,523 views • 1 month ago

RT-Splatting: Joint Reflection-Transmission Modeling with Gaussian Splatting
Contributions:
• We introduce a unified surface-volume Gaussian scene representation for jointly modeling sharp specular reflections and clear transmission in real-world scenes containing thin semi-transparent surfaces.
• We propose Specular-Aware Gradient Gating to suppress misleading gradients from complex specular regions, substantially reducing floaters in the transmission branch.
• Extensive experiments demonstrate that RT-Splatting significantly outperforms prior methods while maintaining real-time rendering and enabling flexible scene editing.

28,228 views • 2 months ago

Driving Gaussians has become nuts!

30,374 views • 3 months ago

Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
TL;DR: Skyfall-GS converts satellite images to explorable 3D urban scenes using diffusion models, with real-time rendering performance.
Contributions:
• We introduce Skyfall-GS, the first method to synthesize immersive, real-time, free-flight navigable 3D urban scenes solely from multi-view satellite imagery using generative refinement.
• An open-domain refinement approach leverages pre-trained text-to-image diffusion models without domain-specific training.
• A curriculum-learning-based iterative refinement strategy progressively enhances reconstruction quality from higher to lower viewpoints, significantly improving visual fidelity in occluded areas.

66,111 views • 9 months ago

Instant Skinned Gaussian Avatars for Web, Mobile and VR Applications
Short summary: In our system, we animate a background 3D mesh and have the Gaussian splats follow the mesh’s vertices. During preprocessing, splats are assigned to mesh vertices, and their relative transformations are stored. Once this data is saved, you can instantly use it in your applications without further preprocessing. At runtime, we animate the background 3D mesh, update the Gaussian splats in parallel, and re-sort all Gaussian splats every frame based on the viewer’s perspective.

66,016 views • 9 months ago
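
The three steps in the summary above (bind splats to vertices, update them from the animated mesh, re-sort per frame) can be sketched in NumPy. This is a minimal illustration, not the paper's implementation. It assumes each vertex carries a 3x3 rotation (for example from its skinning matrix), binds each splat to its nearest rest-pose vertex by brute force, and only moves splat centres. All function names are hypothetical.

```python
import numpy as np


def bind_splats(splat_pos, rest_verts, rest_rots):
    """Preprocessing: pick the nearest vertex per splat and store the offset in that vertex's frame."""
    d2 = ((splat_pos[:, None, :] - rest_verts[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    # local = R^T (p - v), i.e. the splat centre expressed in the vertex frame
    local = np.einsum("nji,nj->ni", rest_rots[idx], splat_pos - rest_verts[idx])
    return idx, local


def update_splats(idx, local, verts, rots):
    """Runtime: move every splat with its vertex, p = v + R local."""
    return verts[idx] + np.einsum("nij,nj->ni", rots[idx], local)


def sort_back_to_front(positions, cam_pos, cam_forward):
    """Runtime: order splats by depth along the view direction, farthest first."""
    depth = (positions - cam_pos) @ cam_forward
    return np.argsort(-depth)


# Two vertices, one splat near each; the mesh then shifts by +5 in y and turns 90° about z.
rest_verts = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
rest_rots = np.stack([np.eye(3), np.eye(3)])
splats = np.array([[1.0, 0.0, 0.0], [9.0, 0.0, 0.0]])
idx, local = bind_splats(splats, rest_verts, rest_rots)

rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
posed = update_splats(idx, local, rest_verts + [0.0, 5.0, 0.0], np.stack([rot_z, rot_z]))
order = sort_back_to_front(posed, cam_pos=np.zeros(3), cam_forward=np.array([1.0, 0.0, 0.0]))
```

Because the binding is computed once and the per-frame work is an indexed gather plus a sort, the runtime part parallelizes well. A full version would also rotate each splat's orientation along with its vertex.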

Here is some new footage from this paper, offering a glimpse into the future of dynamic 3D Gaussian Splatting models combined with static reconstructed scenes. Imagine this: when the lighting matches, the result becomes practically indistinguishable from reality. Just pick a scene, add characters, and record it from any angle. Apply diffusion models to instantly change the look. I firmly believe this is the future of VFX.

57,843 views • 8 months ago

[SIGGRAPH '26] Anchored Temporal Gaussian Splatting for Long Volumetric Video Representation
TL;DR: We present ATGS, a novel framework for volumetric video reconstruction that effectively handles long sequences and complex motions. By utilizing time-conditioned anchors and a temporal windowing strategy, ATGS enhances temporal coherence and scalability.
Abstract (excerpt): The key insight is that explicitly tracking long-term complex motion with individual Gaussian primitives is inherently unstable. Instead, we organize Gaussians around time-conditioned anchors that localize their spatial and temporal support, thereby reducing long-range motion complexity. We further introduce a temporal windowing strategy to activate only anchors relevant to the queried time, which improves scalability and temporal coherence. In addition, to ensure spatial and temporal stability, we design a compact set of multi-level anchor features that encode global features, local spatial features, and local temporal features, jointly constraining Gaussian generation. Extensive experiments demonstrate that ATGS consistently outperforms prior methods on long-sequence volumetric videos with complex motions.

26,905 views • 3 months ago
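
The temporal windowing idea in the abstract above, activating only anchors relevant to the queried time, can be sketched as a simple interval test. The excerpt does not give the actual activation rule, so the `Anchor` fields and the symmetric window below are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    anchor_id: int
    t_center: float  # time at which the anchor is centred (hypothetical field)
    t_radius: float  # half-width of its temporal support (hypothetical field)


def active_anchors(anchors, t):
    """Return only the anchors whose temporal support contains the query time t."""
    return [a for a in anchors if abs(t - a.t_center) <= a.t_radius]


anchors = [
    Anchor(0, t_center=1.0, t_radius=1.0),   # covers [0, 2]
    Anchor(1, t_center=2.5, t_radius=1.0),   # covers [1.5, 3.5]
    Anchor(2, t_center=10.0, t_radius=0.5),  # covers [9.5, 10.5]
]
active_ids = [a.anchor_id for a in active_anchors(anchors, t=1.8)]
```

The cost of a query then depends on how many anchors overlap the requested time rather than on the length of the whole sequence, which matches the scalability argument in the abstract.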

MAGS-SLAM: Monocular Multi-Agent Gaussian Splatting SLAM for Geometrically and Photometrically Consistent Reconstruction
TL;DR: The first RGB-only multi-agent 3D Gaussian Splatting SLAM for collaborative photorealistic scene reconstruction.
Contributions:
(1) We propose the first monocular RGB-only multi-agent 3D Gaussian Splatting SLAM system. It integrates Gaussian front-ends, compact submap summaries, inter-agent verification, a Sim(3) submap pose graph, and occupancy-aware fusion into a unified framework, achieving accurate tracking and photorealistic reconstruction without depth sensors.
(2) We propose a Pose-Graph Bundle Adjustment (PGBA)-consistent Sim(3) loop closure mechanism for multi-agent systems, which jointly resolves intra- and inter-agent scale drift through a submap-level Sim(3) pose graph coupling geometric and photometric residuals. Robustness is ensured by a spatial-extent gate that rejects degenerate loops and an adaptive edge invalidation scheme consistent with evolving PGBA corrections.
(3) We propose an occupancy-aware fusion framework for coherent multi-agent Gaussian maps. It combines occupancy-grid deduplication, a decoupled coordinator, and joint pose-Gaussian photometric refinement to eliminate duplicated Gaussians, residual misalignment, and photometric seams across agents.
(4) We introduce the ReplicaMultiagent Plus dataset. While existing multi-agent datasets are typically limited to 2-3 agents with short trajectories, our dataset scales to 4 agents with long-horizon trajectories. In addition, we provide ground-truth geometry and semantic annotations, supporting the evaluation of monocular, RGB-D, and semantic multi-agent SLAM for collaborative dense reconstruction.

19,357 views • 2 months ago
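
Occupancy-grid deduplication, named in contribution (3) above, can be illustrated with a voxel hash: when a second agent's Gaussians are merged into an existing map, any Gaussian that lands in an already occupied voxel is dropped. This is a simplified sketch on Gaussian centres only, assuming both maps are already aligned in one frame. The function names and the keep-first policy are assumptions, not the paper's method.

```python
import math


def voxel_key(point, voxel_size):
    """Integer voxel coordinates of a 3D point."""
    return tuple(math.floor(c / voxel_size) for c in point)


def fuse_maps(base, incoming, voxel_size):
    """Append incoming Gaussian centres unless their voxel is already occupied."""
    occupied = {voxel_key(p, voxel_size) for p in base}
    fused = list(base)
    for p in incoming:
        key = voxel_key(p, voxel_size)
        if key not in occupied:
            occupied.add(key)
            fused.append(p)
    return fused


agent_a = [(0.10, 0.10, 0.10)]
agent_b = [(0.15, 0.12, 0.10), (1.20, 0.00, 0.00)]  # first point duplicates agent A's region
fused = fuse_maps(agent_a, agent_b, voxel_size=0.5)
```

Here the first Gaussian from agent B falls into the voxel agent A already covers and is discarded, while the second one extends the map. The voxel size sets how aggressively overlapping regions are thinned.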

[SIGGRAPH Asia '24 (TOG)] Representing Long Volumetric Video with Temporal Gaussian Hierarchy
Contributions:
• We introduce a novel, efficient, and expressive Temporal Gaussian Hierarchy representation for long volumetric video. To our knowledge, our method is the first approach capable of handling minutes of volumetric video data.
• We propose a Compact Appearance Model and a new rasterization implementation to facilitate real-time, high-quality dynamic view synthesis while maintaining a compact size.
• We propose a system to efficiently model long volumetric videos for the first time and demonstrate state-of-the-art dynamic view synthesis quality on the Neural3DV [Li et al. 2022], ENeRF-Outdoor [Lin et al. 2022], and MobileStage [Xu et al. 2024b] datasets, while also achieving the best rendering speed with reduced training cost and memory usage.

79,379 views • 1 year ago

[SIGGRAPH ASIA '25] Detail-Enhanced Gaussian Splatting for Large-Scale Volumetric Capture
Contributions:
- A two-stage approach to performance capture, combining a scene-scale capture rig and a single-actor facial capture rig.
- A novel high-quality scene-scale volumetric performance capture rig, incorporating both static and dynamic cameras to track the performance of multiple actors.
- A reconstruction pipeline for dynamic performance capture, featuring stable calibration of moving cameras and 4DGS with improved dynamic range and color fidelity.
- A detail enhancement Diffusion Model, which supports 4K, RGB, and Alpha, with improved temporal stability.

42,456 views • 8 months ago

Forget about #Sora. DUSt3R is the real deal. I took two pictures of our kitchen that barely overlap. It took << 2 sec on an RTX 4090 to reconstruct it in insane quality. Can we get a point cloud out of it for Gaussian Splatting #3DGS training, plus the camera poses?

103,288 views • 2 years ago

ViPE: Video Pose Engine for 3D Geometric Perception
Contributions:
• A robust and efficient framework, ViPE, for estimating camera parameters and dense depth from diverse, in-the-wild videos.
• A system design that integrates the strengths of classical SLAM (efficiency, scalability) and learned models (robustness), with key improvements in efficiency, dynamic object handling, and depth quality over prior work.
• A large-scale dataset of annotated videos, created using ViPE, to facilitate future research in 3D computer vision.

42,553 views • 11 months ago

Huge update: LichtFeld Studio v0.5.0 🚀
What’s new:
• Embedded Python runtime + plugin system makes LFS fully hackable and extensible (isolated uv environments, hot reload)
• Integrated plugin marketplace (6 plugins incl. Sharp4D, densification++)
• MCP protocol integration (full parity with the user interaction layer)
• Mesh rendering + OpenMesh (Python) + Mesh2Splat
• ImprovedGS+ (arxiv:2603.08661)
• RmlUI-based GUI (HTML and CSS style workflows)
• Undo/Redo with plugin integration, Sequencer, PPiSP
Huge thanks to our corporate sponsor Core11 GmbH and to all contributors 🙏 Enjoy! If you find it useful, consider supporting the project by donating to keep it evolving. Next up: better training quality and smoother editing workflows

19,563 views • 4 months ago

Code dropped:

54,880 views • 1 year ago

Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes
Contributions:
• We propose STORM, the first feed-forward, self-supervised method for fast and accurate reconstruction of dynamic 3D scenes from sparse, multi-timestep, posed camera images.
• Our bottom-up framework aggregates and transforms per-frame 3D Gaussian Splats into a cohesive scene representation, enabling self-supervised motion estimation. Furthermore, we introduce motion tokens that capture common motion primitives and regularize motion predictions, facilitating dynamic motion group segmentation without explicit motion or correspondence supervision.
• We present several enhancements for in-the-wild scenarios, including sky modeling, camera exposure inconsistency handling, large novel-view extrapolation, and fine-grained human motions reconstruction, making STORM well-suited for real-world applications.

53,292 views • 1 year ago