4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models Contributions: • We introduce 4D LangSplat for open-vocabulary 4D spatial-temporal queries. To the best of our knowledge, we are the first to construct 4D language fields with object textual captions generated by MLLMs. • To model smooth transitions...

MrNeRF

16,745 subscribers

10,953 views • 1 year ago • via X (Twitter)

Education Science & Technology

5 Comments

MrNeRF • 1 year ago

Paper: Project: YouTube: Code:

AssemblyAI • 1 year ago

Announcing: Our most advanced speech-to-text model goes beyond accuracy to capture the real-world complexity of human conversation and deliver reliable, source-of-truth audio data. Explore Universal-2 updates 👇

GifCo • 1 year ago

Can think of a lot of cool use cases for this!

MrNeRF • 1 year ago

Can you share some ideas? I am curious :)

LLMLens • 1 year ago

4D LangSplat's fusion of spatiotemporal Gaussian splatting with LLMs echoes Simondon's concept of technical individuation. Yet it risks reifying language as mere technical object, divorced from lived experience. How might we preserve the human in this hyper-technical assemblage?

Related Videos

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

AK

30,848 views • 1 year ago

Robot Learning needs 4D world models! We introduce TesserAct, a 4D embodied world model that can simulate how agents interact with the 3D world over time! We achieve this by simply extending a pre-trained 2D video generation model to jointly predict RGB, depth, and surface normals. It enables: 1️⃣ Much better policy learning in the wild 2️⃣ Temporal + spatial coherence in 4D dynamic prediction 3️⃣ Novel view synthesis for embodied scenes Code: Paper Link: Project page:

Chuang Gan

43,265 views • 1 year ago

1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering Contributions: • We delve into the temporal redundancy of 4D Gaussian Splatting and explain the main reason for the storage pressure and suboptimal rendering speed. • We introduce 4DGS-1K, a compact and memory-efficient framework to address these issues. It consists of two key components: a spatial-temporal variation score-based pruning strategy and a temporal filter. • Extensive experiments demonstrate that 4DGS-1K not only achieves a substantial storage reduction of approximately 41× but also accelerates rasterization to 1000+ FPS while maintaining high-quality reconstruction.

MrNeRF

12,200 views • 1 year ago

We’ve upgraded Stable Video Diffusion 4D to Stable Video 4D 2.0 (SV4D 2.0), improving the quality of 4D outputs generated from a single object-centric video. While 3D provides a static view of an object’s shape and size, 4D extends this by including time, showing how the object moves. This multi-view video diffusion model generates a 4D output in three steps: 1️⃣ Starts with an input video of a moving person or object 2️⃣ Generates novel views of the subject from unseen angles 3️⃣ Constructs a single dynamic 4D video output with spatial and temporal consistency You can learn more here: (1/4)

Stability AI

35,974 views • 1 year ago

Meet our in-house tech Doggo "Rolo". Cinematic RGB Lighting "More Doggo than Doggo" Since July we've been redesigning our scanning pipeline to work with the amazing 3D Gaussian Splatting for Real-Time Radiance Field Rendering method from Inria. IR's AeonX capture system has been designed to capture "Spatial" memories for the future. We've built on a custom version of Inria's Sibr viewer to handle the playback of our multiple processed 4D frames. GS has reignited our passion for 3D/4D scanning. Thank you to the team at Inria. More content to come. #guassiansplatting #ir #aeonx #4d #motionscanning #inria #ximea #idatronic #spatial #youth83

Infinite-Realities

98,664 views • 2 years ago

White Light Reference for Machine Learning. Meet our in-house tech Doggo "Rolo". "More Doggo than Doggo" Since July we've been redesigning our scanning pipeline to work with the new 3D Gaussian Splatting for Real-Time Radiance Field Rendering method from Inria. IR's AeonX capture system has been designed to capture "Spatial" memories for the future. We've built on a custom version of Inria's Sibr viewer to handle the playback of our multiple processed 4D frames. GS has reignited our passion for 3D/4D scanning. Thank you to the team at Inria. More content to come. #guassiansplatting #ir #aeonx #4d #motionscanning #inria #ximea #idatronic #spatial #youth83

Infinite-Realities

236,763 views • 2 years ago

"More Doggo than Doggo" Motion scanned @ 60fps. Playing back @ 20fps. Meet our inhouse tech Doggo "Rolo". Since July we've been redesigning our scanning pipeline to work with the new 3D Gaussian Splatting for Real-Time Radiance Field Rendering method from Inria. GS has reignited our passion for 3D/4D scanning. Thank you to the team at Inria IR's AeonX capture system has been designed to capture "Spatial" memories for the future. We've built on a custom version of Inria's Sibr viewer to handle the playback of our multiple processed 4D frames. Our next goal is to optimize the player to be multi-threaded to improve performance over 60fps. GS has reignited our passion for 3D/4D scanning. Thank you to the team at Inria More content to come. #guassiansplatting #ir #aeonx #4d #motionscanning #inria #ximea #idatronic #spatial #youth83

"More Doggo than Doggo" Motion scanned @ 60fps. Playing back @ 20fps. Meet our inhouse tech Doggo "Rolo". Since July we've been redesigning our scanning pipeline to work with the new 3D Gaussian Splatting for Real-Time Radiance Field Rendering method from Inria. GS has reignited our passion for 3D/4D scanning. Thank you to the team at Inria IR's AeonX capture system has been designed to capture "Spatial" memories for the future. We've built on a custom version of Inria's Sibr viewer to handle the playback of our multiple processed 4D frames. Our next goal is to optimize the player to be multi-threaded to improve performance over 60fps. GS has reignited our passion for 3D/4D scanning. Thank you to the team at Inria More content to come. #guassiansplatting #ir #aeonx #4d #motionscanning #inria #ximea #idatronic #spatial #youth83

Infinite-Realities

48,300 views • 2 years ago

Long live bullet time. The future of sports is 4d gaussian splatting.

Bilawal Sidhu

127,835 views • 2 months ago

LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS Contributions: • LangSplatV2 achieves real-time performance with 476.2 FPS for high-dimensional feature splatting and 384.6 FPS for 3D open-vocabulary text querying. • Delivers a 42× speedup and 47× boost in performance compared to LangSplat. • Improves query accuracy while drastically reducing inference time. • Replaces the heavyweight decoder in LangSplat with a sparse coefficient field, removing the main performance bottleneck. • Introduces a CUDA-optimized sparse coefficient splatting method, enabling fast and high-quality rendering of high-dimensional features. • Enables scalable 3D language interaction in complex scenes, opening up real-time applications previously not possible with LangSplat.

MrNeRF

14,261 views • 11 months ago

a one minute explanation of 4d gaussian splatting

dylan

186,592 views • 2 years ago

Interactive 4D Time-Lapse for #VDC in #Construction / #AEC Gauzilla Pro allows for creating a 4D time-lapse from a series of videos. You can convert drone footage into an interactive historical record in 4D running 100% on the web. ✅ 4D time-lapse drastically reduces the need for physical site visits and also enhances communications among teams and stakeholders.

Gauzilla Pro

13,108 views • 5 days ago

4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos Abstract: We propose 4DGT, a 4D Gaussian-based Transformer model for dynamic scene reconstruction, trained entirely on real-world monocular posed videos. Using 4D Gaussian as an inductive bias, 4DGT unifies static and dynamic components, enabling the modeling of complex, time-varying environments with varying object lifespans. We introduced a novel density control strategy in training, which allows our 4DGT to handle longer space-time input while maintaining efficient rendering at runtime. Our model processes 64 consecutive posed frames in a rolling-window fashion, predicting consistent 4D Gaussians in the scene. Unlike optimization-based methods, 4DGT performs purely feed-forward inference, reducing reconstruction time from hours to seconds and scaling effectively to long video sequences. Trained only on large-scale monocular posed video datasets, 4DGT can significantly outperform prior Gaussian-based networks in real-world videos and achieve on-par accuracy with optimization-based methods on cross-domain videos.

MrNeRF

34,782 views • 1 year ago

So beyond proud to announce we produced the soundtrack for Bubsy 4D! Here’s the Main Theme in its entirety, big thanks to Fabraz: Demon Tides & Bubsy 4D and Atari for trusting us with this project. Enjoy!

Fat Bard

62,111 views • 10 months ago

Zhou et al., "PAGE-4D: Disentangled Pose and Geometry Estimation for 4D Perception" VGGT extended to dynamic scenes with a dynamic mask predictor.

Kwang Moo Yi

11,546 views • 7 months ago

HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting Contributions: First, we propose Homogeneous Gaussian Splatting (HoGS), a novel method adopting homogeneous coordinates to represent positions and scales of 3DGS for realistic and real-time rendering of both near and far objects. Second, despite the ultimate simplicity of HoGS, our method achieves state-of-the-art NVS results compared to other implicit and explicit representations.

MrNeRF

22,978 views • 1 year ago

4D Gaussian Splatting on the web just leveled up. ✨ Announcing the integration of Gracia with the open-source @PlayCanvas Engine. 🧵

Will Eastcott

151,062 views • 2 months ago

Can we scale 4D pretraining to learn general space-time representations that reconstruct an object from a few views at any time to any view at any other time? Introducing 4D-LRM: a Large Space-Time Reconstruction Model that ... 🔹 Predicts 4D Gaussian primitives directly from multi-view tokens (no motion vectors, no HexPlane); 🔹 Uses a clean, minimal Transformer backbone; 🔹 Generalizes with fast, high-quality feedforward rendering at any view and infinite frame rate. Check out more interactive demos and scaling behaviors on our homepage/paper. 👉Website: 👉Paper:

Martin Ziqiao Ma

21,787 views • 1 year ago

Free4D announced on Hugging Face Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

AK

22,878 views • 1 year ago

Check out this BIM 4D sim by Thiery Peleias, owner of Virtuart4d 🚧 Imported into UE5 for the animation aspect, the project model was created in Revit, the 4D planning animation in Synchro 4D Pro and the environment captured from Cesium with Google Maps Photorealistic 3D Tiles!

Unreal Engine

20,772 views • 1 year ago