MeshLRM: Large Reconstruction Model for High-Quality Mesh

We propose MeshLRM, a novel LRM-based approach that can reconstruct a high-quality mesh from merely four input images in less than one second. Different from previous large reconstruction models (LRMs) that focus on

AK

467,282 subscribers

69,111 views • 2 years ago • via X (Twitter)

Science & Technology

5 Comments

AK • 2 years ago

NeRF-based reconstruction, MeshLRM incorporates differentiable mesh extraction and rendering within the LRM framework. This allows for end-to-end mesh reconstruction by fine-tuning a pre-trained NeRF LRM with mesh rendering. Moreover, we improve the LRM architecture by

AK • 2 years ago

simplifying several complex designs in previous LRMs. MeshLRM's NeRF initialization is sequentially trained with low- and high-resolution images; this new LRM training strategy enables significantly faster convergence and thereby leads to better quality with less compute.

AK • 2 years ago

Our approach achieves state-of-the-art mesh reconstruction from sparse-view inputs and also allows for many downstream applications, including text-to-3D and single-image-to-3D generation.

AK • 2 years ago

paper page:

AK • 2 years ago

daily papers:

Related Videos

Meta researchers built a "large reconstruction model (LRM)" that can generate an animatable photorealistic avatar head in minutes from just four selfies. Details here:

UploadVR

23,589 views • 1 year ago

Current 3D generative models are slow and low quality. We present GRM, a large-scale model that reconstructs 3D Gaussians in 0.1s and generates high-quality 3D assets from text or single images in a few seconds. Demo: 1/4

Gordon Wetzstein

19,189 views • 2 years ago

📢 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation Got only one or a few images and wondering if recovering the 3D environment is a reconstruction or generation problem? Why not do it with a generative reconstruction model! We show that a camera-conditioned video diffusion model can be transformed into a generative reconstruction model that directly outputs a high-quality 3D Gaussian Splatting representation through self-distillation, without requiring real-world training data. Check out our results in the video (wait for dynamic scenes in the second half!) : Project Page: Code and Models: Paper:

Sherwin Bahmani

66,417 views • 8 months ago

3D Gaussian Splats are cool, but they're static (Part 25). DG-Mesh can reconstruct high-quality and time-consistent 3D meshes from a single video and track mesh vertices over time, which enables texture editing on dynamic objects.

Dreaming Tulpa 🥓👑

79,033 views • 2 years ago

Can we scale 4D pretraining to learn general space-time representations that reconstruct an object from a few views at any time to any view at any other time? Introducing 4D-LRM: a Large Space-Time Reconstruction Model that ... 🔹 Predicts 4D Gaussian primitives directly from multi-view tokens (no motion vectors, no HexPlane); 🔹 Uses a clean, minimal Transformer backbone; 🔹 Generalizes with fast, high-quality feedforward rendering at any view and infinite frame rate. Check out more interactive demos and scaling behaviors on our homepage/paper. 👉Website: 👉Paper:

Martin Ziqiao Ma

21,787 views • 11 months ago

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction. Creating high-quality 3D models of clothed humans from single images for real-world applications is crucial.

AK

49,506 views • 2 years ago

Less than 24 hours ago, Tencent released HunyuanCustom, a new high-quality video model that supports video generation using reference images.

Dreaming Tulpa 🥓👑

49,440 views • 1 year ago

[SIGGRAPH 2025] Photoreal Scene Reconstruction from an Egocentric Device Contributions: 1. We address the importance of employing visual-inertial bundle adjustment (VIBA) that accounts for the rolling-shutter behavior of the RGB camera. This provides a continuous camera trajectory to model pixel movement in neural reconstruction. Our experiments demonstrate that using VIBA consistently improves the novel view quality in Gaussian Splatting by +1 dB in PSNR. 2. We introduce a rasterization-based image formulation pipeline that addresses common artifacts in physical image formation, including rolling shutter, lens shading, exposure, and gain compensation. Our approach is distinct in that we represent image poses as posed pixel arrays sampled from a continuous trajectory, rather than assigning a single camera pose per image, and preserve the merit of Gaussian rasterization. Unlike existing methods that require ray-tracing Gaussians, e.g., [Moenne-Loccoz et al. 2024], our formulation is applicable to general-purpose rasterization-based Gaussian splatting. When applied to 3D Gaussian Splatting (3DGS) [Kerbl et al. 2023], our approach can further enhance reconstruction quality by +1 dB. We outperform existing baselines and demonstrate a substantial quality improvement in handling complex scenes observed by egocentric devices. 3. To reduce the effect of blur from rapid head motion in darker indoor scenes, we propose a strategy of deliberately underexposing input videos during capture, inspired by HDR+ [Hasinoff et al. 2016]. We demonstrate that we can reconstruct high-quality, noise-free scene radiance from noisy, dim input videos, and further render sharp, blur-free videos at a higher dynamic range.

MrNeRF

15,244 views • 1 year ago

Nvidia presents Edify 3D! The method can generate high-quality 3D assets from text descriptions. It uses a diffusion model to create detailed quad-mesh topologies and high-resolution textures in under 2 minutes.

Dreaming Tulpa 🥓👑

39,338 views • 1 year ago

[SIGGRAPH '25] TeGA: Texture Space Gaussian Avatars for High-Resolution Dynamic Head Modeling Note: On the left that's a 3DGS rendering! Contributions: 1. We propose a simple approach for rigging 3D Gaussians within the continuous tangent space of 3DMM face models, allowing Gaussians to move freely across mesh triangles. 2. We propose a novel CNN-based deformation model that is agnostic to the number of 3D Gaussians, naturally enabling adaptive densification of the representation to improve detail where most needed, with expression-dependent shading. 3. We show significant improvements over baseline SOTA methods and demonstrate the ability to render even extreme close-up images at high quality.

MrNeRF

29,010 views • 1 year ago

OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering Contributions: • We propose an occlusion-aware scene division strategy that considers the scene layout and camera co-visibilities. The resulting regions barely contain occlusions, and the corresponding training cameras have a higher average contribution, leading to improved reconstruction results. • We present a region-based rendering technique that accelerates 3D Gaussian splatting in large scenes. It eliminates much of the time-consuming processing of invisible 3D Gaussians, boosting rendering speeds without noticeable quality degradation. • We conduct extensive experiments on several large-scene datasets and demonstrate that OccluGaussian achieves superior rendering quality and faster rendering speed compared to previous state-of-the-art methods.

MrNeRF

10,718 views • 1 year ago

Alibaba just released LHM on Hugging Face: Large Animatable Human Reconstruction Model from a Single Image in Seconds

AK

170,372 views • 1 year ago

Excited to share RoCo: Dialectic Multi-Robot Collaboration with Large Language Models. We propose a novel approach to multi-robot collaboration that leverages LLMs for both high-level communication and low-level path planning. w/ Shreeya Jain, Shuran Song

Mandi Zhao

88,704 views • 2 years ago

This is how high-quality metal mesh is made in India.

Jean Claude NIYOMUGABO

152,972 views • 7 months ago

Synthesizing worlds with video diffusion models is often inconsistent — moving the camera back and forth leads to different scenes. We propose 🌐𝗪𝗼𝗿𝗹𝗱𝗠𝗲𝗺, a memory-based approach that ensures consistent world simulation without relying on explicit 3D reconstruction.

Xingang Pan

19,413 views • 1 year ago

We are thrilled to release the next leap in art-grade 3D generative models. Our single-click model pipeline gives unprecedented mesh outputs, with mesh parts-based topology. It is available now for all Cube tiers to start for free. ☑️ Our multi-stage hierarchical AI models produce a fully assembled 3D mesh with adaptive poly-counts, providing the clean, separated topology you need. ✅ A parts-based approach enables high-resolution meshes. 🌐 Quad and Triangle mesh support. Access now:

Common Sense Machines

420,323 views • 11 months ago

"YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting" TL;DR: a unified 3D Gaussian splatting model that reconstructs high-quality scene geometry and camera poses from unposed/uncalibrated images in a single forward pass.

Alexandre Morgand

14,839 views • 3 months ago

📢📢 𝐀𝐯𝐚𝐭𝟑𝐫 📢📢 Avat3r creates high-quality 3D head avatars from just a few input images in a single forward pass with a new dynamic 3DGS reconstruction model. Video: Project: Our core idea is to make Gaussian Reconstruction Models animatable. We find that a simple cross-attention to an expression code sequence is already sufficient to model complex facial expressions. We then incorporate position maps from DUSt3R and feature maps from Sapiens to facilitate the prediction task. While DUSt3R's position maps act as a pixel-aligned initialization for the Gaussians' positions, the Sapiens feature maps help the cross-view transformer to match corresponding image tokens in the 4 input images. One major challenge in creating a 3D head avatar from smartphone images comes from inconsistent facial expressions when the subject cannot remain perfectly static during the capture. We eliminate this static requirement by simply showing our model input images with different facial expressions during training. This technique makes our model robust to inconsistent input images later on. Finally, we show that although the model was trained with 4 input images, one can even create a 3D head avatar when only a single image is available. To achieve this, we employ a pre-trained 3D GAN to lift the single image to 3D and then render the 4 input images for our model. This allows us to create 3D head avatars from single images and even highly out-of-distribution examples like AI-generated faces, paintings or statues. Great work by Tobias Kirschstein from his internship at Meta with Javier Romero, Artem Sevastopolsky, and Shunsuke Saito

Matthias Niessner

74,698 views • 1 year ago

Meet Gemini Omni, our new model that can create anything from any input, starting with video. With Gemini Omni, you can combine images, videos and text as inputs and generate high-quality videos grounded in Gemini's real-world knowledge. #GoogleIO

Google Gemini

88,240 views • 1 month ago

Meet Gemini Omni, our new model that can create anything from any input, starting with video. With Gemini Omni, you can combine images, videos and text as inputs and generate high-quality videos grounded in Gemini's real-world knowledge.

Google Gemini

32,779,480 views • 20 days ago