Pix2NPHM: Learning to Regress NPHM Reconstructions From a Single Image
We directly regress neural parametric head models (NPHMs) from a single image: fast, stable, and significantly more expressive than classical 3DMMs such as FLAME.
Face tracking & 3D reconstruction are often limited by the representational capacity of PCA-based face models. By lifting NPHMs to a first-class reconstruction primitive, we enable more accurate geometry, richer expressions, and finer animation control.
Pix2NPHM obtains fast and reliable NPHM reconstructions on real-world data. Inference-time optimization against surface normals and canonical point maps can further increase fidelity.
Key to successful and generalized training of our ViT-based network are: (1) large-scale registration of existing 3D head datasets, and (2) self-supervised training on vast in-the-wild 2D video datasets using pseudo ground-truth surface normals.
Finally, we show that geometry-aware pretraining on pixel-aligned reconstruction tasks significantly outperforms generic visual pretraining (e.g., DINO-style features) in terms of generalization.
Great work by Simon Giebenhain, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Zhe Chen

Matthias Niessner
37,807 views • 6 months ago
GeomHair: Reconstruction of Hair Strands from Colorless 3D Scans
We present a novel method to reconstruct hair strands from colorless 3D scans by extracting orientation cues from two sources: directly from the mesh surface geometry, by finding local characteristic lines, and from shaded renderings, using a neural 2D line detector. We enhance the reconstruction with a diffusion prior trained on synthetic hair data and adapted to each scan using a tailored text prompt, allowing us to recover both simple and complex hairstyles without relying on color input.
To support further research, we also introduce Strands400, the largest publicly available dataset of 3D hair strand reconstructions from real-world scans of 400 different people, featuring complicated hairstyles, such as ponytails and buns.
Great work by Rachmadio Noval L., Artem Sevastopolsky, Egor Zakharov, and @ness_pris

Matthias Niessner
12,466 views • 1 year ago
F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Consistent Gaussian Splatting
Contributions:
• We pioneer 3D-aware generation using generalizable feed-forward Gaussian Splatting representation, achieving significant efficiency and favorable rendering quality on monocular datasets.
• We significantly advance the capability of pixel-aligned Gaussian Splatting representations by designing a self-supervised cycle training strategy specifically tailored for monocular datasets.
• We further mitigate the artifacts of 3D-aware representations caused by large viewpoint shifts by introducing geometry-aware video priors.

MrNeRF
14,229 views • 1 year ago
Wonderland: Navigating 3D Scenes from a Single Image
Contributions:
• First, we introduce a representation for controllable 3D generation by leveraging the generative priors from camera-guided video diffusion models. Unlike image models, video diffusion models are trained on extensive video datasets. This enables them to capture comprehensive spatial relationships within scenes across multiple views and embed a form of "3D awareness" in their latent space, which allows us to maintain 3D consistency in novel view synthesis.
• Second, to achieve controllable novel view generation, we empower video models with precise control over specified camera motions. We introduce a novel dual-branch conditioning mechanism that effectively incorporates desired diverse camera trajectories into the video diffusion model. This enables expansion of a single image into a multi-view consistent capture of a 3D scene with precise pose control.
• Third, to achieve efficient 3D reconstruction, we directly transform video latents into 3DGS. We propose a novel latent-based large reconstruction model (LaLRM) that lifts video latents to 3D in a feed-forward manner. With this design, during inference, our model directly predicts 3DGS from a single input image, effectively aligning the generation and reconstruction tasks, and bridging image space and 3D space, through the video latent space. Compared with reconstructing scenes from images, the video latent space offers a 256× spatial-temporal reduction while retaining essential and consistent 3D structural details. Such a high degree of compression is crucial, as it allows the LaLRM to handle a wider range of 3D scenes within the reconstruction framework, with the same memory constraints.

MrNeRF
52,801 views • 1 year ago
Adaptive and Temporally Consistent Gaussian Surfels for Multi-view Dynamic Reconstruction
Contributions:
• A method for efficiently reconstructing dynamic surfaces from multi-view videos using Gaussian surfels.
• A unified and gradient-aware densification strategy for optimizing dynamic 3D Gaussians with fine details.
• A temporal consistency approach that ensures stable and coherent surface reconstructions across frames by enforcing consistency on curvature maps.
• Extensive experiments that demonstrate our method's advantages including fast training, high-fidelity novel view synthesis, and accurate surface geometry.

MrNeRF
31,821 views • 1 year ago
Human Hair Reconstruction with Strand-Aligned 3D Gaussians
Contributions (cited):
- We propose a new 3D line lifting scheme that uses a modified 3DGS reconstruction technique to lift 2D orientation maps into a 3D field while also providing refinement of the camera parameters;
- We introduce a dual representation of hair strand polylines and 3D Gaussians to achieve differentiable rasterization of hair strands and leverage photometric constraints for strand-based hair reconstruction;
- Based on these components, we propose a coarse-to-fine optimization method for prior-guided hair reconstruction that leverages both latent and explicit representations of the hairstyle.

MrNeRF
106,521 views • 1 year ago
Announcing our 3D head avatar benchmark
Two tasks with hidden test sets:
- Dynamic Novel View Synthesis on Heads
- Monocular FLAME-driven Head Avatar Reconstruction
Our goal is to make research on 3D head avatars more comparable and ultimately increase the realism of digital humans. The benchmark studies distinct phenomena of 3D head avatar creation, such as extreme facial expressions, slow motion captures of shaking long hair, or complicated light reflection and refraction patterns of glasses.
The two benchmark tasks assess two core desiderata of 3D avatars: While the novel view synthesis challenge focuses on best possible rendering quality of complex moving scenes, the avatar animation challenge is concerned with how well a driving signal is translated into an avatar.
Evaluations are lightweight and consist of diverse video recordings from the popular NeRSemble dataset with a hidden test set. Participation in the benchmark is therefore straightforward and requires only 5 reconstructions per task.
Leaderboard and benchmark submission:
Benchmark data access and toolkit:
Great work by Tobias Kirschstein and Simon Giebenhain

Matthias Niessner
28,062 views • 1 year ago
Meta releases VGGSfM
Visual Geometry Grounded Deep Structure From Motion
Structure-from-motion (SfM) is a long-standing problem in the computer vision community, which aims to reconstruct the camera poses and 3D structure of a scene from a set of unconstrained 2D images. Classical frameworks solve this problem in an incremental manner by detecting and matching keypoints, registering images, triangulating 3D points, and conducting bundle adjustment. Recent research efforts have predominantly revolved around harnessing the power of deep learning techniques to enhance specific elements (e.g., keypoint matching), but are still based on the original, non-differentiable pipeline. Instead, we propose a new deep SfM pipeline VGGSfM, where each component is fully differentiable and thus can be trained in an end-to-end manner. To this end, we introduce new mechanisms and simplifications. First, we build on recent advances in deep 2D point tracking to extract reliable pixel-accurate tracks, which eliminates the need for chaining pairwise matches. Furthermore, we recover all cameras simultaneously based on the image and track features instead of gradually registering cameras. Finally, we optimise the cameras and triangulate 3D points via a differentiable bundle adjustment layer. We attain state-of-the-art performance on three popular datasets, CO3D, IMC Phototourism, and ETH3D.

AK
96,527 views • 1 year ago
We are excited to introduce Stable Fast 3D, Stability AI's latest breakthrough in 3D asset generation technology. This innovative model transforms a single input image into a detailed 3D asset in just 0.5 seconds, setting a new standard for speed and quality in the field of 3D reconstruction! Alongside this release, we've also published a technical report that highlights how we achieve fast inference speeds with reduced baked illumination and material parameters.
You can learn more and access the report here:

Stability AI
438,350 views • 1 year ago
(1/2) We released our Neural Parametric Head Models (NPHM) dataset from our #CVPR2023 paper! It includes over 5600 high-fidelity 3D scans of human heads from 272 subjects - all publicly available! Check it out!

Matthias Niessner
36,014 views • 3 years ago
The term "continual learning" has become overloaded if you see it as an ML problem.
One classic thread is about memorization: regularization-based continual learning methods, such as EWC, MAS, and SI, estimate which parameters mattered for previous tasks and resist changing them too much.
One modern thread is about adaptation: test-time training and inference-time learning methods, such as TTT, adapt part of the model on the incoming test stream before making predictions.
These are sometimes discussed as separate threads. But in modern scalable architectures, I think they are better seen as complementary constraints: a model that learns quickly at test time also benefits from a mechanism for deciding what not to forget.
In our #ECCV2026 paper, we study this in large-scale 4D reconstruction: how to build fast spatial memory that can adapt over long observation streams while reducing collapse and forgetting. Instead of using fully plastic test-time updates, we stabilize fast-weight adaptation with an elastic prior that balances adaptation and memory.
Key ideas:
- Elastic Test-Time Training: Fisher-weighted consolidation for fast-weight updates
- EMA anchor weights that provide a moving reference for stability
- Chunk-by-chunk inference for long 3D/4D observation streams
We show that this scales across large 3D/4D pretraining settings, including both LRM-style and LVSM-style models, and improves reconstruction across benchmarks including Stereo4D, NVIDIA, and DL3DV-140.
We release model checkpoints across different design choices: resolution, post-training curriculum, and whether the model uses an explicit 4DGS intermediate representation.
- Homepage:
- Paper:
- Code:
- Models:
This work is co-led with Xueyang Yu, contributed by Haoyu Zhen and Yuncong Yang, and advised by Michigan SLED Lab and Chuang Gan.

Martin Ziqiao Ma
31,958 views • 11 days ago
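
The "Key ideas" in the post above (a Fisher-weighted pull on fast-weight updates plus an EMA anchor) can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a generic EWC-style quadratic penalty, and the function name `elastic_step` and the parameters `lr`, `lam`, and `ema` are hypothetical names chosen for illustration only.

```python
from typing import List, Tuple


def elastic_step(
    fast: List[float],
    anchor: List[float],
    grad: List[float],
    fisher: List[float],
    lr: float = 0.1,
    lam: float = 1.0,
    ema: float = 0.9,
) -> Tuple[List[float], List[float]]:
    """One toy update of fast weights with an elastic pull toward an anchor.

    Assumed penalty (EWC-style, not taken from the paper):
        (lam / 2) * fisher_i * (fast_i - anchor_i) ** 2
    whose gradient is lam * fisher_i * (fast_i - anchor_i).
    """
    new_fast = []
    for w, a, g, f in zip(fast, anchor, grad, fisher):
        # Task gradient plus Fisher-weighted pull back toward the anchor.
        total_grad = g + lam * f * (w - a)
        new_fast.append(w - lr * total_grad)

    # The anchor follows the fast weights slowly (exponential moving average).
    new_anchor = [ema * a + (1.0 - ema) * w for a, w in zip(anchor, new_fast)]
    return new_fast, new_anchor


if __name__ == "__main__":
    # Weight 0 is "important" (fisher = 1), weight 1 is free to move (fisher = 0).
    fast, anchor = elastic_step(
        fast=[1.0, 1.0], anchor=[0.0, 0.0], grad=[0.0, 0.0],
        fisher=[1.0, 0.0], lr=0.5, lam=1.0, ema=0.5,
    )
    print(fast, anchor)
```

In this sketch, a weight with high Fisher value is pulled back toward the anchor while a weight with zero Fisher value is left fully plastic, which is the balance between adaptation and memory that the post describes in words.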
Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction
Note: Check below for full video.
Abstract (cited): "In this paper, we present a self-calibrating framework that jointly optimizes camera parameters, lens distortion, and 3D Gaussian representations, enabling accurate and efficient scene reconstruction. Our technique is particularly effective for high-quality scene reconstruction from large field-of-view (FOV) imagery taken with wide-angle lenses, allowing the scene to be modeled from a smaller number of images. We introduce a novel method for modeling complex lens distortions using a hybrid network that combines invertible residual networks with explicit grids. This design effectively regularizes the optimization process, achieving greater accuracy than conventional camera models. Additionally, we propose a cubemap-based resampling strategy to support large FOV images without sacrificing resolution or introducing distortion artifacts. Our method is compatible with the fast rasterization of Gaussian Splatting, adaptable to a wide variety of camera lens distortions, and demonstrates state-of-the-art performance on both synthetic and real-world datasets."

MrNeRF
17,206 views • 1 year ago
CVPR Highlight #1
VGGT: Visual Geometry Grounded Transformer (best paper award)
- Super-fast 3D scene reconstruction
- Model and demo on Hugging Face
The result is a GLB point cloud

dylan
19,325 views • 1 year ago
EDGS: Eliminating Densification for Efficient Convergence of 3DGS
Contributions:
• We show that initial triangulation based on 2D correspondences can replace the incremental refinement process, fundamentally changing how 3DGS models allocate resources.
• Our method reduces the path each Gaussian must travel in parameter space. Careful initialization not only accelerates convergence but also guides optimization toward a convergence point corresponding to lower reconstruction error and thus higher reconstruction quality.
• Our approach outperforms both speed-optimized and quality-focused state-of-the-art models while using only half the splats of standard 3DGS. By improving initialization rather than altering the optimization process, this method is compatible with other 3DGS acceleration techniques, making it a flexible enhancement to existing models.

MrNeRF
124,046 views • 1 year ago
NoKSR: Kernel-Free Neural Surface Reconstruction via Point Cloud Serialization #3DV2025
TL;DR: neural reconstruction with a simpler architecture (no linear systems to solve), and up to 3x speedup vs. voxel-based methods!

Andrea Tagliasacchi @CVPR
16,962 views • 1 year ago
DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery
Abstract: Drones have become essential tools for reconstructing wild scenes due to their outstanding maneuverability. Recent advances in radiance field methods have achieved remarkable rendering quality, providing a new avenue for 3D reconstruction from drone imagery. However, dynamic distractors in wild environments challenge the static scene assumption in radiance fields, while limited view constraints hinder the accurate capture of underlying scene geometry. To address these challenges, we introduce DroneSplat, a novel framework designed for robust 3D reconstruction from in-the-wild drone imagery. Our method adaptively adjusts masking thresholds by integrating local-global segmentation heuristics with statistical approaches, enabling precise identification and elimination of dynamic distractors in static scenes. We enhance 3D Gaussian Splatting with multi-view stereo predictions and a voxel-guided optimization strategy, supporting high-quality rendering under limited view constraints. For comprehensive evaluation, we provide a drone-captured 3D reconstruction dataset encompassing both dynamic and static scenes. Extensive experiments demonstrate that DroneSplat outperforms both 3DGS and NeRF baselines in handling in-the-wild drone imagery.

MrNeRF
21,346 views • 1 year ago
Create a 3D model from a single image, set of images or a text prompt in < 1 minute
This new AI paper called CAT3D shows us that it'll keep getting easier to produce 3D models from 2D images, whether it's a sparser real world 3D scan (a few photos instead of hundreds) or your favorite 2D image generator like Midjourney (just an image).
How does this magic work?
"This architecture is similar to video diffusion models, but with camera pose embeddings for each image instead of time embeddings. The generated views are passed into a robust 3D reconstruction pipeline to create the 3D representation (Zip-NeRF or 3DGS)"

Bilawal Sidhu
92,792 views • 2 years ago
SPARK can create high-quality 3D face avatars from regular videos and track expressions and poses in real time. It improves the accuracy of 3D face reconstructions for tasks like aging, face swapping, and digital makeup. 6 examples:

Dreaming Tulpa
193,764 views • 1 year ago
Hi3DGen: High-fidelity 3D geometry generation from a single image by leveraging normal maps as an intermediate representation

Gradio
50,996 views • 1 year ago
MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers
paper page:
Recent advances in generative AI have significantly enhanced image and video editing, particularly in the context of text prompt control. State-of-the-art approaches predominantly rely on diffusion models to accomplish these tasks. However, the computational demands of diffusion-based methods are substantial, often necessitating large-scale paired datasets for training, and therefore challenging the deployment in practical applications. This study addresses this challenge by breaking down the text-based video editing process into two separate stages. In the first stage, we leverage an existing text-to-image diffusion model to simultaneously edit a few keyframes without additional fine-tuning. In the second stage, we introduce an efficient model called MaskINT, which is built on non-autoregressive masked generative transformers and specializes in frame interpolation between the keyframes, benefiting from structural guidance provided by intermediate frames. Our comprehensive set of experiments illustrates the efficacy and efficiency of MaskINT when compared to other diffusion-based methodologies. This research offers a practical solution for text-based video editing and showcases the potential of non-autoregressive masked generative transformers in this domain.

AK
25,449 views • 2 years ago
NeRSemble v2 Dataset Release
Head captures of 7.1MP from 16 cameras at 73fps:
* More recordings (425 people)
* Better color calibration
* Convenient download scripts
The new version of our dataset adds 156 participants for a total of 425 different people. In its entirety, the dataset now provides 65 million images from over 15 hours of diverse human facial expression performances.
We improved the color consistency of the recorded images with a better color calibration procedure. As a result, 3D reconstructions with images from the NeRSemble dataset should become better and look more realistic.
Finally, we made it much easier to download the recordings with our new download repository. It now just takes a single command to download all frontal hair shake videos of all participants or to download all recordings of a single participant.
Check it out:
Awesome work by Tobias Kirschstein and Simon Giebenhain!!!

Matthias Niessner
12,089 views • 1 year ago