Matthias Niessner

@MattNiessner · 47,338 subscribers

Professor for Visual Computing & Artificial Intelligence @TU_Muenchen, Co-Founder @synthesiaIO, Co-Founder @SpAItial_AI

Shorts

(1/2) Check out 𝐌𝐞𝐬𝐡𝐆𝐏𝐓! MeshGPT generates triangle meshes by autoregressively sampling from a transformer model that produces tokens from a learned geometric vocabulary. As a result, we obtain clean and compact meshes :)

395,858 views
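
The autoregressive sampling mentioned in the MeshGPT post can be illustrated with a toy loop. This is only a sketch and not MeshGPT's model: `toy_next_token_logits` is a hypothetical stand-in for the transformer, and the vocabulary size, the end-of-sequence token, and the maximum length are assumptions made for illustration.

```python
import math
import random

VOCAB_SIZE = 8  # assumed toy vocabulary size (hypothetical)
EOS = 0         # assumed end-of-sequence token id (hypothetical)


def toy_next_token_logits(prefix):
    """Hypothetical stand-in for a transformer: one score per vocabulary token."""
    last = prefix[-1] if prefix else 1
    logits = [0.0] * VOCAB_SIZE
    # Favor the token that follows the last one (never EOS).
    logits[(last % (VOCAB_SIZE - 1)) + 1] = 2.0
    # Make stopping more likely as the sequence grows.
    logits[EOS] = 0.3 * len(prefix) - 2.0
    return logits


def softmax(logits):
    """Turn scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(max_len=32, seed=0):
    """Sample one token at a time, feeding the growing prefix back into the model."""
    rng = random.Random(seed)
    tokens = []
    while len(tokens) < max_len:
        probs = softmax(toy_next_token_logits(tokens))
        token = rng.choices(range(VOCAB_SIZE), weights=probs, k=1)[0]
        if token == EOS:
            break
        tokens.append(token)
    return tokens


print(sample_sequence())
```

In an actual mesh generator, the sampled token sequence would then be decoded back into triangles; that decoding step is outside the scope of this sketch.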

Many 3D generators output Gaussian Splats (3DGS) for fast rendering, flexible deployment, and high visual fidelity. Static 3DGS aren't world models (no dynamics/semantics), but a true world model must allow distilling 3D-consistent representations (3DGS/meshes) for any given time step. This post-distillation serves a dual purpose: 1) it validates the physical consistency of the model; 2) extracting explicit representations avoids continuously running a heavy generator, which saves compute and facilitates real-time interaction.

26,152 views

📢Pix2NPHM: Learning to Regress NPHM Reconstructions From a Single Image📢 We directly regress neural parametric head models (NPHMs) from a single image — fast, stable, and significantly more expressive than classical 3DMMs such as FLAME. Face tracking & 3D reconstruction are often limited by the representational capacity of PCA-based face models. By lifting NPHMs to a first-class reconstruction primitive, we enable more accurate geometry, richer expressions, and finer animation control. Pix2NPHM obtains fast and reliable NPHM reconstructions on real-world data. Inference-time optimization against surface normals and canonical point maps can further increase fidelity. Key to successful and generalized training of our ViT-based network are: (1) large-scale registration of existing 3D head datasets, and (2) self-supervised training on vast in-the-wild 2D video datasets using pseudo ground-truth surface normals. Finally, we show that geometry-aware pretraining on pixel-aligned reconstruction tasks significantly outperforms generic visual pretraining (e.g., DINO-style features) in terms of generalization. 🌍 🎥 Great work by Simon Giebenhain, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Zhe Chen

37,707 views

(1/2) Check out 𝐆𝐚𝐮𝐬𝐬𝐢𝐚𝐧𝐀𝐯𝐚𝐭𝐚𝐫𝐬: Photorealistic Head Avatars with Rigged 3D Gaussians! We create photorealistic head avatars by animating 3D Gaussians on a parametric face model - edited and rendered in *real-time*!

126,940 views

The concept of creating an exact digital replica of the physical world has always fascinated me: environments that look and behave exactly like our everyday reality, precisely captured in the digital domain. This is the essence of 𝐖𝐨𝐫𝐥𝐝 𝐌𝐨𝐝𝐞𝐥𝐬, simulated realities indistinguishable from our own. Generating these models is the core mission behind what we are building at SpAItial AI. True World Models must capture both photorealistic appearance and underlying physics, spatially-consistent across the environment. For static scenes, current models already deliver impressive results, unlocking downstream applications from gaming to 3D design. However, the true frontier lies in modeling dynamics, which will enable the training of AI agents whose learned behaviors can bridge the sim-to-real gap, thus unlocking countless real-world applications.

21,514 views

(1/2) Check out "𝐏𝐨𝐥𝐲𝐃𝐢𝐟𝐟: Generating 3D Polygonal Meshes with Diffusion Models"! Our model operates directly on the polygons of 3D meshes and generates novel shapes as output through an iterative diffusion process.

57,912 views

📢MeshPad: Interactive Sketch-Conditioned Artist-Designed Mesh Generation and Editing📢 Users can interactively design 3D models just from a sketch-based interface - check out the demo :) We break down the design process into addition operations, handled by an autoregressive generator, and deletion operations, enabled by a classifier. To speed up predictions, we propose a mesh-specific speculator so that users get feedback within a few seconds. Great work by Haoxuan Li, Ziya Erkoç, Lei Li, Daniele Sirigatti, V. Rosov, and Angela Dai

29,987 views

📢 LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans🏠✨ -> converts RGB-D scans into compact, realistic, and interactive 3D scenes — featuring high-quality meshes, PBR materials, and articulated objects. 📷 🌍

23,722 views

(1/3) 📢📢𝐆𝐆𝐇𝐞𝐚𝐝 𝐅𝐚𝐬𝐭 𝐚𝐧𝐝 𝐆𝐞𝐧𝐞𝐫𝐚𝐥𝐢𝐳𝐚𝐛𝐥𝐞 𝟑𝐃 𝐆𝐚𝐮𝐬𝐬𝐢𝐚𝐧 𝐇𝐞𝐚𝐝𝐬📢📢 #SiggraphAsia'24 We generate photo-realistic 3D heads and render them with Gaussian Splatting at 1k resolution in real-time.

36,604 views

📢Announcing our 3D head avatar benchmark📢
Two tasks with hidden test sets:
- Dynamic Novel View Synthesis on Heads
- Monocular FLAME-driven Head Avatar Reconstruction
Our goal is to make research on 3D head avatars more comparable and ultimately increase the realism of digital humans. The benchmark studies distinct phenomena of 3D head avatar creation, such as extreme facial expressions, slow-motion captures of shaking long hair, or complicated light reflection and refraction patterns of glasses.
The two benchmark tasks assess two core desiderata of 3D avatars: the novel view synthesis challenge focuses on the best possible rendering quality of complex moving scenes, while the avatar animation challenge is concerned with how well a driving signal is translated into an avatar.
Evaluations are lightweight and consist of diverse video recordings from the popular NeRSemble dataset with a hidden test set. Participation in the benchmark is therefore straightforward and requires only 5 reconstructions per task.
Great work by Tobias Kirschstein and Simon Giebenhain

28,055 views

Check out MultiDiff #CVPR2024! From a single RGB image, MultiDiff enables scene-level novel view synthesis with free camera control. Great work by @normanisation, Katja Schwarz, Barbara Roessle, L Porzi, S Rota Bulò, and P Kontschieder

38,986 views

(1/2) 📢𝐍𝐏𝐆𝐀: 𝐍𝐞𝐮𝐫𝐚𝐥 𝐏𝐚𝐫𝐚𝐦𝐞𝐭𝐫𝐢𝐜 𝐆𝐚𝐮𝐬𝐬𝐢𝐚𝐧 𝐀𝐯𝐚𝐭𝐚𝐫𝐬 📢 #SIGGRAPHAsia We leverage a neural parametric representation to facilitate precise control over 3D Gaussians to obtain high-fidelity avatars.

30,732 views

(1/3) Can we turn text-to-image models into photorealistic 3D generators? ViewDiff (#CVPR2024) produces realistic, multi-view consistent images of real-world 3D objects in authentic surroundings. How does it work?

34,738 views

(1/2) We released our Neural Parametric Head Models (NPHM) dataset from our #CVPR2023 paper! It includes over 5600 high-fidelity 3D scans of human heads from 272 subjects - all publicly available! Check it out!

36,014 views

📢GeomHair: Reconstruction of Hair Strands from Colorless 3D Scans📢 We present a novel method to reconstruct hair strands from colorless 3D scans. It extracts orientation cues from two sources: directly from the mesh surface geometry, by finding local characteristic lines, and from shaded renderings, using a neural 2D line detector. We enhance the reconstruction with a diffusion prior trained on synthetic hair data and adapted to each scan using a tailored text prompt, allowing us to recover both simple and complex hairstyles without relying on color input. To support further research, we also introduce Strands400, the largest publicly available dataset of 3D hair strand reconstructions from real-world scans of 400 different people, featuring complicated hairstyles, such as ponytails and buns. Great work by Rachmadio Noval L., Artem Sevastopolsky, Egor Zakharov, and @ness_pris

12,448 views

(1/2) MonoNPHM will be presented as a #CVPR2024 Highlight! Our Neural Parametric Head Model parametrizes both geometry and appearance. With the learned model, we can then 3D reconstruct and track human heads from images or videos.

17,170 views

📢📢𝐍𝐞𝐑𝐒𝐞𝐦𝐛𝐥𝐞 𝐯𝟐 𝐃𝐚𝐭𝐚𝐬𝐞𝐭 𝐑𝐞𝐥𝐞𝐚𝐬𝐞📢📢
Head captures of 7.1MP from 16 cameras at 73fps:
* More recordings (425 people)
* Better color calibration
* Convenient download scripts
The new version of our dataset adds 156 participants for a total of 425 different people. In its entirety, the dataset now provides 65 million images from over 15 hours of diverse human facial expression performances.
We improved the color consistency of the recorded images with a better color calibration procedure. As a result, 3D reconstructions with images from the NeRSemble dataset should become better and look more realistic.
Finally, we made it much easier to download the recordings with our new download repository. It now takes just a single command to download all frontal hair shake videos of all participants or to download all recordings of a single participant.
Awesome work by Tobias Kirschstein and Simon Giebenhain!!!

12,088 views
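
The figures quoted in the NeRSemble v2 post can be cross-checked with a little arithmetic. The sketch below uses only the numbers stated in the post and assumes, for simplicity, that all 16 cameras record at 73 fps for the entire capture time; the variable names are made up for this example.

```python
CAMERAS = 16
FPS = 73
TOTAL_IMAGES = 65_000_000   # "65 million images", as stated in the post
NEW_PARTICIPANTS = 156
TOTAL_PARTICIPANTS = 425

# Images produced per second of recording when all cameras run together.
images_per_second = CAMERAS * FPS

# Recording time implied by the image count (assumes every camera always records).
implied_hours = TOTAL_IMAGES / images_per_second / 3600

# Participants that were already part of the dataset before this release.
previous_participants = TOTAL_PARTICIPANTS - NEW_PARTICIPANTS

print(f"{images_per_second} images per second across all cameras")
print(f"{implied_hours:.1f} hours of recording implied by the image count")
print(f"{previous_participants} participants before this release")
```

Under that assumption, the image count corresponds to roughly 15.5 hours of footage, which is consistent with the "over 15 hours" stated in the post.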

Videos

Want to create an avatar from a single image? FlexAvatar is a transformer model that creates a full 360°, high-quality, and expressive 3D head avatar from just a single portrait image in minutes.
Real-time Demo: FlexAvatar's lightweight architecture allows both animation and rendering in real-time, enabling interactive user experiences. To create a new 3D head avatar, only one image is required, e.g., from a webcam. The final avatar is ready after 2 minutes.
Architecture: Under the hood, FlexAvatar adopts a transformer-based encoder-decoder design. The encoder maps the input image onto a latent avatar space, while the decoder produces 3D Gaussian attribute maps by incorporating the animation signal via cross-attention. The model learns all facial animations directly from the data without relying on pre-built 3D face models. This equips the avatars with realistic facial expressions. The internal avatar latent space can be conveniently used to integrate additional observations of a person via fitting. This enables use cases where more than one image of a person is available, e.g., from a phone scan of the person.
We train jointly on 2D monocular videos and multi-view data. However, in monocular videos, the animation signal leaks the target viewpoint, causing the model to produce incomplete 3D heads. We call this phenomenon entanglement of driving signal and target viewpoint.
To prevent entanglement, we introduce bias sinks. These are learnable tokens that indicate whether a training sample stems from a monocular or a multi-view dataset. During training, the model learns to produce incomplete 3D heads only when the monocular token is present. During inference, FlexAvatar then always uses the multi-view token, for which the model has learned to produce complete 3D heads. This simple design makes it possible to combine the generalizability of monocular data with the quality of multi-view data.
FlexAvatar summary:
- Input: Single image, phone scan, or monocular video
- Output: Full 360° head avatar
- Expressive animations
- Real-time rendering and animation
- Generalization to any portrait
- Create a new avatar in 2 minutes
- Use bias sinks to combine 2D and 3D data
Great work by Tobias Kirschstein and Simon Giebenhain!

Matthias Niessner

76,438 views • 5 months ago
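
The bias-sink idea from the FlexAvatar post can be sketched in a few lines. This is a toy illustration under stated assumptions, not FlexAvatar's implementation: the class and function names, the token size, and the choice to prepend the token to the input sequence are all hypothetical. Only the selection rule comes from the post: the token matches the data source during training, and the multi-view token is always used at inference.

```python
import random

TOKEN_DIM = 4  # assumed toy embedding size (hypothetical)


class BiasSinks:
    """Toy holder for one token per data source; a real model would train these."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.tokens = {
            "monocular": [rng.gauss(0.0, 0.02) for _ in range(TOKEN_DIM)],
            "multi_view": [rng.gauss(0.0, 0.02) for _ in range(TOKEN_DIM)],
        }

    def select(self, source, training):
        # Training: the token reflects where the sample came from.
        # Inference: always the multi-view token, regardless of the input source.
        key = source if training else "multi_view"
        return self.tokens[key]


def build_input_sequence(image_tokens, sinks, source, training):
    """Prepend the selected bias-sink token to the image tokens (an assumed layout)."""
    return [sinks.select(source, training)] + list(image_tokens)


sinks = BiasSinks()
image_tokens = [[0.0] * TOKEN_DIM for _ in range(3)]
train_seq = build_input_sequence(image_tokens, sinks, "monocular", training=True)
infer_seq = build_input_sequence(image_tokens, sinks, "monocular", training=False)
print(train_seq[0] is sinks.tokens["monocular"], infer_seq[0] is sinks.tokens["multi_view"])
```

The point of the design, as the post describes it, is that dataset-specific bias is absorbed by the source token, so switching to the multi-view token at inference selects the behavior learned from complete 3D data.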

🚀Announcing NeRSemble 3D Head Avatar Benchmark v2
Version 2 of the NeRSemble 3D Head Avatar Benchmark systematically evaluates several aspects of 3D head avatar creation. Our goal is to drive progress toward more realistic, robust, and generalizable avatar methods.
🔬Benchmark Tasks
The NeRSemble Benchmark v2 features three core challenges:
- Dynamic Novel View Synthesis
- Monocular FLAME-driven Avatar Creation (updated)
- Single-view 3D Face Reconstruction (new)
👉Explore the online leaderboard and submission system.
🆕What's new?
1. New task: Single-view 3D Face Reconstruction
Given a single portrait image, reconstruct an accurate 3D mesh showing either the input expression or a fully neutral one. Unlike prior benchmarks, the NeRSemble benchmark emphasizes diverse and challenging facial expressions, better reflecting real scenarios. For technical details, see the Pixel3DMM paper.
2. Updated task: Monocular FLAME-driven Avatar Creation
We have improved the FLAME tracking that is used for both avatar creation from the monocular videos and avatar driving on the hidden test sequences. The updated benchmark task has:
- more stable torso tracking
- more expressive lip closures during speech
- improved mouth tracking for challenging facial expressions
We hope that these improvements to the benchmark help drive the field forward.
🏆 CVPR 2026 Workshop & Prizes
The NeRSemble benchmark will be featured at the CVPR 2026 Workshop on Photo-realistic 3D Head Avatars. Participants in the new and updated tasks have the opportunity to win:
- 🎁RTX 5080 GPUs (sponsored by NVIDIA)
- 🎤15-minute oral presentation at the workshop
⏰ Submission Deadline - May 26, 2026
Reach out to the amazing Tobias Kirschstein and Simon Giebenhain for more details :)

Matthias Niessner

29,478 views • 1 month ago

📢📢 𝐀𝐯𝐚𝐭𝟑𝐫 📢📢 Avat3r creates high-quality 3D head avatars from just a few input images in a single forward pass with a new dynamic 3DGS reconstruction model.
Our core idea is to make Gaussian Reconstruction Models animatable. We find that a simple cross-attention to an expression code sequence is already sufficient to model complex facial expressions. We then incorporate position maps from DUSt3R and feature maps from Sapiens to facilitate the prediction task. While DUSt3R's position maps act as a pixel-aligned initialization for the Gaussians' positions, the Sapiens feature maps help the cross-view transformer to match corresponding image tokens in the 4 input images.
One major challenge in creating a 3D head avatar from smartphone images comes from inconsistent facial expressions when the subject cannot remain perfectly static during the capture. We eliminate this static requirement by simply showing our model input images with different facial expressions during training. This technique makes our model robust to inconsistent input images later on.
Finally, we show that although the model was trained with 4 input images, one can even create a 3D head avatar when only a single image is available. To achieve this, we employ a pre-trained 3D GAN to lift the single image to 3D and then render the 4 input images for our model. This allows us to create 3D head avatars from single images and even highly out-of-distribution examples like AI-generated faces, paintings, or statues.
Great work by Tobias Kirschstein from his internship at Meta with Javier Romero, Artem Sevastopolsky, and Shunsuke Saito

Matthias Niessner

74,681 views • 1 year ago
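
The "cross-attention to an expression code sequence" mentioned in the Avat3r post can be illustrated with a minimal single-head example. This is a generic scaled dot-product attention sketch, not Avat3r's architecture: it omits learned projections, and the toy token values, the 2-dimensional embeddings, and the residual update are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn attention scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention without learned projections."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# Toy data: image tokens act as queries, expression codes as keys and values.
image_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
expression_codes = [[2.0, 0.0], [0.0, 2.0]]

attended = cross_attention(image_tokens, expression_codes, expression_codes)
# Residual update: each image token is shifted by what it read from the expression codes.
updated = [[t + a for t, a in zip(tok, att)] for tok, att in zip(image_tokens, attended)]
print(updated)
```

Each image token receives a weighted mix of the expression codes, with weights determined by similarity, which is how an animation signal can modulate per-token predictions.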