Matthias Niessner

@MattNiessner • 48,393 subscribers

Professor for Visual Computing & Artificial Intelligence @TU_Muenchen Co-Founder @synthesiaIO Co-Founder @SpAItial_AI

Shorts

(1/2) Check out 𝐌𝐞𝐬𝐡𝐆𝐏𝐓! MeshGPT generates triangle meshes by autoregressively sampling from a transformer model that produces tokens from a learned geometric vocabulary. As a result, we obtain clean and compact meshes :)

396,257 views

Many 3D generators output Gaussian Splats (3DGS) for fast rendering, flexible deployment, and high visual fidelity. Static 3DGS aren't world models (no dynamics/semantics), but a true world model must allow distilling 3D-consistent representations (3DGS/meshes) for any given time step. This post-distillation serves a dual purpose: 1) it validates the physical consistency of the model; 2) extracting explicit representations avoids continuously running a heavy generator, which saves compute and facilitates real-time interaction.

26,226 views

(1/2) Check out 𝐆𝐚𝐮𝐬𝐬𝐢𝐚𝐧𝐀𝐯𝐚𝐭𝐚𝐫𝐬: Photorealistic Head Avatars with Rigged 3D Gaussians! We create photorealistic head avatars by animating 3D Gaussians on a parametric face model - edited and rendered in real-time!

126,983 views

📢Pix2NPHM: Learning to Regress NPHM Reconstructions From a Single Image📢 We directly regress neural parametric head models (NPHMs) from a single image — fast, stable, and significantly more expressive than classical 3DMMs such as FLAME. Face tracking & 3D reconstruction are often limited by the representational capacity of PCA-based face models. By lifting NPHMs to a first-class reconstruction primitive, we enable more accurate geometry, richer expressions, and finer animation control. Pix2NPHM obtains fast and reliable NPHM reconstructions on real-world data. Inference-time optimization against surface normals and canonical point maps can further increase fidelity. Key to successful and generalized training of our ViT-based network are: (1) large-scale registration of existing 3D head datasets, and (2) self-supervised training on vast in-the-wild 2D video datasets using pseudo ground-truth surface normals. Finally, we show that geometry-aware pretraining on pixel-aligned reconstruction tasks significantly outperforms generic visual pretraining (e.g., DINO-style features) in terms of generalization. 🌍 🎥 Great work by Simon Giebenhain, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Zhe Chen

37,850 views

The concept of creating an exact digital replica of the physical world has always fascinated me: environments that look and behave exactly like our everyday reality, precisely captured in the digital domain. This is the essence of 𝐖𝐨𝐫𝐥𝐝 𝐌𝐨𝐝𝐞𝐥𝐬, simulated realities indistinguishable from our own. Generating these models is the core mission behind what we are building at SpAItial AI. True World Models must capture both photorealistic appearance and underlying physics, spatially-consistent across the environment. For static scenes, current models already deliver impressive results, unlocking downstream applications from gaming to 3D design. However, the true frontier lies in modeling dynamics, which will enable the training of AI agents whose learned behaviors can bridge the sim-to-real gap, thus unlocking countless real-world applications.

21,586 views

(1/2) Intrinsic Image Diffusion for Single-view Material Estimation! We propose a probabilistic diffusion model to handle material & lighting ambiguities. We obtain sharp material estimates and facilitate high-fidelity relighting.

69,400 views

(1/2) Check out "𝐏𝐨𝐥𝐲𝐃𝐢𝐟𝐟: Generating 3D Polygonal Meshes with Diffusion Models"! Our model operates directly on the polygons of 3D meshes and generates novel shapes as output through an iterative diffusion process.

57,941 views

📢MeshPad: Interactive Sketch-Conditioned Artist-Designed Mesh Generation and Editing📢 Users can interactively design 3D models just from a sketch-based interface - check out the demo :) We break down the design process into addition, performed by an autoregressive generator, and deletion, enabled by a classifier. To speed up predictions, we propose a mesh-specific speculator such that users get feedback within a few seconds. Project: Video: Great work by Haoxuan Li, Ziya Erkoç, Lei Li, Daniele Sirigatti, V. Rosov, Angela Dai

30,020 views

(1/3) 📢📢𝐆𝐆𝐇𝐞𝐚𝐝: 𝐅𝐚𝐬𝐭 𝐚𝐧𝐝 𝐆𝐞𝐧𝐞𝐫𝐚𝐥𝐢𝐳𝐚𝐛𝐥𝐞 𝟑𝐃 𝐆𝐚𝐮𝐬𝐬𝐢𝐚𝐧 𝐇𝐞𝐚𝐝𝐬📢📢 #SiggraphAsia'24 We generate photo-realistic 3D heads and render them with Gaussian Splatting at 1k resolution in real-time.

36,618 views

Check out MultiDiff #CVPR2024! From a single RGB image, MultiDiff enables scene-level novel view synthesis with free camera control. Great work by @normanisation, Katja Schwarz, Barbara Roessle, L Porzi, S Rota Bulò, P Kontschieder

39,059 views

📢 LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans🏠✨ -> converts RGB-D scans into compact, realistic, and interactive 3D scenes — featuring high-quality meshes, PBR materials, and articulated objects. 📷 🌍

23,790 views

📢Announcing our 3D head avatar benchmark📢 Two tasks with hidden test sets: (1) Dynamic Novel View Synthesis on Heads and (2) Monocular FLAME-driven Head Avatar Reconstruction. Our goal is to make research on 3D head avatars more comparable and ultimately increase the realism of digital humans. The benchmark studies distinct phenomena of 3D head avatar creation, such as extreme facial expressions, slow-motion captures of shaking long hair, or complicated light reflection and refraction patterns of glasses. The two benchmark tasks assess two core desiderata of 3D avatars: the novel view synthesis challenge focuses on the best possible rendering quality of complex moving scenes, while the avatar animation challenge is concerned with how well a driving signal is translated into an avatar. Evaluations are lightweight and consist of diverse video recordings from the popular NeRSemble dataset with a hidden test set. Participation in the benchmark is therefore straightforward and requires only 5 reconstructions per task. Leaderboard and benchmark submission: Benchmark data access and toolkit: Great work by Tobias Kirschstein and Simon Giebenhain

28,075 views

(1/2) Happy to announce that Text2Tex has been accepted at #ICCV2023 🎉 Taking a mesh and a text prompt as input, Text2Tex generates high-quality textures - it's fully automated and easy to scale to many models! Project: Video:

44,117 views

(1/2) 📢𝐍𝐏𝐆𝐀: 𝐍𝐞𝐮𝐫𝐚𝐥 𝐏𝐚𝐫𝐚𝐦𝐞𝐭𝐫𝐢𝐜 𝐆𝐚𝐮𝐬𝐬𝐢𝐚𝐧 𝐀𝐯𝐚𝐭𝐚𝐫𝐬 📢 #SIGGRAPHAsia We leverage a neural parametric representation to facilitate precise control over 3D Gaussians to obtain high-fidelity avatars.

30,741 views

Can we match visual features jointly across multiple frames? Yes! Barbara Roessle's #ICCV2023 paper proposes a differentiable pose optimization for end2end feature matching across multiple frames, thus obtaining better poses!

41,507 views

(1/3) Can we turn text-to-image models into photorealistic 3D generators? ViewDiff (#CVPR2024) produces realistic, multi-view consistent images of real-world 3D objects in authentic surroundings. Website Video How does it work?

34,753 views

(1/2) We released our Neural Parametric Head Models (NPHM) dataset from our #CVPR2023 paper! It includes over 5600 high-fidelity 3D scans of human heads from 272 subjects - all publicly available! Check it out!

36,064 views

Check out TriPlaneNet! From a single image, we predict EG3D latents & offsets, thus obtaining high-fidelity 3D models. Works in real time! Great work by our stellar MA student Ananta R. Bhattarai advised by Artem Sevastopolsky #WACV2024

28,867 views

📢GeomHair: Reconstruction of Hair Strands from Colorless 3D Scans📢 We present a novel method to reconstruct hair strands from colorless 3D scans. It extracts orientation cues from two sources: the mesh surface geometry, by finding local characteristic lines, and shaded renderings, using a neural 2D line detector. We enhance the reconstruction with a diffusion prior trained on synthetic hair data and adapted to each scan using a tailored text prompt, allowing us to recover both simple and complex hairstyles without relying on color input. To support further research, we also introduce Strands400, the largest publicly available dataset of 3D hair strand reconstructions from real-world scans of 400 different people, featuring complicated hairstyles, such as ponytails and buns. 🌍 📷 Great work by Rachmadio Noval L., Artem Sevastopolsky, Egor Zakharov, @ness_pris

12,466 views

(1/2) MonoNPHM will be presented as a #CVPR2024 Highlight! Our Neural Parametric Head Model parametrizes both geometry and appearance. With the learned model, we can then 3D reconstruct and track human heads from images or videos.

17,209 views

Videos

📢TriFlow: Generating Artist-Like 3D Mesh Topology via Nearest-Vertex Vector Fields (ECCV'26)📢 Compact 3D meshes with clean, artist-like triangle topology - structured connectivity you'd expect a human modeler to make🙂 🌐 ▶️

📢GaussianGPT (ECCV'26) Code Release📢 What if 3D scenes worked like language? Generate full 3D Gaussian scenes - from scratch or from partial - token by token! 🔗 🌐 Great work by Nicolas von Lützow, Barbara Roessle, Katharina Schmid

📢 FaceAnything (ECCV 2026) Code Release 📢 Turn any image sequence into high-fidelity 4D face reconstructions, without controlled capture rigs. Try it on Hugging Face & reconstruct your face in 4D! 🔗 🤗 🌐

📢 GenRecon Code Release 📢 Few images in → complete, high-fidelity 3D scene out! GenRecon builds a generative prior on full scenes, resulting in unprecedented 3D reconstruction quality. 🔗 🌐 📄

📢 3D world models from video diffusion suffer from inconsistent frames -> blurry output. Our fix: instead of naïve 3D reconstruction, we non-rigidly align each frame into a globally-consistent 3DGS representation. -> sharp visuals on top of any VDM!

Excited to share Norman Müller's DiffRF: Rendering-guided 3D Radiance Field Diffusion #CVPR2023 highlight! 2D diffusion is great, but what about 3D? We show radiance field diffusion with rendering guidance for consistent and editable 3D synthesis. Vid:

Can we use video diffusion to generate 3D scenes? 𝐖𝐨𝐫𝐥𝐝𝐄𝐱𝐩𝐥𝐨𝐫𝐞𝐫 (#SIGGRAPHAsia25) creates fully-navigable scenes via autoregressive video generation. Text input -> 3DGS scene output & interactive rendering! 🌍 📽️

(1/4) Excited to share our #ICCV2023 paper Text2Room! We generate scene-scale textured 3D meshes from a given text prompt leveraging 2D text-to-image models such as StableDiffusion. Project: Code: Video:
