📢Pix2NPHM: Learning to Regress NPHM Reconstructions From a Single Image📢 We directly regress neural parametric head models (NPHMs) from a single image: fast, stable, and significantly more expressive than classical 3DMMs such as FLAME. Face tracking & 3D reconstruction are often limited by the representational capacity of PCA-based face models. By lifting NPHMs to a first-class reconstruction primitive, we enable more accurate geometry, richer expressions, and finer animation control. Pix2NPHM obtains fast and reliable NPHM reconstructions on real-world data. Inference-time optimization against surface normals and canonical point maps can further increase fidelity. Key to successful and generalized training of our ViT-based network are: (1) large-scale registration of existing 3D head datasets, and (2) self-supervised training on vast in-the-wild 2D video datasets using pseudo ground-truth surface normals. Finally, we show that geometry-aware pretraining on pixel-aligned reconstruction tasks significantly outperforms generic visual pretraining (e.g., DINO-style features) in terms of generalization. 🌍 🎥 Great work by Simon Giebenhain, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Zhe Chen

Matthias Niessner

47,952 subscribers

37,807 views • 6 months ago • via X (Twitter)
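
The post mentions supervision and inference-time optimization against surface normals but does not state the loss used. As a minimal sketch, assuming a masked cosine-distance loss (a common choice, not confirmed as the authors' method), the comparison between predicted and pseudo ground-truth normals could look like this. The function names `normalize` and `normal_loss` are hypothetical.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length normal")
    return tuple(c / n for c in v)

def normal_loss(pred_normals, target_normals, mask):
    """Mean (1 - cosine similarity) over pixels where mask is True.

    Assumption: this is an illustrative loss, not the one from the paper.
    Returns 0.0 when no pixel is selected by the mask.
    """
    total, count = 0.0, 0
    for p, t, m in zip(pred_normals, target_normals, mask):
        if not m:
            continue  # skip pixels without a valid pseudo ground-truth normal
        p, t = normalize(p), normalize(t)
        total += 1.0 - sum(a * b for a, b in zip(p, t))
        count += 1
    return total / count if count else 0.0
```

Identical directions give a loss of 0 and perpendicular directions give 1, so the value is easy to interpret per pixel.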

Art, Education, Science & Technology

Related videos

📢GeomHair: Reconstruction of Hair Strands from Colorless 3D Scans📢 We present a novel method to reconstruct hair strands from colorless 3D scans. It extracts orientation cues both directly from the mesh surface geometry, by finding local characteristic lines, and from shaded renderings, using a neural 2D line detector. We enhance the reconstruction with a diffusion prior trained on synthetic hair data and adapted to each scan using a tailored text prompt, allowing us to recover both simple and complex hairstyles without relying on color input. To support further research, we also introduce Strands400, the largest publicly available dataset of 3D hair strand reconstructions from real-world scans of 400 different people, featuring complicated hairstyles, such as ponytails and buns. 🌍 📷 Great work by Rachmadio Noval L., Artem Sevastopolsky, Egor Zakharov, @ness_pris

Matthias Niessner

12,466 views • 1 year ago

F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Consistent Gaussian Splatting Contributions: • We pioneer 3D-aware generation using generalizable feed-forward Gaussian Splatting representation, achieving significant efficiency and favorable rendering quality on monocular datasets. • We significantly advance the capability of pixel-aligned Gaussian Splatting representations by designing a self-supervised cycle training strategy specifically tailored for monocular datasets. • We further mitigate the artifacts of 3D-aware representations caused by large viewpoint shifts by introducing geometry-aware video priors.

MrNeRF

14,229 views • 1 year ago

Wonderland: Navigating 3D Scenes from a Single Image Contributions: • First, we introduce a representation for controllable 3D generation by leveraging the generative priors from camera-guided video diffusion models. Unlike image models, video diffusion models are trained on extensive video datasets. This enables them to capture comprehensive spatial relationships within scenes across multiple views and embed a form of "3D awareness" in their latent space, which allows us to maintain 3D consistency in novel view synthesis. • Second, to achieve controllable novel view generation, we empower video models with precise control over specified camera motions. We introduce a novel dual-branch conditioning mechanism that effectively incorporates desired diverse camera trajectories into the video diffusion model. This enables expansion of a single image into a multi-view consistent capture of a 3D scene with precise pose control. • Third, to achieve efficient 3D reconstruction, we directly transform video latents into 3DGS. We propose a novel latent-based large reconstruction model (LaLRM) that lifts video latents to 3D in a feed-forward manner. With this design, during inference, our model directly predicts 3DGS from a single input image, effectively aligning the generation and reconstruction tasks—and bridging image space and 3D space—through the video latent space. Compared with reconstructing scenes from images, the video latent space offers a 256× spatial-temporal reduction while retaining essential and consistent 3D structural details. Such a high degree of compression is crucial, as it allows the LaLRM to handle a wider range of 3D scenes within the reconstruction framework, with the same memory constraints.

MrNeRF

52,801 views • 1 year ago

Adaptive and Temporally Consistent Gaussian Surfels for Multi-view Dynamic Reconstruction Contributions: • A method for efficiently reconstructing dynamic surfaces from multi-view videos using Gaussian surfels. • A unified and gradient-aware densification strategy for optimizing dynamic 3D Gaussians with fine details. • A temporal consistency approach that ensures stable and coherent surface reconstructions across frames by enforcing consistency on curvature maps. • Extensive experiments that demonstrate our method’s advantages including fast training, high-fidelity novel view synthesis, and accurate surface geometry.

MrNeRF

31,821 views • 1 year ago

Human Hair Reconstruction with Strand-Aligned 3D Gaussians Contributions (cited): – We propose a new 3D line lifting scheme that uses a modified 3DGS reconstruction technique to lift 2D orientation maps into a 3D field while also providing refinement of the camera parameters; – We introduce a dual representation of hair strand polylines and 3D Gaussians to achieve differentiable rasterization of hair strands and leverage photometric constraints for strand-based hair reconstruction; – Based on these components, we propose a coarse-to-fine optimization method for prior-guided hair reconstruction that leverages both latent and explicit representations of the hairstyle.

MrNeRF

106,521 views • 1 year ago

📢Announcing our 3D head avatar benchmark📢 Two tasks with hidden test sets: - Dynamic Novel View Synthesis on Heads - Monocular FLAME-driven Head Avatar Reconstruction Our goal is to make research on 3D head avatars more comparable and ultimately increase the realism of digital humans. The benchmark studies distinct phenomena of 3D head avatar creation, such as extreme facial expressions, slow motion captures of shaking long hair, or complicated light reflection and refraction patterns of glasses. The two benchmark tasks assess two core desiderata of 3D avatars: While the novel view synthesis challenge focuses on best possible rendering quality of complex moving scenes, the avatar animation challenge is concerned with how well a driving signal is translated into an avatar. Evaluations are light-weight and consist of diverse video recordings from the popular NeRSemble dataset with a hidden test set. Participation in the benchmark is therefore straight-forward and requires only 5 reconstructions per task. Leaderboard and benchmark submission: Benchmark data access and toolkit: Great work by Tobias Kirschstein Simon Giebenhain

Matthias Niessner

28,062 views • 1 year ago

Meta releases VGGSfM Visual Geometry Grounded Deep Structure From Motion Structure-from-motion (SfM) is a long-standing problem in the computer vision community, which aims to reconstruct the camera poses and 3D structure of a scene from a set of unconstrained 2D images. Classical frameworks solve this problem in an incremental manner by detecting and matching keypoints, registering images, triangulating 3D points, and conducting bundle adjustment. Recent research efforts have predominantly revolved around harnessing the power of deep learning techniques to enhance specific elements (e.g., keypoint matching), but are still based on the original, non-differentiable pipeline. Instead, we propose a new deep SfM pipeline VGGSfM, where each component is fully differentiable and thus can be trained in an end-to-end manner. To this end, we introduce new mechanisms and simplifications. First, we build on recent advances in deep 2D point tracking to extract reliable pixel-accurate tracks, which eliminates the need for chaining pairwise matches. Furthermore, we recover all cameras simultaneously based on the image and track features instead of gradually registering cameras. Finally, we optimise the cameras and triangulate 3D points via a differentiable bundle adjustment layer. We attain state-of-the-art performance on three popular datasets, CO3D, IMC Phototourism, and ETH3D.

AK

96,527 views • 1 year ago

We are excited to introduce Stable Fast 3D, Stability AI’s latest breakthrough in 3D asset generation technology. This innovative model transforms a single input image into a detailed 3D asset in just 0.5 seconds, setting a new standard for speed and quality in the field of 3D reconstruction! Alongside this release, we’ve also published a technical report that highlights how we achieve fast inference speeds with reduced baked illumination and material parameters. 👾You can learn more and access the report here:

Stability AI

438,350 views • 1 year ago

(1/2) We released our Neural Parametric Head Models (NPHM) dataset from our #CVPR2023 paper! It includes over 5600 high-fidelity 3D scans of human heads from 272 subjects - all publicly available! Check it out!

Matthias Niessner

36,014 views • 3 years ago

The term "continual learning" has become overloaded if you see it as an ML problem. One classic thread is about memorization: regularization-based continual learning methods, such as EWC, MAS, and SI, estimate which parameters mattered for previous tasks and resist changing them too much. One modern thread is about adaptation: test-time training and inference-time learning methods, such as TTT, adapt part of the model on the incoming test stream before making predictions. These are sometimes discussed as separate threads. But in modern scalable architectures, I think they are better seen as complementary constraints: a model that learns quickly at test time also benefits from a mechanism for deciding what not to forget. In our #ECCV2026 paper, we study this in large-scale 4D reconstruction: how to build fast spatial memory that can adapt over long observation streams while reducing collapse and forgetting. Instead of using fully plastic test-time updates, we stabilize fast-weight adaptation with an elastic prior that balances adaptation and memory. Key ideas: - Elastic Test-Time Training: Fisher-weighted consolidation for fast-weight updates - EMA anchor weights that provide a moving reference for stability - Chunk-by-chunk inference for long 3D/4D observation streams We show that this scales across large 3D/4D pretraining settings, including both LRM-style and LVSM-style models, and improves reconstruction across benchmarks including Stereo4D, NVIDIA, and DL3DV-140. We release model checkpoints across different design choices: resolution, post-training curriculum, and whether the model uses an explicit 4DGS intermediate representation. - Homepage: - Paper: - Code: - Models: This work is co-led with Xueyang Yu, contributed by Haoyu Zhen Yuncong Yang, and advised by Michigan SLED Lab Chuang Gan.

Martin Ziqiao Ma

31,958 views • 11 days ago
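
The post above names two ingredients, Fisher-weighted consolidation for fast-weight updates and EMA anchor weights, without giving the exact formulation. The following is a minimal sketch under the assumption of an EWC-style quadratic penalty, (lam/2) * sum_i F_i (w_i - a_i)^2, pulling fast weights toward the anchor. All function names and the hyperparameters `lr`, `lam`, and `decay` are hypothetical and are not taken from the paper.

```python
def fisher_estimate(grad_samples):
    """Diagonal Fisher approximation: mean squared gradient per parameter."""
    n = len(grad_samples)
    dim = len(grad_samples[0])
    return [sum(g[i] ** 2 for g in grad_samples) / n for i in range(dim)]

def elastic_step(fast, anchor, grads, fisher, lr=0.1, lam=1.0):
    """One gradient step on the task loss plus the elastic penalty.

    The penalty gradient for parameter i is lam * F_i * (w_i - a_i), so
    parameters with high Fisher weight are held close to the anchor.
    """
    return [
        w - lr * (g + lam * f * (w - a))
        for w, a, g, f in zip(fast, anchor, grads, fisher)
    ]

def ema_update(anchor, fast, decay=0.99):
    """Move the anchor slowly toward the current fast weights."""
    return [decay * a + (1.0 - decay) * w for a, w in zip(anchor, fast)]
```

In a chunk-by-chunk loop, one would call `elastic_step` for each adaptation step on a chunk and `ema_update` afterward, so the anchor acts as a moving reference rather than a frozen copy.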

Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction Note: Check below for full video. Abstract (cited): "In this paper, we present a self-calibrating framework that jointly optimizes camera parameters, lens distortion, and 3D Gaussian representations, enabling accurate and efficient scene reconstruction. Our technique is particularly effective for high-quality scene reconstruction from large field-of-view (FOV) imagery taken with wide-angle lenses, allowing the scene to be modeled from a smaller number of images. We introduce a novel method for modeling complex lens distortions using a hybrid network that combines invertible residual networks with explicit grids. This design effectively regularizes the optimization process, achieving greater accuracy than conventional camera models. Additionally, we propose a cubemap-based resampling strategy to support large FOV images without sacrificing resolution or introducing distortion artifacts. Our method is compatible with the fast rasterization of Gaussian Splatting, adaptable to a wide variety of camera lens distortions, and demonstrates state-of-the-art performance on both synthetic and real-world datasets."

MrNeRF

17,206 views • 1 year ago

CVPR Highlight #1 VGGT: Visual Geometry Grounded Transformer (best paper award) 🚀 Super-fast 3D scene reconstruction 🤗 Model and demo on Hugging Face The result is a GLB point cloud 👇

dylan

19,325 views • 1 year ago

EDGS: Eliminating Densification for Efficient Convergence of 3DGS Contributions: • We show that initial triangulation based on 2D correspondences can replace the incremental refinement process, fundamentally changing how 3DGS models allocate resources. • Our method reduces the path each Gaussian must travel in parameter space. Careful initialization not only accelerates convergence but also guides optimization toward a convergence point corresponding to lower reconstruction error and thus higher reconstruction quality. • Our approach outperforms both speed-optimized and quality-focused state-of-the-art models while using only half the splats of standard 3DGS. By improving initialization rather than altering the optimization process, this method is compatible with other 3DGS acceleration techniques, making it a flexible enhancement to existing models.

MrNeRF

124,046 views • 1 year ago

📢📢📢 NoKSR: Kernel-Free Neural Surface Reconstruction via Point Cloud Serialization #3DV2025 TL;DR: neural reconstruction with a simpler architecture (no linear systems to solve), and up to 3x speedup vs. voxel-based methods!

Andrea Tagliasacchi @CVPR

16,962 views • 1 year ago

DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery Abstract: Drones have become essential tools for reconstructing wild scenes due to their outstanding maneuverability. Recent advances in radiance field methods have achieved remarkable rendering quality, providing a new avenue for 3D reconstruction from drone imagery. However, dynamic distractors in wild environments challenge the static scene assumption in radiance fields, while limited view constraints hinder the accurate capture of underlying scene geometry. To address these challenges, we introduce DroneSplat, a novel framework designed for robust 3D reconstruction from in-the-wild drone imagery. Our method adaptively adjusts masking thresholds by integrating local-global segmentation heuristics with statistical approaches, enabling precise identification and elimination of dynamic distractors in static scenes. We enhance 3D Gaussian Splatting with multi-view stereo predictions and a voxel-guided optimization strategy, supporting high-quality rendering under limited view constraints. For comprehensive evaluation, we provide a drone-captured 3D reconstruction dataset encompassing both dynamic and static scenes. Extensive experiments demonstrate that DroneSplat outperforms both 3DGS and NeRF baselines in handling in-the-wild drone imagery.

MrNeRF

21,346 views • 1 year ago

Create a 3D model from a single image, set of images or a text prompt in < 1 minute 😮‍💨 This new AI paper called CAT3D shows us that it’ll keep getting easier to produce 3D models from 2D images — whether it’s a sparser real world 3D scan (a few photos instead of hundreds) or your favorite 2D image generator like Midjourney (just an image). How does this magic work? “This architecture is similar to video diffusion models, but with camera pose embeddings for each image instead of time embeddings. The generated views are passed into a robust 3D reconstruction pipeline to create the 3D representation (Zip-NeRF or 3DGS)”

Bilawal Sidhu

92,792 views • 2 years ago

SPARK can create high-quality 3D face avatars from regular videos and track expressions and poses in real time. It improves the accuracy of 3D face reconstructions for tasks like aging, face swapping, and digital makeup. 6 examples:

Dreaming Tulpa 🥓👑

193,764 views • 1 year ago

Hi3DGen 🔥 High-fidelity 3D geometry generation from a single image by leveraging normal maps as an intermediate representation

Gradio

50,996 views • 1 year ago

MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers paper page: Recent advances in generative AI have significantly enhanced image and video editing, particularly in the context of text prompt control. State-of-the-art approaches predominantly rely on diffusion models to accomplish these tasks. However, the computational demands of diffusion-based methods are substantial, often necessitating large-scale paired datasets for training, and therefore challenging the deployment in practical applications. This study addresses this challenge by breaking down the text-based video editing process into two separate stages. In the first stage, we leverage an existing text-to-image diffusion model to simultaneously edit a few keyframes without additional fine-tuning. In the second stage, we introduce an efficient model called MaskINT, which is built on non-autoregressive masked generative transformers and specializes in frame interpolation between the keyframes, benefiting from structural guidance provided by intermediate frames. Our comprehensive set of experiments illustrates the efficacy and efficiency of MaskINT when compared to other diffusion-based methodologies. This research offers a practical solution for text-based video editing and showcases the potential of non-autoregressive masked generative transformers in this domain.

AK

25,449 views • 2 years ago

📢📢𝐍𝐞𝐑𝐒𝐞𝐦𝐛𝐥𝐞 𝐯𝟐 𝐃𝐚𝐭𝐚𝐬𝐞𝐭 𝐑𝐞𝐥𝐞𝐚𝐬𝐞📢📢 Head captures of 7.1MP from 16 cameras at 73fps: * More recordings (425 people) * Better color calibration * Convenient download scripts The new version of our dataset adds 156 participants for a total of 425 different people. In its entirety, the dataset now provides 65 million images from over 15 hours of diverse human facial expression performances. We improved the color consistency of the recorded images with a better color calibration procedure. As a result, 3D reconstructions from NeRSemble images should become more accurate and look more realistic. Finally, we made it much easier to download the recordings with our new download repository. It now takes just a single command to download all frontal hair shake videos of all participants or to download all recordings of a single participant. Check it out: Awesome work by Tobias Kirschstein and Simon Giebenhain!

Matthias Niessner

12,089 views • 1 year ago