We also present another paper at @SIGGRAPH 2023 on neural implicit 3D Morphable Models that can be used to create a dynamic 3D avatar from a single in-the-wild image. (Lead author Connor Lin).

Koki Nagano

1,497 subscribers

12,775 views • 3 years ago • via X (Twitter)

Science & Technology

2 comments

Koki Nagano • 3 years ago

@connorzl Our method uses flexible implicit 3D representations for geometry and color, but it also learns an explicit UV texture parametrization that allows intuitive texture editing, and it provides the same intuitive controls as traditional 3DMMs. Paper and more results:

soyboy • 3 years ago

@siggraph @connorzl Q. Is there anything in the pipeline that makes it face specific? With chair training data, can it produce chairs?

Related videos

Image-Blaster is a Claude skill that can create an entire 3D environment from a single image. The special sauce here is that it also extracts key environment elements and converts them into their own separate 3D models. Full YT video:

Matt Workman

25,206 views • 2 months ago

📢 A Recipe for Generating 3D Worlds From a Single Image 📢 Our recipe explains how existing generative models can be adapted with minimal training effort to generate 3D worlds from a single input image.

Katja Schwarz

13,970 views • 1 year ago

🚀Turn Single Image into 3D Human🚀 #GeneMAN is a generalizable single-image 3D human reconstruction framework that turns in-the-wild images into high-quality 3D humans with ease 🔗Project: 📜Paper: 🧑‍💻Code:

Ziwei Liu

26,953 views • 1 year ago

#ICCV2025 🤩3D world generation is cool, but it is cooler to play with the worlds using 3D actions 👆💨, and see what happens! — Introducing *WonderPlay*: Now you can create dynamic 3D scenes that respond to your 3D actions from a single image! Web: 🧵1/7

Hong-Xing (Koven) Yu

57,796 views • 1 year ago

Adobe is entering the image-to-3D game! Their new method, LRM, can create high-fidelity 3D meshes from a single image in just 5 seconds 🔥 It's trained on 1 million objects and is able to generate objects from real-world images and generative AI models.

Dreaming Tulpa 🥓👑

270,942 views • 2 years ago

This is amazing! You can now create high-quality 3D Scenes from a single image using Multi-Instance Diffusion Models (MIDI) 🔥

Gradio

41,770 views • 1 year ago

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors paper page: We present Magic123, a two-stage coarse-to-fine approach for generating high-quality, textured 3D meshes from a single unposed image in the wild using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture. In both stages, the 3D content is learned through reference view supervision and novel views guided by a combination of 2D and 3D diffusion priors. We introduce a single trade-off parameter between the 2D and 3D priors to control exploration (more imaginative) and exploitation (more precise) of the generated geometry. Additionally, we employ textual inversion and monocular depth regularization to encourage consistent appearances across views and to prevent degenerate solutions, respectively. Magic123 demonstrates a significant improvement over previous image-to-3D techniques, as validated through extensive experiments on synthetic benchmarks and diverse real-world images.

AK

305,663 views • 3 years ago
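
The Magic123 post mentions a single trade-off parameter between the 2D and 3D priors. As a rough illustration only, the sketch below blends two guidance losses with one weight. The function name `blend_prior_losses`, the parameter `weight_3d`, and the convex-combination form are all assumptions made for this example; the paper's actual weighting may differ.

```python
def blend_prior_losses(loss_2d: float, loss_3d: float, weight_3d: float) -> float:
    """Blend two guidance losses with a single trade-off weight in [0, 1].

    Assumption: a convex combination. This is a hypothetical form chosen for
    illustration, not the formulation used by Magic123.
    """
    if not 0.0 <= weight_3d <= 1.0:
        raise ValueError("weight_3d must lie in [0, 1]")
    return (1.0 - weight_3d) * loss_2d + weight_3d * loss_3d


# Sweep the trade-off weight for fixed, made-up loss values.
for w in (0.0, 0.5, 1.0):
    print(w, blend_prior_losses(2.0, 4.0, w))
```

With `weight_3d = 0.0` only the 2D prior contributes, and with `weight_3d = 1.0` only the 3D prior does; values in between mix the two.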

Want to create an avatar from a single image? FlexAvatar is a transformer model that creates a full 360°, high-quality, expressive 3D head avatar from just a single portrait image in minutes.
Real-time Demo: FlexAvatar's lightweight architecture allows both animation and rendering in real time, enabling interactive user experiences. To create a new 3D head avatar, only one image is required, e.g., from a webcam. The final avatar is ready after 2 minutes.
Architecture: Under the hood, FlexAvatar adopts a transformer-based encoder-decoder design. The encoder maps the input image onto a latent avatar space, while the decoder produces 3D Gaussian attribute maps by incorporating the animation signal via cross-attention. The model learns all facial animations directly from the data without relying on pre-built 3D face models. This equips the avatars with realistic facial expressions. The internal avatar latent space can be conveniently used to integrate additional observations of a person via fitting. This enables use cases where more than one image of a person is available, e.g., from a phone scan of the person.
We train jointly on 2D monocular videos and multi-view data. However, in monocular videos, the animation signal leaks the target viewpoint, causing the model to produce incomplete 3D heads. We call this phenomenon entanglement of driving signal and target viewpoint. To prevent entanglement, we introduce bias sinks. These are learnable tokens that indicate whether a training sample stems from a monocular or a multi-view dataset. During training, the model learns to produce incomplete 3D heads only when the monocular token is present. During inference, FlexAvatar then always uses the multi-view token, for which the model has learned to produce complete 3D heads. This simple design makes it possible to combine the generalizability of monocular data with the quality of multi-view data.
FlexAvatar summary:
- Input: single image, phone scan, or monocular video
- Output: full 360° head avatar
- Expressive animations
- Real-time rendering and animation
- Generalization to any portrait
- Create a new avatar in 2 minutes
- Use bias sinks to combine 2D and 3D data
🏠 🌍 🎥 Great work by Tobias Kirschstein and Simon Giebenhain!

Matthias Niessner

95,991 views • 7 months ago
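
The bias-sink idea described in the FlexAvatar post can be sketched in a few lines. This is a minimal illustration, not FlexAvatar's code: every name here (`init_bias_sinks`, `select_bias_sink`, `decoder_inputs`) is hypothetical, tokens are plain Python lists rather than trained parameters, and prepending the token to the animation tokens is an assumption about how it might be fed to the decoder.

```python
import random

# Hypothetical labels for the two kinds of training data named in the post.
MONOCULAR, MULTI_VIEW = "monocular", "multi_view"
TOKEN_DIM = 4  # made-up size, for illustration only


def init_bias_sinks(dim=TOKEN_DIM, seed=0):
    """Create one token per dataset type (stand-ins for learnable parameters)."""
    rng = random.Random(seed)
    return {
        source: [rng.gauss(0.0, 0.02) for _ in range(dim)]
        for source in (MONOCULAR, MULTI_VIEW)
    }


def select_bias_sink(bias_sinks, source=None, training=False):
    """Pick the bias-sink token for one sample.

    Training: use the token matching the sample's dataset of origin.
    Inference: always use the multi-view token, as the post describes.
    """
    if training:
        if source not in bias_sinks:
            raise ValueError(f"unknown data source: {source!r}")
        return bias_sinks[source]
    return bias_sinks[MULTI_VIEW]


def decoder_inputs(animation_tokens, bias_sinks, source=None, training=False):
    """Prepend the chosen token to the animation tokens (an assumed layout)."""
    return [select_bias_sink(bias_sinks, source, training)] + list(animation_tokens)


sinks = init_bias_sinks()
train_seq = decoder_inputs([[0.0] * TOKEN_DIM], sinks, MONOCULAR, training=True)
infer_seq = decoder_inputs([[0.0] * TOKEN_DIM], sinks)
print(train_seq[0] is sinks[MONOCULAR], infer_seq[0] is sinks[MULTI_VIEW])
```

The point of the design, as the post explains it, is that the model can attribute incomplete heads to the monocular token during training, so selecting the multi-view token at inference requests complete heads.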

Current 3D generative models are slow and low quality. We present GRM, a large-scale model that reconstructs 3D Gaussians in 0.1s and generates high-quality 3D assets from text or single images in a few seconds. Demo: 1/4

Gordon Wetzstein

19,210 views • 2 years ago

What if we can simulate an *interactive 3D world*, from a single image, in the wild, in real time? Introducing PointWorld-1B: a large pre-trained 3D world model that predicts env dynamics given RGB-D capture and robot actions. 🌐 from Stanford University NVIDIA

Wenlong Huang

274,633 views • 6 months ago

🚨 Paper Alert: Our recent breakthrough CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image has been accepted by the ACM SIGGRAPH 2025 Journal Track! CAST will change the way we create scenes in 3D art and embodied AI. 🚀Soon available at 👇Details

Deemos

34,110 views • 1 year ago

Google DeepMind just dropped Genie 2. AI can now create diverse, interactive 3D worlds from a single image or text. Gaming will never be the same. 10 wild examples: 1. Long video generation on the fly

Min Choi

373,295 views • 1 year ago

2/ 3D world-building with NVIDIA. NVIDIA used Edify AI to create a detailed 3D desert in minutes during a live demo. Edify 3D generates editable 3D meshes from text or image prompts.

Poonam Soni

107,018 views • 1 year ago

I prompted Omma to build a #threejs app with real dynamic lighting on a 3D mesh - generated from a single image, using Depth Anything v2 + Transformers from Hugging Face ⚡ Depth estimation in the browser 🫧 3D mesh from the depth map 💡 Dynamic lighting reacting to the geometry No manual code. Pure AI prompting. The future of 3D on the web feels wide open.

Joseph Azar

14,041 views • 2 months ago

3D AI is leveling up! Rodin 3D AI can create stunning, high-quality 3D models from just text or image inputs. And with its latest update, it can even generate 8K HDRI textures to bring your models to life. Check out the link in the comments!

el.cine

46,032 views • 1 year ago

EgoLifter: Open-world 3D Segmentation for Egocentric Perception. In this paper we present EgoLifter, a novel system that can automatically segment scenes captured from egocentric sensors into a complete decomposition of individual 3D objects. The system is specifically

AK

41,117 views • 2 years ago

📢WorldAgents: 3D worlds only from 2D image models - without any training! We propose an agentic approach with a Director (VLM) to plan the scene, a Generator (Flux or NanoBanana) for new views, and a Verifier (VLM) for selection / 3D consistency. -> High-fidelity 3D worlds from a single text prompt. What's remarkable: our agents find consistent views from 2D image models to obtain 3D-consistent worlds; this shows that image models contain world priors - agents just need to find them! Great work by Ziya Erkoç Angela Dai

Matthias Niessner

18,976 views • 4 months ago

3D editing is hard: you need to ground an image + instruction and generate a faithful 3D shape in one forward pass -- no test-time optimization. So, we steer pretrained image-to-3D representations to do text-guided 3D edits; no massive 3D edit-pair dataset needed. Key trap: the “no-edit” solution is a nasty local minimum. We fix it with preference optimization, pushing the model to actually edit. Steer3D is the second work that adapts alignment ideas from LLMs to the 3D modality. SAM 3D also used DPO to improve its 3D generations.

Georgia Gkioxari

116,061 views • 7 months ago

🏗️ Build #3D scenes from a single input view based on the latest #NVIDIAResearch paper accepted at #ICCV2023. 👀 This 3D-aware #generativeAI model can build diverse and plausible views. research paper: project page:

NVIDIA AI Developer

113,879 views • 3 years ago

Excited to share our work on Neural Assets: a new method for enabling 3D asset-level control in image diffusion models – scalable & without any 3D inductive biases. Neural Assets goes beyond text or pixel-based control & provides an interface inspired by 3D graphics tools. 🧵

Thomas Kipf

97,924 views • 2 years ago