📢Animating the Uncaptured 📢 We animate 3D humanoid meshes using video diffusion priors given a text prompt. 🎥 🌍 Realistic motion generation for 3D characters - without motion capture! 🚀 Great work by Marc Benedí and Angela Dai

Matthias Niessner

47,520 subscribers

11,696 views • 1 year ago • via X (Twitter)

Education, Science & Technology, Art

3 comments

Chaoyue Song • 1 year ago

@marcbenedi @angelaqdai Really great work! Congratulations.

OPEN • 1 year ago

Cinematic pedigree of the highest order meets innovative AAA gameplay in OP3N. Dive into the action by wishlisting on Epic Games TODAY!

Derek Scherer | Dayruke • 1 year ago

@marcbenedi @angelaqdai Awesome results! Looks like the big innovations involve GenAI video (+ silhouette, etc.) providing inputs to the generated mesh animation. Right? (All new to me — had to look up MDM)

Related videos

📢𝐋𝟑𝐃𝐆: 𝐋𝐚𝐭𝐞𝐧𝐭 𝟑𝐃 𝐆𝐚𝐮𝐬𝐬𝐢𝐚𝐧 𝐃𝐢𝐟𝐟𝐮𝐬𝐢𝐨𝐧📢 #SIGGRAPHAsia We propose a generative diffusion model for 3D Gaussians. Key is a learnt latent space which substantially reduces the complexity of the diffusion process, thus facilitating room-scale scene generation! Great work by Barbara Roessle with Norman Müller, Angela Dai, Lorenzo Porzi, Samuel Rota Bulò, Peter Kontschieder!

Matthias Niessner

39,489 views • 1 year ago

📢WorldAgents: 3D worlds only from 2D image models - without any training! We propose an agentic approach with a Director (VLM) to plan the scene, a Generator (Flux or NanoBanana) for new views, and a Verifier (VLM) for selection / 3D consistency. -> High-fidelity 3D worlds from a single text prompt. What's remarkable: our agents find consistent views from 2D image models to obtain 3D-consistent worlds; this shows that image models contain world priors - agents just need to find them! Great work by Ziya Erkoç and Angela Dai

Matthias Niessner

18,854 views • 2 months ago

🔥 Introducing MVLift: Generate realistic 3D motion without any 3D training data - just using 2D poses from monocular videos! Applicable to human motion, human-object interaction & animal motion. Joint work w/ Jiajun Wu & Karen 💡 How? We reformulate 3D motion estimation as generating consistent multi-view 2D pose sequences. Our framework uses 2D motion diffusion to progressively establish multi-view consistency, requiring only single-view 2D pose sequences for training. Project: Video with demonstration: Paper:

Jiaman Li

15,788 views • 1 year ago

Introducing Move One, an easy way to capture and create 3D animations with just a phone #MadeWithMove 🎥 Capture motion anywhere with a single camera ⚡️ Get your data back in minutes 🤖 Animate 3D characters Try for free👇

Move AI

67,525 views • 2 years ago

📢📢 𝐏𝐫𝐄𝐝𝐢𝐭𝐨𝐫𝟑𝐃: 𝐅𝐚𝐬𝐭 𝐚𝐧𝐝 𝐏𝐫𝐞𝐜𝐢𝐬𝐞 𝟑𝐃 𝐒𝐡𝐚𝐩𝐞 𝐄𝐝𝐢𝐭𝐢𝐧𝐠 📢📢 We propose a training-free 3D shape editing approach that rapidly and precisely edits the regions intended by the user and keeps the rest as is. Using a quickly brushed mask and a text prompt, we first apply multi-view editing in the 2D domain and then run our merging algorithm in the 3D feature space to ensure that the edited shape stays faithful to the input shape. Project Page: Video: Great work by Ziya Erkoç, Can Gümeli, Chaoyang Wang, Angela Dai, Peter Wonka, Hsin-Ying Lee, Peiye Zhuang

Matthias Niessner

13,091 views • 1 year ago

📢 Matrix-3D: Omnidirectional 3D World Generation Ever wanted to turn a single image or a text prompt into a massive, explorable 3D world? That’s exactly what Matrix-3D does.

Skywork

30,371 views • 10 months ago

📢 Intrinsic Image Fusion for Multi-View 3D Material Reconstruction 📢 We combine generative material priors with inverse path tracing: 1) define a parametric texture space 2) fuse monocular predictions across views into consistent textures 3) optimize low-dimensional parameters for physically-grounded reconstructions. The results are relightable PBR textures for 3D scenes: check out the result on a real-world 3D scan from the ScanNet++ dataset! 🌍 🎥 Great work by Peter Kocsis and Lukas Höllein!

Matthias Niessner

19,833 views • 5 months ago

📢📢GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction📢📢 Reconstructing high-fidelity 3D scenes from sparse RGB input is hard. It needs a strong 3D prior! We reformulate multi-view scene reconstruction as conditional 3D generation over overlapping spatial chunks, lifting posed image features into a generative shape prior via 3D conditioning. As an example prior, we build on Trellis2, and train it such that its reconstruction is pixel-aligned and matches from all views. GenRecon achieves unprecedented reconstruction quality from any sparse RGB input sequence, even from a phone capture. The reconstruction also includes PBR materials, which facilitate relighting and virtual object insertion. Amazing work by Katharina Schmid, Nicolas von Lützow, Jozef, Angela Dai

Matthias Niessner

17,088 views • 17 days ago

The best motion capture for female characters … she’s great

Lord Bebo

1,121,564 views • 6 months ago

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors paper page: We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D mesh generation from a single unposed image in the wild using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture. In both stages, the 3D content is learned through reference view supervision and novel views guided by a combination of 2D and 3D diffusion priors. We introduce a single trade-off parameter between the 2D and 3D priors to control exploration (more imaginative) and exploitation (more precise) of the generated geometry. Additionally, we employ textual inversion and monocular depth regularization to encourage consistent appearances across views and to prevent degenerate solutions, respectively. Magic123 demonstrates a significant improvement over previous image-to-3D techniques, as validated through extensive experiments on synthetic benchmarks and diverse real-world images.

AK

305,643 views • 2 years ago

AI can synthesize dynamic textures on 3D meshes without UV maps! MeshNCA is a model that can generate dynamic textures from images, text prompts, and motion vector fields.

Dreaming Tulpa 🥓👑

79,593 views • 2 years ago

📢📢📢 Excited to share our new work *Autonomous Character-Scene Interaction Synthesis from Text Instruction* (Siggraph Asia 24). It presents a unified model for flexible scene-conditioned motion generation given text, scene, trajectory conditions. The results with smooth interaction look very impressive! 📰Paper: Project: Code and data will be released soon.

Siyuan Huang

11,340 views • 1 year ago

MVDream: Multi-view Diffusion for 3D Generation paper page: We propose MVDream, a multi-view diffusion model that is able to generate geometrically consistent multi-view images from a given text prompt. By leveraging image diffusion models pre-trained on large-scale web datasets and a multi-view dataset rendered from 3D assets, the resulting multi-view diffusion model can achieve both the generalizability of 2D diffusion and the consistency of 3D data. Such a model can thus be applied as a multi-view prior for 3D generation via Score Distillation Sampling, where it greatly improves the stability of existing 2D-lifting methods by solving the 3D consistency problem. Finally, we show that the multi-view diffusion model can also be fine-tuned under a few-shot setting for personalized 3D generation, i.e. DreamBooth3D application, where the consistency can be maintained after learning the subject identity.

AK

294,442 views • 2 years ago

Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation paper page: Recent advances in generative modeling have led to promising progress on synthesizing 3D human motion from text, with methods that can generate character animations from short prompts and specified durations. However, using a single text prompt as input lacks the fine-grained control needed by animators, such as composing multiple actions and defining precise durations for parts of the motion. To address this, we introduce the new problem of timeline control for text-driven motion synthesis, which provides an intuitive, yet fine-grained, input interface for users. Instead of a single prompt, users can specify a multi-track timeline of multiple prompts organized in temporal intervals that may overlap. This enables specifying the exact timings of each action and composing multiple actions in sequence or at overlapping intervals. To generate composite animations from a multi-track timeline, we propose a new test-time denoising method. This method can be integrated with any pre-trained motion diffusion model to synthesize realistic motions that accurately reflect the timeline. At every step of denoising, our method processes each timeline interval (text prompt) individually, subsequently aggregating the predictions with consideration for the specific body parts engaged in each action. Experimental comparisons and ablations validate that our method produces realistic motions that respect the semantics and timing of given text prompts.

AK

126,548 views • 2 years ago

Another great music video by Kirill Nong using Move AI Multi-Cam. 🔥 Our motion capture tech uses AI to make the creation of lifelike 3D animations super easy with absolutely amazing results. 🚀 Experience the magic of effortless animation today! ⚡️

Move AI

10,451 views • 2 years ago

Progressive Rendering Distillation released on Hugging Face Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data

AK

12,586 views • 1 year ago

📢📢📢 𝐀𝐂𝟑𝐃: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers TL;DR: for 3D camera control in generative video, it really helps knowing *which* part of your model you should mess with Internship by Sherwin Bahmani at Snap

Andrea Tagliasacchi 🇨🇦→✈️→🇺🇲 (@CVPR)

23,040 views • 1 year ago

animators are not needed anymore this 3D AI motion capture plugin can convert character movement from real video to 3D data and.. you can apply the motion to any 3D character.. link in comments

el.cine

64,475 views • 6 months ago

Can we use video diffusion to generate 3D scenes? 𝐖𝐨𝐫𝐥𝐝𝐄𝐱𝐩𝐥𝐨𝐫𝐞𝐫 (#SIGGRAPHAsia25) creates fully-navigable scenes via autoregressive video generation. Text input -> 3DGS scene output & interactive rendering! 🌍 📽️

Matthias Niessner

30,777 views • 8 months ago

PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking paper page: We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework, for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion. Toward the goal of naturalism, we animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos. We create combinatorial diversity by randomizing character appearance, motion profiles, materials, lighting, 3D assets, and atmospheric effects. Our dataset currently includes 104 videos, averaging 2,000 frames long, with orders of magnitude more correspondence annotations than prior work. We show that existing methods can be trained from scratch in our dataset and outperform the published variants. Finally, we introduce modifications to the PIPs point tracking method, greatly widening its temporal receptive field, which improves its performance on PointOdyssey as well as on two real-world benchmarks.

AK

122,525 views • 2 years ago