Google presents CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models

AK

415,531 subscribers

61,949 views • 1 year ago • via X (Twitter)

Science & Technology

9 comments

AK • 1 year ago

discuss:

Rundi Wu • 1 year ago

Thanks for sharing our work! Project page: arXiv:

BensenHsu • 1 year ago

The paper presents a method called CAT4D (Create Anything in 4D) that generates high-quality dynamic 3D scenes from a single monocular input video. The key idea is to use a multi-view video diffusion model, trained on a diverse mix of datasets, to synthesize novel views at any specified camera pose and timestamp. The authors evaluate the method on several tasks, including novel view synthesis, sparse-view static 3D reconstruction in the presence of scene motion, and 4D reconstruction from monocular videos. They show that it produces high-quality dynamic 3D scenes and outperforms existing state-of-the-art models that depend on multiple priors and external sources of information. Full paper:

HistoricTechOmar Samir • 1 year ago

CAT4D? More like create anything in 4D and amaze me!

Daveheardt • 1 year ago

4D? Like 4 dimensions? If so, this is not it; this is 3D.

plugbrain • 1 year ago

Any chance of a code release?

Zero Vertex • 1 year ago

I wish my cat could bake like that. jk I don't have a cat

Fleeber • 1 year ago

oooo

RinGo_3.0 • 1 year ago

👀
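BensenHsu's summary above describes a model that can synthesize a view for any requested camera pose and timestamp. The Python below is a minimal sketch of what that request space looks like, not code from the CAT4D paper: it assumes cameras are plain integer indices rather than real poses, and `ViewQuery`, `build_grid`, `fixed_time_slice` and `fixed_view_slice` are hypothetical names. Every generated frame is one (camera, time) pair, so the full output is a camera × time grid. Fixing the time gives many views of one frozen instant, which roughly matches the "sparse-view static 3D reconstruction in the presence of scene motion" setting in the summary; fixing the camera gives an ordinary video from a novel viewpoint.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ViewQuery:
    """One frame to generate (hypothetical): a camera, by index, at a timestamp."""
    camera: int
    time: float


def build_grid(num_cameras: int, timestamps: list[float]) -> list[ViewQuery]:
    """Every (camera, time) pair, i.e. a multi-view video of num_cameras x len(timestamps) frames."""
    return [ViewQuery(c, t) for c, t in product(range(num_cameras), timestamps)]


def fixed_time_slice(grid: list[ViewQuery], time: float) -> list[ViewQuery]:
    """All cameras at one instant: a static multi-view set of a frozen scene."""
    return sorted((q for q in grid if q.time == time), key=lambda q: q.camera)


def fixed_view_slice(grid: list[ViewQuery], camera: int) -> list[ViewQuery]:
    """One camera across all instants: a video seen from that (possibly novel) viewpoint."""
    return sorted((q for q in grid if q.camera == camera), key=lambda q: q.time)


if __name__ == "__main__":
    grid = build_grid(3, [0.0, 0.5, 1.0])
    print(len(grid))
    print(fixed_time_slice(grid, 0.5))
    print(fixed_view_slice(grid, 2))
```

For example, `build_grid(3, [0.0, 0.5, 1.0])` yields 9 queries, `fixed_time_slice(grid, 0.5)` returns cameras 0, 1 and 2 at t = 0.5, and `fixed_view_slice(grid, 2)` returns camera 2 at t = 0.0, 0.5 and 1.0.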

Related videos

Google presents LightLab: Controlling Light Sources in Images with Diffusion Models

AK

122,958 views • 1 year ago

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

AK

12,518 views • 11 months ago

We’ve upgraded Stable Video Diffusion 4D to Stable Video 4D 2.0 (SV4D 2.0), improving the quality of 4D outputs generated from a single object-centric video. While 3D provides a static view of an object’s shape and size, 4D extends this by including time, showing how the object moves. This multi-view video diffusion model generates a 4D output in three steps:
1️⃣ Starts with an input video of a moving person or object
2️⃣ Generates novel views of the subject from unseen angles
3️⃣ Constructs a single dynamic 4D video output with spatial and temporal consistency
You can learn more here: (1/4)

Stability AI

35,974 views • 1 year ago

InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

AK

26,123 views • 5 months ago

Nvidia presents Articulated Kinematics Distillation from Video Diffusion Models

AK

39,189 views • 1 year ago

🌟 Create anything in 3D! 🌟 Introducing CAT3D: a new method that generates high-fidelity 3D scenes from any number of real or generated images in one minute, powered by multi-view diffusion models. w/ lovely coauthors Aleksander Holynski, Ben Poole and an amazing team!

Ruiqi Gao

152,867 views • 2 years ago

Nvidia just announced Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models

AK

131,297 views • 2 years ago

Generative Novel View Synthesis with 3D-Aware Diffusion Models abs: project page:

AK

304,708 views • 3 years ago

Google presents VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis. We propose VLOGGER, a method for audio-driven human video generation from a single input image of a person, which builds on the success of recent generative diffusion models. Our method consists of

AK

66,375 views • 2 years ago

This is amazing! You can now create high-quality 3D Scenes from a single image using Multi-Instance Diffusion Models (MIDI) 🔥

Gradio

41,770 views • 1 year ago

MVDream: Multi-view Diffusion for 3D Generation. Paper page: We propose MVDream, a multi-view diffusion model that is able to generate geometrically consistent multi-view images from a given text prompt. By leveraging image diffusion models pre-trained on large-scale web datasets and a multi-view dataset rendered from 3D assets, the resulting multi-view diffusion model can achieve both the generalizability of 2D diffusion and the consistency of 3D data. Such a model can thus be applied as a multi-view prior for 3D generation via Score Distillation Sampling, where it greatly improves the stability of existing 2D-lifting methods by solving the 3D consistency problem. Finally, we show that the multi-view diffusion model can also be fine-tuned in a few-shot setting for personalized 3D generation, i.e. the DreamBooth3D application, where the consistency can be maintained after learning the subject identity.

AK

294,442 views • 2 years ago

Most multi-view reconstruction models need full supervision. We show they can self-improve without any ground truth labels. Introducing SelfEvo: Self-Improving 4D Perception via Self-Distillation. Up to +36.5% in video depth, +20.1% in camera estimation, zero annotation.

Qianqian Wang

24,309 views • 2 months ago

Meta presents Adaptive Caching for Faster Video Generation with Diffusion Transformers

AK

53,119 views • 1 year ago

V3D: Video Diffusion Models are Effective 3D Generators. Automatic 3D generation has recently attracted widespread attention. Recent methods have greatly accelerated the generation speed, but usually produce less-detailed objects due to limited model capacity or 3D data. Motivated by recent advancements in video diffusion models, we introduce V3D, which leverages the world simulation capacity of pre-trained video diffusion models to facilitate 3D generation. To fully unleash the potential of video diffusion to perceive the 3D world, we further introduce a geometrical consistency prior and extend the video diffusion model to a multi-view consistent 3D generator. Benefiting from this, the state-of-the-art video diffusion model could be fine-tuned to generate 360-degree orbit frames surrounding an object given a single image. With our tailored reconstruction pipelines, we can generate high-quality meshes or 3D Gaussians within 3 minutes. Furthermore, our method can be extended to scene-level novel view synthesis, achieving precise control over the camera path with sparse input views. Extensive experiments demonstrate the superior performance of the proposed approach, especially in terms of generation quality and multi-view consistency.

AK

31,997 views • 2 years ago

Robot Learning needs 4D world models! Robot Learning needs 4D world models! Robot Learning needs 4D world models! We introduce TesserAct, a 4D embodied world model that can simulate how agents interact with the 3D world over time! We achieve this by simply extending a pre-trained 2D video generation model to jointly predict RGB, depth, and surface normals. It enables:
1️⃣ Much better policy learning in the wild
2️⃣ Temporal + spatial coherence in 4D dynamic prediction
3️⃣ Novel view synthesis for embodied scenes
Code: Paper Link: Project page:

Chuang Gan

43,265 views • 1 year ago

Nvidia presents Align Your Steps: Optimizing Sampling Schedules in Diffusion Models. Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond. A crucial drawback of DMs is their slow sampling speed,

AK

32,888 views • 2 years ago

Meta presents Pippo: High-Resolution Multi-View Humans from a Single Image. Generates 1K-resolution, multi-view, studio-quality images from a single photo in a single forward pass.

Aran Komatsuzaki

32,503 views • 1 year ago