Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

In this study, we introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework to enhance shape alignment and motion

AK

457,965 subscribers

194,356 views • 2 years ago • via X (Twitter)

10 comments

AK • 2 years ago

guidance in current human generative techniques. The methodology utilizes the SMPL (Skinned Multi-Person Linear) model as the 3D human parametric model to establish a unified representation of body shape and pose. This facilitates the accurate capture of intricate human

AK • 2 years ago

geometry and motion characteristics from source videos. Specifically, we incorporate rendered depth images, normal maps, and semantic maps obtained from SMPL sequences, alongside skeleton-based motion guidance, to enrich the conditions to the latent diffusion model with

AK • 2 years ago

comprehensive 3D shape and detailed pose attributes. A multi-layer motion fusion module, integrating self-attention mechanisms, is employed to fuse the shape and motion latent representations in the spatial domain. By representing the 3D human parametric model as the motion

AK • 2 years ago

guidance, we can perform parametric shape alignment of the human body between the reference image and the source video motion. Experimental evaluations conducted on benchmark datasets demonstrate the methodology's superior ability to generate high-quality

AK • 2 years ago

human animations that accurately capture both pose and shape variations. Furthermore, our approach also exhibits superior generalization capabilities on the proposed wild dataset.

AK • 2 years ago

paper page:

AK • 2 years ago

daily papers:

Junming (Leo) Chen • 2 years ago

Thanks @_akhaliq! More exciting results and details on our page: We have released part of our code and continue to update it on We are now trying our best to demo on HuggingFace ASAP. Keep an eye on us if you're interested.

BowtiedWhitebat + Read Pinned Tweet or NGMI • 2 years ago

kate middleton enter the chat...

Milla Millennial • 2 years ago

Very cool. Fewer artifacts than the other models.

Related videos

MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model with Gradio demo local demo: This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence. Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion. Despite achieving reasonable results, these approaches face challenges in maintaining temporal consistency throughout the animation due to the lack of temporal modeling and poor preservation of reference identity. In this work, we introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity. To achieve this, we first develop a video diffusion model to encode temporal information. Second, to maintain the appearance coherence across frames, we introduce a novel appearance encoder to retain the intricate details of the reference image. Leveraging these two innovations, we further employ a simple video fusion technique to encourage smooth transitions for long video animation. Empirical results demonstrate the superiority of our method over baseline approaches on two benchmarks. Notably, our approach outperforms the strongest baseline by over 38% in terms of video fidelity on the challenging TikTok dancing dataset. Code and model will be made available.

AK

810,578 views • 2 years ago

(1/2) Excited to share "Learning Neural Parametric Head Models" #CVPR2023! We capture over 5200 high-quality 3D human head scans from which we build a neural parametric head model that disentangles & expressions and deformations.

Matthias Niessner

53,312 views • 3 years ago

🚀Turn Single Image into 3D Human🚀 #GeneMAN is a generalizable single-image 3D human reconstruction framework that turns in-the-wild images into high-quality 3D humans with ease 🔗Project: 📜Paper: 🧑‍💻Code:

Ziwei Liu

26,953 views • 1 year ago

We introduce W.A.L.T, a diffusion model for photorealistic video generation. Our model is a transformer trained on image and video generation in a shared latent space. 🧵👇

Agrim Gupta

431,168 views • 2 years ago

There's a problem with 3D human pose & shape (HPS) estimation methods. You either get good 3D accuracy or good alignment with the image, but not both. Why? The current top methods use the wrong camera model. TokenHMR at #CVPR2024 analyzes the issue and presents a solution. (1/8)

Michael Black

80,500 views • 2 years ago

3D controllable video generation works well now and pixel alignment is impressive! We took our new 3D sculpt model, generated a blender scene and then asked a video-to-video model to render. Image to 3D sculpt: Common Sense Machines Synthetic rendering: Blender 🔶 Video-to-Video: Runway

Common Sense Machines

18,714 views • 1 year ago

Today I'm presenting Omma Face Studio, a web-based MetaHuman editor with direct sync to your webcam. I developed it in Omma AI using Google's GNM engine: a complete 3D parametric human model, which I was able to adapt for Three.js.

Joseph Azar

47,185 views • 4 days ago

An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion discuss: We introduce a new approach for generating realistic 3D models with UV maps through a representation termed "Object Images." This approach encapsulates surface geometry, appearance, and patch structures within a 64x64 pixel image, effectively converting complex 3D shapes into a more manageable 2D format. By doing so, we address the challenges of both geometric and semantic irregularity inherent in polygonal meshes. This method allows us to use image generation models, such as Diffusion Transformers, directly for 3D shape generation. Evaluated on the ABO dataset, our generated shapes with patch structures achieve point cloud FID comparable to recent 3D generative models, while naturally supporting PBR material generation.

AK

66,435 views • 1 year ago

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior paper page: present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects. We tackle the problem by leveraging a 2D reference image to guide the stages of geometry sculpting and texture boosting. A central focus of this work is to address the consistency issue that existing works encounter. To sculpt geometries that render coherently, we perform score distillation sampling via a view-dependent diffusion model. This 3D prior, alongside several training strategies, prioritizes the geometry consistency but compromises the texture fidelity. We further propose Bootstrapped Score Distillation to specifically boost the texture. We train a personalized diffusion model, Dreambooth, on the augmented renderings of the scene, imbuing it with 3D knowledge of the scene being optimized. The score distillation from this 3D-aware diffusion prior provides view-consistent guidance for the scene. Notably, through an alternating optimization of the diffusion prior and 3D scene representation, we achieve mutually reinforcing improvements: the optimized 3D scene aids in training the scene-specific diffusion model, which offers increasingly view-consistent guidance for 3D optimization. The optimization is thus bootstrapped and leads to substantial texture boosting. With tailored 3D priors throughout the hierarchical generation, DreamCraft3D generates coherent 3D objects with photorealistic renderings, advancing the state-of-the-art in 3D content generation.

AK

161,530 views • 2 years ago

Alibaba released LHM! a new model that can generate high-quality, animatable 3D human avatars from a single image in just a few seconds

Dreaming Tulpa 🥓👑

21,853 views • 1 year ago

🔥 Introducing MVLift: Generate realistic 3D motion without any 3D training data - just using 2D poses from monocular videos! Applicable to human motion, human-object interaction & animal motion. Joint work w/ Jiajun Wu & Karen 💡 How? We reformulate 3D motion estimation as generating consistent multi-view 2D pose sequences. Our framework uses 2D motion diffusion to progressively establish multi-view consistency, requiring only single-view 2D pose sequences for training. Project: Video with demonstration: Paper:

Jiaman Li

15,886 views • 1 year ago

Introducing Cube 2.0, a significant step towards controllable 3D game-world and video generation from any input. Powered by: 🌟 Image-to-3D in seconds w/ a new foundation model 🐎 Text-to-animation 🌎 AI rendering to create beautiful worlds (mesh+splats) Web app → Discord → Blog →

Common Sense Machines

411,381 views • 2 years ago

In 2017, Gatorade created the “Water Man” using real water, not CGI, by syncing motion capture with 2,048 nozzles and strobe lights to form a 3D human shape frame by frame.

Science girl

81,593 views • 6 months ago

DiffSplat Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation DiffSplat is a generative framework to synthesize 3D Gaussian Splats from text prompts & single-view images in ⚡️ 1~2 seconds. It is fine-tuned directly from a pretrained text-to-image diffusion model.

AK

38,416 views • 1 year ago

I’ve dreamt of creating a tool that could animate anyone with any motion from just ONE image… and now it’s a reality! 🎉 Super excited to introduce updated 3DHM: Synthesizing Moving People with 3D Control. 🕺💃3DHM can generate human videos from a single real or synthetic human image. #Animation #GenAI #AI #3DHM ✨ The magic of 3D control? Turning 2D pixels into lifelike, animated humans. 🎥 Check out our demo (and Merry Christmas)! Paper: Github: Webpage: Proudly working with the great Junming (Leo) Chen, , Yossi Gandelsman, Alyosha Efros and Jitendra MALIK😃 Kindly note: This video is intended solely for research purposes and is not authorized for commercial use.

Boyi Li

52,482 views • 1 year ago

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors paper page: present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D meshes generation from a single unposed image in the wild using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture. In both stages, the 3D content is learned through reference view supervision and novel views guided by a combination of 2D and 3D diffusion priors. We introduce a single trade-off parameter between the 2D and 3D priors to control exploration (more imaginative) and exploitation (more precise) of the generated geometry. Additionally, we employ textual inversion and monocular depth regularization to encourage consistent appearances across views and to prevent degenerate solutions, respectively. Magic123 demonstrates a significant improvement over previous image-to-3D techniques, as validated through extensive experiments on synthetic benchmarks and diverse real-world images.

AK

305,663 views • 3 years ago

You can build your 3D models directly from a prompt into Claude code. All thanks to this beautiful python project, build123 a parametric modeling framework for 2D and 3D CAD. It is built on the Open Cascade geometric kernel and it is suitable for 3D printing.

Marco Franzon

20,777 views • 3 months ago