📢📢📢 𝐀𝐂𝟑𝐃: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers. TL;DR: for 3D camera control in generative video, it really helps to know which part of your model you should mess with. Internship by Sherwin Bahmani at Snap.

Andrea Tagliasacchi 🇨🇦→✈️→🇺🇲 (@CVPR)

16,653 subscribers

23,040 views • 1 year ago • via X (Twitter)

5 comments

Andrea Tagliasacchi 🇨🇦 • 1 year ago

TL;DR (expanded): 1) "when" in the diffusion process you condition on the camera matters (i.e., the noise scheduler); 2) "how" you condition on the camera matters (i.e., the architecture); 3) "what data" you give your diffusion model for camera conditioning matters.
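
Points 1) and 2) can be pictured as a policy. It adds camera features only during the early, high-noise part of sampling, and only in the first few DiT blocks. The sketch below is hypothetical. The class name, the 40% step cutoff and the 25% block cutoff are illustrative assumptions, not AC3D's actual settings. It also covers only inference-time gating; it does not model how the noise schedule is changed during training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraConditioningPolicy:
    """Hypothetical policy deciding where camera conditioning is injected.

    Both cutoffs are illustrative assumptions, not values from AC3D.
    """
    num_steps: int                # total denoising steps at inference
    num_blocks: int               # number of DiT blocks in the backbone
    step_fraction: float = 0.4    # condition only during the first 40% of steps
    block_fraction: float = 0.25  # condition only the first 25% of blocks

    def use_camera(self, step: int, block: int) -> bool:
        """Return True if camera features should be added at (step, block).

        `step` counts from 0 (pure noise) to num_steps - 1 (almost clean),
        so small step indices form the high-noise, low-frequency phase.
        """
        if not (0 <= step < self.num_steps and 0 <= block < self.num_blocks):
            raise ValueError("step or block index out of range")
        early_step = step < round(self.step_fraction * self.num_steps)
        early_block = block < round(self.block_fraction * self.num_blocks)
        return early_step and early_block


if __name__ == "__main__":
    policy = CameraConditioningPolicy(num_steps=50, num_blocks=32)
    active = sum(
        policy.use_camera(s, b)
        for s in range(policy.num_steps)
        for b in range(policy.num_blocks)
    )
    print(f"camera conditioning active in {active} of "
          f"{policy.num_steps * policy.num_blocks} (step, block) pairs")
```

With 50 steps and 32 blocks, this toy policy activates 160 of 1,600 (step, block) pairs. That is 20 early steps times 8 early blocks; the rest of the network and the later, detail-refining steps run without camera input.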

Andrea Tagliasacchi 🇨🇦 • 1 year ago

Why, you ask? 1) Camera motion is low-frequency, and early denoising iterations deal with low-frequency content. 2) Fine-tuning the early DiT blocks is enough for camera control; fine-tuning more of them costs quality. 3) The model needs to know what a static view of the dynamic world looks like.
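
The first reason can be sanity-checked on toy data. A smooth camera path keeps nearly all of its spectral energy in the lowest frequency bins. Fine image detail, modeled here as an assumption by white noise along a scanline, spreads across the whole spectrum. The trajectory, the bin-4 cutoff and the helper names are illustrative choices, not AC3D's analysis. A naive O(n²) DFT keeps the sketch dependency-free.

```python
import cmath
import math
import random


def dft_energy(signal):
    """Energy |X_k|^2 of a naive O(n^2) DFT for bins k = 0..n//2."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal))) ** 2
        for k in range(n // 2 + 1)
    ]


def low_frequency_share(signal, cutoff_bin=4):
    """Fraction of non-DC energy that lives in bins 1..cutoff_bin."""
    energies = dft_energy(signal)[1:]  # drop bin 0 (the mean / DC term)
    return sum(energies[:cutoff_bin]) / sum(energies)


n_frames = 64
# Toy camera path: x-coordinate of a camera orbiting once per clip,
# plus a gentle second-harmonic wobble.
camera_x = [math.cos(2 * math.pi * t / n_frames)
            + 0.1 * math.cos(4 * math.pi * t / n_frames)
            for t in range(n_frames)]
# Stand-in for fine image detail: white noise along a 64-pixel scanline.
rng = random.Random(0)
texture = [rng.gauss(0.0, 1.0) for _ in range(n_frames)]

print(f"camera path: {low_frequency_share(camera_x):.3f} of energy in bins 1-4")
print(f"fine detail: {low_frequency_share(texture):.3f} of energy in bins 1-4")
```

The camera path reports a share of essentially 1.000, because its energy sits entirely in bins 1 and 2. The noise scanline reports only a small fraction. This mirrors the argument in the reply: if camera motion lives at low frequencies, conditioning during the denoising steps that resolve low-frequency structure should be enough.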

Andrea Tagliasacchi 🇨🇦 • 1 year ago

A shout-out to the collaborators @isskoro @guocheng_qian A. Siarohin @willimenapace @SergeyTulyakov at Snap and @DaveLindell at UofT.

Samarth Sinha • 1 year ago

@sherwinbahmani Congrats @sherwinbahmani !!

Abdullah Hamdi • 1 year ago

@sherwinbahmani Congrats to the team! Amazing work.

Related videos

(1/2) 📢📢𝐃𝐢𝐟𝐟𝐮𝐬𝐢𝐨𝐧𝐀𝐯𝐚𝐭𝐚𝐫𝐬 📢📢 High-fidelity 3D head avatars with precise control over viewpoint, expression, and pose. -> Our parametric 3D model enables control & consistency + 2D diffusion makes it photoreal.

Matthias Niessner

66,414 views • 2 years ago

📢Excited to be at #ICLR2025 for our paper: VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control Poster: Thu 3-5:30 PM (#134) Website: Code: Also check out our #CVPR2025 follow-up AC3D:

Sherwin Bahmani

11,608 views • 1 year ago

📢 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation Got only one or a few images and wondering if recovering the 3D environment is a reconstruction or generation problem? Why not do it with a generative reconstruction model! We show that a camera-conditioned video diffusion model can be transformed into a generative reconstruction model that directly outputs a high-quality 3D Gaussian Splatting representation through self-distillation, without requiring real-world training data. Check out our results in the video (wait for dynamic scenes in the second half!) : Project Page: Code and Models: Paper:

Sherwin Bahmani

66,590 views • 10 months ago

📢𝐋𝟑𝐃𝐆: 𝐋𝐚𝐭𝐞𝐧𝐭 𝟑𝐃 𝐆𝐚𝐮𝐬𝐬𝐢𝐚𝐧 𝐃𝐢𝐟𝐟𝐮𝐬𝐢𝐨𝐧📢 #SIGGRAPHAsia We propose a generative diffusion model for 3D Gaussians. The key is a learned latent space, which substantially reduces the complexity of the diffusion process and thus enables room-scale scene generation! Great work by Barbara Roessle with Norman Müller, Angela Dai, Lorenzo Porzi, Samuel Rota Bulò, Peter Kontschieder!

Matthias Niessner

39,496 views • 1 year ago

Full creative control is here in 3D-to-Video. 3D-to-Video lets you manage every detail, from objects to camera angles, with total precision. Consistency is guaranteed.

MeshyAI

11,345 views • 9 months ago

Bring your stories to life with a 3D camera. Start with a single frame and turn it into a 3D scene you can move through, shot by shot. Control the camera. Set the pace. Shape the story.

Moonvalley

18,713 views • 1 year ago

One of the hardest things to achieve with AI is precise character motion. The new model by Kinetix, Kamo-1, is amazing at giving you far more control over your generations. It’s also the first 3D-conditioned model, so it understands the scene in 3D and gives you almost unlimited camera motion. Let me show you how to use it 👇

Everett World

19,199 views • 7 months ago

"YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting" TL;DR: a unified 3D Gaussian splatting model that reconstructs high-quality scene geometry and camera poses from unposed/uncalibrated images in a single forward pass.

Alexandre Morgand

15,035 views • 5 months ago

🚀Excited to introduce GEN3C #CVPR2025, a generative video model with an explicit 3D cache for precise camera control. 🎥It applies to multiple use cases, including single-view and sparse-view NVS🖼️ and challenging settings like monocular dynamic NVS and driving simulation🚗. Project page:

Xuanchi Ren

60,036 views • 1 year ago

V3D: Video Diffusion Models are Effective 3D Generators. Automatic 3D generation has recently attracted widespread attention. Recent methods have greatly accelerated generation, but they usually produce less-detailed objects due to limited model capacity or 3D data. Motivated by recent advances in video diffusion models, we introduce V3D, which leverages the world-simulation capacity of pre-trained video diffusion models to facilitate 3D generation. To fully unleash the potential of video diffusion to perceive the 3D world, we further introduce a geometrical consistency prior and extend the video diffusion model to a multi-view consistent 3D generator. Benefiting from this, the state-of-the-art video diffusion model can be fine-tuned to generate 360-degree orbit frames surrounding an object given a single image. With our tailored reconstruction pipelines, we can generate high-quality meshes or 3D Gaussians within 3 minutes. Furthermore, our method can be extended to scene-level novel view synthesis, achieving precise control over the camera path with sparse input views. Extensive experiments demonstrate the superior performance of the proposed approach, especially in terms of generation quality and multi-view consistency.

AK

31,997 views • 2 years ago

🎥 Take Full Control of the Camera! 🚨 I got early access to the new Camera Control feature with the T2V 01 - Director Model by Hailuo AI (MiniMax), the ultimate tool for full camera control in video generation! ✨ What's exciting? Direct with ease: use natural language to guide your camera exactly how you want. Smooth transitions: combine multiple moves seamlessly for cinematic perfection. Here are 10 sample prompts. 🔖 Bookmark for later!

MayorkingAI

26,819 views • 1 year ago

📢 PART 1__[The Making of an UnderClass] with Dr. Claud Anderson_ "Major societies in this country have ways of excluding you intentionally without you knowing it."

💪🏾Kelly👊🏿⚖-🏴‍☠"The_War" Counter-War

12,968 views • 1 year ago

This is the world's first 3D AI studio: Rendora AI can generate a fully editable 3D avatar, place it in any 3D environment, make it talk with gestures, control the camera angle, and edit it on its own editing timeline. Step-by-step tutorial:

el.cine

53,833 views • 1 year ago

📢Excited news from sudoAI (@sudoAI_): The interactive demo of our 3D Generative AI model is online! (alpha test for desktop, mobile / pad coming soon) ✨Transform images & text into stunning 3D models in just 60 seconds!🚀 Try it now! 👉

Hao Su

17,515 views • 2 years ago

ComfyUI-Mesh2Motion 1.2.0 Custom FBX import is now built into ComfyUI-Mesh2Motion. Load any FBX 3D animation, combine it with professional camera presets, and take precise control over your AI video output.

jtydhr88

13,754 views • 2 months ago

Using 3D assets with Gen-4 References is a simple way to bring highly detailed and specific models into your generative workflows for even more consistency and control. To do so, simply provide a location plate, a quick comp of your 3D model in that space and a style reference.

Runway

119,208 views • 1 year ago

✨ Made a new mini feature on Photo AI: [ Grab from 3d model ]. The problem is that we're at the stage (typical for AI) where image-to-3D models are not good enough yet but are fun to play with, and we know they'll be good enough in 1-2 years. With [ Make 3d model ] you can already turn any Photo AI pic into a 3D model, but it still looks hyper clunky and deformed. Still, it works! One cool idea I had to make that more useful, and have now built: let people make a 3D model, change its view in the 3D viewer, then press [ o ] to grab a frame of the 3D model. You can then [ Remix ] that image (img2img) so it becomes a real photo again, which you can in turn make into a video with [ Make video ]. That essentially gives you fully freeform camera position control to take photos with. One thing I still need to fix is the background/skybox: I need to take the original photo, remove the person, and use just the background in the 3D model viewer (in this case it should be white). But it's a start!

@levelsio

119,210 views • 1 year ago

Don’t get me wrong I like Forgotten Land but just being able to run around in 3D with full camera control is really making me wish we got something a little more open and less restrictive for Kirby’s first 3D game.

TheJoshMan ✝️

432,968 views • 8 months ago

📢📢 𝐏𝐞𝐫𝐜𝐇𝐞𝐚𝐝: 𝐏𝐞𝐫𝐜𝐞𝐩𝐭𝐮𝐚𝐥 𝐇𝐞𝐚𝐝 𝐌𝐨𝐝𝐞𝐥 𝐟𝐨𝐫 𝐒𝐢𝐧𝐠𝐥𝐞-𝐈𝐦𝐚𝐠𝐞 𝟑𝐃 𝐇𝐞𝐚𝐝 𝐑𝐞𝐜𝐨𝐧𝐬𝐭𝐫𝐮𝐜𝐭𝐢𝐨𝐧 & 𝐄𝐝𝐢𝐭𝐢𝐧𝐠📢📢 PercHead reconstructs realistic 3D heads from a single image and enables disentangled 3D editing via geometric controls and style inputs from images or text. At its core is a generalized 3D head decoder trained with perceptual supervision from DINOv2 and SAM 2.1. We find that our new perceptual loss formulation improves reconstruction fidelity compared to commonly-used methods such as LPIPS. Our trained reconstruction model is able to generate 3D-consistent heads from a single input image. Even with challenging side-view inputs, the model robustly infers missing regions for a coherent, high-fidelity output. In addition, our architecture seamlessly adapts to downstream tasks: by swapping the encoder, we can transform the model into a disentangled 3D editing pipeline. In this scenario, we can control geometry through - potentially hand-drawn - segmentation maps, and condition style via image or text prompt. We also provide an interactive GUI to enable the exploration of our editing pipeline. 🌍 📽️ Great work by Antonio Oroz and Tobias Kirschstein

Matthias Niessner

18,855 views • 8 months ago