🚀 Introducing InterDyn, our newly accepted CVPR work that explores controllable synthesis of interactive dynamics! Building upon powerful video diffusion models, InterDyn infers future motion and interactions directly from an input image and a dynamic control signal (e.g., a moving hand mask). Check out how we push the...

Haven (Haiwen) Feng

1,139 subscribers

44,898 views • 1 year ago • via X (Twitter)

10 Comments

Haven (Haiwen) Feng • 1 year ago

Dynamic Control & Beyond: Unlike prior methods that rely on explicit simulation or only static state transitions, InterDyn builds a dynamic control branch on top of Stable Video Diffusion, which we then fine-tune to generate complex interactions (e.g., hand-object manipulations) and realistic multi-object collisions without computationally heavy simulation. #StableDiffusion #StabilityAI 🧵2/6
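The thread does not describe how the control branch is wired. As a rough illustration only, the sketch below assumes a ControlNet-style design: a trainable branch encodes the per-frame control signal (for example, a flattened moving-hand mask) and adds a residual to the frozen base model's features through a zero-initialized projection, so fine-tuning starts from the unmodified pretrained model. The names `ControlBranch` and `controlled_features` are hypothetical, features are toy lists of floats rather than video latents, and InterDyn's actual architecture may differ.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ControlBranch:
    """Hypothetical ControlNet-style control branch (not InterDyn's actual code).

    `encoder` maps a per-frame control signal (e.g. a flattened mask) to a
    hidden vector; `zero_proj` maps that hidden vector to a residual on the
    base features and starts out as all zeros.
    """

    def __init__(self, encoder: Matrix, feat_dim: int) -> None:
        self.encoder = encoder
        hidden_dim = len(encoder)
        # Zero-initialized output projection: the branch contributes nothing
        # until fine-tuning updates these weights.
        self.zero_proj: Matrix = [[0.0] * hidden_dim for _ in range(feat_dim)]

    def residual(self, control: Vector) -> Vector:
        hidden = [max(0.0, h) for h in matvec(self.encoder, control)]  # ReLU
        return matvec(self.zero_proj, hidden)


def controlled_features(
    base_feats: List[Vector], controls: List[Vector], branch: ControlBranch
) -> List[Vector]:
    """Add the branch's per-frame residual to the frozen base model's features."""
    out = []
    for feat, ctrl in zip(base_feats, controls):
        res = branch.residual(ctrl)
        out.append([f + r for f, r in zip(feat, res)])
    return out
```

While the projection is still zero, the output matches the base features exactly; once training makes those weights nonzero, each frame's output depends on where the mask is in that frame. In ControlNet-style designs this zero start is what preserves the pretrained video model's behavior at the first fine-tuning step.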

Haven (Haiwen) Feng • 1 year ago

Intuitive Physics: At its core, InterDyn showcases the diffusion model’s “knowledge” of real-world physics and causal effects. By simply providing a moving object mask, the system implicitly models collisions, force propagation, and object dynamics, with no 3D reconstruction or separate physics engine needed. 🧵3/6

Haven (Haiwen) Feng • 1 year ago

Superior Performance: We evaluate InterDyn on both synthetic (CLEVRER) and real-world datasets (Something-Something-v2), achieving up to 37.5% improvement on LPIPS and 77% on FVD over prior work. Whether it’s multi-object collisions or hand-object manipulations, InterDyn produces diverse and physically plausible videos. 🧵4/6
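The thread does not say how these percentages are computed. Assuming they are relative reductions of lower-is-better metrics (both LPIPS and FVD are lower-is-better), the arithmetic is (baseline - ours) / baseline. The helper below is a hypothetical illustration, and the values in it are made up, not results reported for InterDyn.

```python
def relative_improvement(baseline: float, ours: float) -> float:
    """Relative reduction of a lower-is-better metric (e.g. LPIPS, FVD).

    Returns 0.375 when `ours` is 37.5% lower than `baseline`.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - ours) / baseline


# Made-up example values, not numbers from the paper:
print(f"{relative_improvement(200.0, 125.0):.1%}")  # prints 37.5%
```

A method that scores worse than the baseline yields a negative value under this definition.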

Haven (Haiwen) Feng • 1 year ago

Toward Interactive Video Generation: This new perspective merges intuitive physics with large-scale generative models, opening the door to controllable dynamics synthesis in complex scenes. We believe InterDyn lays the groundwork for future explorations in interactive video generation. Stay tuned for more! 🧵5/6

Haven (Haiwen) Feng • 1 year ago

This work was co-led by me and our talented Master's intern @rick_akker25502 (he's applying for PhD programs now, hire him!), together with amazing advisors @Michael_J_Black, @dimtzionas and @vfabrevaya. More details & demos coming soon! See you in Nashville! #CVPR2025 #AI #ResearchPapers 🧵6/6

Adam • 1 year ago

Great work! I had a similar idea for hand-object interaction with video generation, but with 3D conditioning.

Robert Scoble • 1 year ago

Wow great work!

Kangfu Mei • 1 year ago

Very nice and creative work!

Daniel Sungho Jung • 1 year ago

Interesting work! Were there any challenges during the research?

Erika S • 1 year ago

InterDyn’s approach to controllable synthesis of interactive dynamics is fascinating. I’m excited to see how it advances intuitive physics with video generative models—truly pushing boundaries in AI and computer vision!

Similar Videos

(1/2) LightIt: Illumination Modeling and Control for Diffusion Models! #CVPR2024 We facilitate lighting control for novel image generation from text prompts. We can also edit lighting for a given input image. Video: Project:

Matthias Niessner

19,835 views • 2 years ago

Generative Novel View Synthesis with 3D-Aware Diffusion Models abs: project page:

AK

304,708 views • 3 years ago

Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis

AK

28,101 views • 11 months ago

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models. We turn the publicly available, state-of-the-art text-to-image LDM Stable Diffusion into an efficient and expressive text-to-video model with resolution up to 1280 x 2048 abs: project page:

AK

718,682 views • 3 years ago

I'm excited to share our new work Diffusion as Shader (DaS), a versatile controllable video generation method for various tasks: object manipulation, camera control, mesh-to-video, and motion transfer. Project page: Github:

Yuan Liu

33,363 views • 1 year ago

📢 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation Got only one or a few images and wondering if recovering the 3D environment is a reconstruction or generation problem? Why not do it with a generative reconstruction model! We show that a camera-conditioned video diffusion model can be transformed into a generative reconstruction model that directly outputs a high-quality 3D Gaussian Splatting representation through self-distillation, without requiring real-world training data. Check out our results in the video (wait for dynamic scenes in the second half!) : Project Page: Code and Models: Paper:

Sherwin Bahmani

66,404 views • 8 months ago

VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis. Despite tremendous progress in the field of text-to-video (T2V) synthesis, open-sourced T2V diffusion models struggle to generate longer videos with dynamically varying and evolving content. They tend to

AK

36,982 views • 2 years ago

Glad that our work “Inference-Time Enhancement of Generative Robot Policies via Predictive World Modeling”, led by Han Qi, has been accepted to IEEE Robotics and Automation Letters! 🎉 We propose Generative Predictive Control (GPC): sample action proposals from a pretrained diffusion policy (“look back”), roll them out with a diffusion-based action-conditioned video world model (“look forward”), then rank or optimize the actions using either a learned reward model or VLM preferences. Conceptually, this is trajectory optimization / MPC with hybrid sampling + gradient optimization, interpreted through modern diffusion priors and video world models. Interestingly, we first posted the paper on arXiv in Feb 2025, when action-conditioned video world models for planning were still rare; now this direction is rapidly gaining traction. Still many open questions, e.g.:
• how to avoid local minima in planning
• what representations work best for world models
• how to balance physics priors vs. data-driven learning
Paper:

Heng Yang

18,968 views • 3 months ago
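The GPC description above amounts to a propose, simulate, and rank loop. The sketch below shows only the ranking variant (no gradient-based refinement of the actions). A toy 1-D world model stands in for the diffusion-based video world model, and a hand-written reward stands in for the learned reward model or VLM preferences. All function and parameter names are hypothetical and not taken from the paper's code.

```python
import random
from typing import Callable, List, Optional, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
Plan = List[Action]


def rank_action_proposals(
    observation: State,
    sample_plan: Callable[[State, random.Random], Plan],
    world_model: Callable[[State, Plan], List[State]],
    reward: Callable[[List[State]], float],
    num_proposals: int,
    seed: int = 0,
) -> Plan:
    """Propose plans from a policy, roll each out with a world model, keep the best.

    Hypothetical sketch of the ranking variant only; it omits any
    gradient-based optimization of the actions.
    """
    rng = random.Random(seed)
    best_plan: Optional[Plan] = None
    best_score = float("-inf")
    for _ in range(num_proposals):
        plan = sample_plan(observation, rng)        # "look back": policy proposal
        predicted = world_model(observation, plan)  # "look forward": imagined rollout
        score = reward(predicted)                   # score the predicted outcome
        if score > best_score:
            best_plan, best_score = plan, score
    if best_plan is None:
        raise ValueError("num_proposals must be at least 1")
    return best_plan


# Toy stand-ins: a 1-D point that should end up near x = 1.0.
def toy_policy(observation: State, rng: random.Random) -> Plan:
    # Ignores the observation; proposes three random steps.
    return [(rng.uniform(-0.5, 0.5),) for _ in range(3)]


def toy_world_model(observation: State, plan: Plan) -> List[State]:
    x = observation[0]
    states = []
    for (dx,) in plan:
        x += dx
        states.append((x,))
    return states


def toy_reward(states: List[State]) -> float:
    return -abs(states[-1][0] - 1.0)


best = rank_action_proposals((0.0,), toy_policy, toy_world_model, toy_reward, num_proposals=64)
print(best)
```

Ranking sampled proposals keeps the policy's prior over plausible actions while letting the world model veto ones whose predicted outcome scores poorly; the gradient-based refinement mentioned in the post would start from such a selected plan.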

Scaling up GANs for Text-to-Image Synthesis. We present our 1B-parameter GigaGAN, achieving lower FID than Stable Diffusion v1.5, DALL·E 2, and Parti-750M. It generates 512px outputs at 0.13s, orders of magnitude faster than diffusion and autoregressive models, and inherits the disentangled, continuous, and controllable latent space of GANs abs: project page:

AK

278,115 views • 3 years ago

Dreamix: Video Diffusion Models are General Video Editors abs: project page: We present a diffusion-based method that is able to perform text-based motion and appearance editing of general videos

AK

398,132 views • 3 years ago

👀 Pixel perfect 💎✨ 🖼️ Edify Image from #NVIDIAResearch is a family of diffusion models that supports a wide range of applications, including text-to-image synthesis, 4K upsampling, ControlNets, 360° HDR panorama generation, and finetuning for image customization. 🧵 1/2

NVIDIA AI Developer

14,747 views • 1 year ago

Today we're introducing Gen-4, our new series of state-of-the-art AI models for media generation and world consistency. Gen-4 is a significant step forward for fidelity, dynamic motion and controllability in generative media. Gen-4 Image-to-Video is rolling out today to all paid plans and Enterprise customers. 1/8

Runway

736,076 views • 1 year ago

I finally released my new video on YouTube about Diffusion Models / Score-Based Generative Models. Literally planned this for a year and put so much work in. I think this approach to diffusion models is so intuitive and highly recommend giving that a go! Video is 38min long, so you will need some time to watch that haha.

dome | Outlier

54,540 views • 1 year ago

Punchline: World models == VQA (about the future)! Planning with world models can be powerful for robotics/control. But most world models are video generators trained to predict everything, including irrelevant pixels and distractions. We ask - what if a world model only predicted the semantic information necessary for decision-making? Introducing Semantic World Models (SWM). Given an observation and an action sequence, SWMs cast modeling as answering textual questions about the future outcome resulting from the actions. Recasting world modeling as a VQA problem lets us directly leverage the pretrained knowledge and machinery of VLMs for generalizable modeling. We had a lot of fun thinking about how this work helps connect these two seemingly very different fields of study - VLMs and world models! 🧵(1/6) Paper: Fun demo:

Abhishek Gupta

61,121 views • 7 months ago

We present Neural Riemannian Motion Fields (NRMF) as a promising paradigm for generative modeling of articulated motion. Unlike various diffusion based models, NRMF is useful both in generation and in deployment as a prior across tasks: #CVPR #AI #CVML👇

Tolga Birdal

16,624 views • 8 months ago

🚀Excited to introduce GEN3C #CVPR2025, a generative video model with an explicit 3D cache for precise camera control. 🎥It applies to multiple use cases, including single-view and sparse-view NVS🖼️ and challenging settings like monocular dynamic NVS and driving simulation🚗. Project page:

Xuanchi Ren

59,920 views • 1 year ago

Netflix just dropped Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

AK

95,542 views • 1 year ago

(1/n) Time to unify your favorite visual generative models, VLMs, and simulators for controllable visual generation—Introducing a Product of Experts (PoE) framework for inference-time knowledge composition from heterogeneous models.

Yunzhi Zhang

48,871 views • 1 year ago

1/ 🚀 Introducing AIDO.StructureDiffusion: A generative model for structural protein design—enabling high-quality, controllable generation of monomers, complexes, and antibodies. 🧵

GenBio AI

918,205 views • 10 months ago

📢 Introducing DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models Compared to vanilla DPO, we improve paired data construction and preference label granularity, leading to better visual quality and motion strength with only 1/3 of the data. 🧵

Ziyi Wu

35,402 views • 1 year ago