🚀 Introducing InterDyn — our newly accepted CVPR work that explores controllable synthesis of interactive dynamics! Building upon powerful video diffusion models, InterDyn infers future motion and interactions directly from an input image and a dynamic control signal (e.g., a moving hand mask). Check out how we push the...

44,898 views • 1 year ago • via X (Twitter)

10 Comments

Haven (Haiwen) Feng • 1 year ago

Dynamic Control & Beyond: Unlike prior methods that rely on explicit simulation or handle only static state transitions, InterDyn adds a dynamic control branch on top of Stable Video Diffusion. We then fine-tune it to generate complex interactions (e.g., hand-object manipulations) and realistic multi-object collisions without heavy simulation computations. #StableDiffusion #StabilityAI 🧵2/6

Haven (Haiwen) Feng • 1 year ago

Intuitive Physics: At its core, InterDyn showcases the diffusion model’s “knowledge” of real-world physics and causal effects. By simply providing a moving object mask, the system implicitly models collisions, force propagation, and object dynamics—no 3D reconstruction or separate physics engine needed. 🧵3/6

Haven (Haiwen) Feng • 1 year ago

Superior Performance: We evaluate InterDyn on both synthetic (CLEVRER) and real-world (Something-Something-v2) datasets, achieving up to a 37.5% improvement on LPIPS and 77% on FVD over prior work. Whether it’s multi-object collisions or hand-object manipulations, InterDyn produces diverse and physically plausible videos. 🧵4/6
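
For readers unfamiliar with how such percentages are usually read: LPIPS and FVD are both lower-is-better metrics, so an "improvement" is commonly reported as the relative reduction against a baseline score. The sketch below assumes that convention; the thread does not say how the numbers were computed. The function name and the scores are hypothetical placeholders, not results from the paper.

```python
def relative_improvement(baseline: float, ours: float) -> float:
    """Percent reduction of a lower-is-better metric relative to a baseline.

    Assumes the common convention (baseline - ours) / baseline; the authors'
    exact computation is not stated in the thread.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - ours) / baseline


if __name__ == "__main__":
    # Hypothetical placeholder scores, NOT values reported for InterDyn.
    baseline_score = 200.0
    our_score = 50.0
    print(f"{relative_improvement(baseline_score, our_score):.1f}% improvement")
```

Under this reading, a positive value means the new method scores lower (better) than the baseline, and a negative value means it scores worse.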

Haven (Haiwen) Feng • 1 year ago

Toward Interactive Video Generation: This new perspective merges intuitive physics with large-scale generative models, opening the door to controllable dynamics synthesis in complex scenes. We believe InterDyn lays the groundwork for future explorations in interactive video generation. Stay tuned for more! 🧵5/6

Haven (Haiwen) Feng • 1 year ago

This work was co-led by me and our talented Master's intern @rick_akker25502 (he's applying for PhD positions now, hire him!), together with our amazing advisors @Michael_J_Black, @dimtzionas, and @vfabrevaya. More details & demos coming soon! See you in Nashville! #CVPR2025 #AI #ResearchPapers 🧵6/6

Adam • 1 year ago

Great work! I had a similar idea for hand-object interaction with video generation but with 3D conditioning

Robert Scoble • 1 year ago

Wow great work!

Kangfu Mei • 1 year ago

Very nice and creative work!

Daniel Sungho Jung • 1 year ago

Interesting work! Were there any challenges during the research?

Erika S • 1 year ago

InterDyn’s approach to controllable synthesis of interactive dynamics is fascinating. I’m excited to see how it advances intuitive physics with video generative models—truly pushing boundaries in AI and computer vision!
