Zhiyang (Frank) Dou's banner

Zhiyang (Frank) Dou

@frankzydou • 2,063 subscribers

PhD student @MIT_CSAIL. MPhil @HKUniversity. Ex-visiting @Penn. Dynamics Modeling, Physical AI, Robotics, Sim, Geometry, Control, AIGC. 🦋https://t.co/YpydZBLKs1

Shorts

Here are more results from #RigidFormer: predicting physical dynamics with purely neural simulators — an attempt to learn physical dynamics in a scalable manner. 🤖 1) Controllable Articulated Body Simulation — More Results Additional Unitree G1 humanoid rollouts under controlled motion. Each sample uses a different initial state and control signal (direction and velocity). 🏺 2) Object Fragmentation Simulating the cracking and fragmentation process of objects. Thanks Žiga Kovačič for suggesting this experiment! 🎬 3) Combining Rigidformer with Diffusion-as-Shader for controllable video generation. Note: the meshes shown here are only for visualization — the network takes point clouds as input and predicts the updated state of each point.

Here are more results from #RigidFormer: predicting physical dynamics with purely neural simulators — an attempt to learn physical dynamics in a scalable manner. 🤖 1) Controllable Articulated Body Simulation — More Results Additional Unitree G1 humanoid rollouts under controlled motion. Each sample uses a different initial state and control signal (direction and velocity). 🏺 2) Object Fragmentation Simulating the cracking and fragmentation process of objects. Thanks Žiga Kovačič for suggesting this experiment! 🎬 3) Combining Rigidformer with Diffusion-as-Shader for controllable video generation. Note: the meshes shown here are only for visualization — the network takes point clouds as input and predicts the updated state of each point.

20,955 Aufrufe

Videos

Anya Rossi

sweetdream.ai

SweetDream.ai•Sponsored•Livecam

Watch Anya Live

Anya is streaming live right now! Join her private show and enjoy exclusive content.

Exclusive private shows

1.2k viewers online

Private Show

Join now for exclusive access

Free preview available • Premium content

Introducing ✨RigidFormer: Learning Rigid Dynamics with Transformers - our attempt to scale learning-based physical dynamics with Transformers. RigidFormer learns rigid dynamics with Transformers. It is a mesh-free, object-centric Transformer for multi-object rigid-body contact dynamics from point clouds. Learning physics with purely neural simulators, without relying on traditional physics engines, is an important and widely studied problem. Prior SOTA methods often use graph neural networks for accuracy and generalization, but still struggle with efficient, high-fidelity simulation at scale. RigidFormer uses only point inputs, matches or outperforms mesh-based baselines on standard benchmarks, runs much faster, generalizes across point resolutions and datasets, and scales to 200+ objects. We also show a preliminary extension to command-conditioned articulated bodies by treating body parts as interacting object-level components. RigidFormer is mesh-free: it does not require mesh connectivity, SDFs, or vertex-level message passing, making it well-suited for point-cloud observations and scalable simulation. This architecture can also be adapted to learn soft-body dynamics by replacing the rigid-body module (differentiable Kabsch alignment). 🎬See our video for more details. Many thanks to my amazing collaborators: Minghao Guo Minghao Guo, Haixu Wu Haixu Wu 吴海旭, Doug Roble, Tuur Stuyck Tuur Stuyck, and Wojciech Matusik Wojciech Matusik. Project page: Paper:

Introducing ✨RigidFormer: Learning Rigid Dynamics with Transformers - our attempt to scale learning-based physical dynamics with Transformers. RigidFormer learns rigid dynamics with Transformers. It is a mesh-free, object-centric Transformer for multi-object rigid-body contact dynamics from point clouds. Learning physics with purely neural simulators, without relying on traditional physics engines, is an important and widely studied problem. Prior SOTA methods often use graph neural networks for accuracy and generalization, but still struggle with efficient, high-fidelity simulation at scale. RigidFormer uses only point inputs, matches or outperforms mesh-based baselines on standard benchmarks, runs much faster, generalizes across point resolutions and datasets, and scales to 200+ objects. We also show a preliminary extension to command-conditioned articulated bodies by treating body parts as interacting object-level components. RigidFormer is mesh-free: it does not require mesh connectivity, SDFs, or vertex-level message passing, making it well-suited for point-cloud observations and scalable simulation. This architecture can also be adapted to learn soft-body dynamics by replacing the rigid-body module (differentiable Kabsch alignment). 🎬See our video for more details. Many thanks to my amazing collaborators: Minghao Guo Minghao Guo, Haixu Wu Haixu Wu 吴海旭, Doug Roble, Tuur Stuyck Tuur Stuyck, and Wojciech Matusik Wojciech Matusik. Project page: Paper:

Zhiyang (Frank) Dou

572,102 Aufrufe • vor 2 Monaten

Excited to share that our work NeuralActuator: Neural Actuation Modeling for Robot Dynamics and External Force Perception has been accepted to #RSS2026! Your robot — even a low-cost one — can feel external forces without torque or tactile sensors. TL;DR: NeuralActuator is a neural actuator model that jointly predicts 1️⃣torque to capture the nonlinear and time-varying current–to–torque relationship of low-cost servos, 2️⃣external contact forces (and force detection gates) for sensorless force perception, 3️⃣and motor conditions that indicate each motor’s operating regime. Here is a fast-forward video clip ⬇️ We are also covering more robots like LeRobot-S101 and Franka Panda. More details coming soon.

Excited to share that our work NeuralActuator: Neural Actuation Modeling for Robot Dynamics and External Force Perception has been accepted to #RSS2026! Your robot — even a low-cost one — can feel external forces without torque or tactile sensors. TL;DR: NeuralActuator is a neural actuator model that jointly predicts 1️⃣torque to capture the nonlinear and time-varying current–to–torque relationship of low-cost servos, 2️⃣external contact forces (and force detection gates) for sensorless force perception, 3️⃣and motor conditions that indicate each motor’s operating regime. Here is a fast-forward video clip ⬇️ We are also covering more robots like LeRobot-S101 and Franka Panda. More details coming soon.

Zhiyang (Frank) Dou

49,994 Aufrufe • vor 2 Monaten

Excited to share that our paper 🌊🤺 “CFC: Simulating Character–Fluid Coupling using a Two-Level World Model” has been accepted to #SIGGRAPHASIA2025! In this work, we build a two-level world model (neural physics) for rigid-body–fluid interaction and use it to train physics-based character controllers efficiently. We study: (1) learning to model highly dynamic fluid environments, (2) representing character–fluid interaction via joint-level forces as an interface, and (3) enabling supervised policy learning on the learned world model—avoiding expensive fluid simulation in the training loop. Our talk is on Monday afternoon(Dec 15)—hope to see you there! Time: Monday, 15 December 2025 5:02pm - 5:13pm GMT+8 Location: Meeting Room S221, Level 2. #SIGGRAPHASIA #SIGGRAPH

Excited to share that our paper 🌊🤺 “CFC: Simulating Character–Fluid Coupling using a Two-Level World Model” has been accepted to #SIGGRAPHASIA2025! In this work, we build a two-level world model (neural physics) for rigid-body–fluid interaction and use it to train physics-based character controllers efficiently. We study: (1) learning to model highly dynamic fluid environments, (2) representing character–fluid interaction via joint-level forces as an interface, and (3) enabling supervised policy learning on the learned world model—avoiding expensive fluid simulation in the training loop. Our talk is on Monday afternoon(Dec 15)—hope to see you there! Time: Monday, 15 December 2025 5:02pm - 5:13pm GMT+8 Location: Meeting Room S221, Level 2. #SIGGRAPHASIA #SIGGRAPH

Zhiyang (Frank) Dou

20,200 Aufrufe • vor 7 Monaten

We present EgoReAct: Real-time 3D human reaction generation from streaming egocentric video. 🌟Reacting to streaming egocentric video is something humans do every day. We hope EgoReAct makes human motion more human-like. 🔎 What we found: existing ego-reaction data can be spatially inconsistent (e.g., moving reactions paired with fixed-camera videos), which breaks 3D grounding. 📷 What we built: HRD, a spatially aligned egocentric video–reaction dataset (3,500 pairs, 32 categories), plus a spatially aligned ViMo fix for fair evaluation. (Instead of collecting expensive ground-truth motion, we employ VDM to generate the egocentric videos.) 👁️⚡🏃 Our simple yet effective pipeline: motion tokenization for compact discrete codes + an autoregressive Transformer for online, strictly-causal generation. Metric depth and head dynamics further improve 3D spatial consistency. Project Page: ArXiv: #HumanMotion #EgocentricVision #3D #ARVR #Animation #AIGC #DeepLearning #GenerativeAI #Graphics #ComputerVision #Motion

We present EgoReAct: Real-time 3D human reaction generation from streaming egocentric video. 🌟Reacting to streaming egocentric video is something humans do every day. We hope EgoReAct makes human motion more human-like. 🔎 What we found: existing ego-reaction data can be spatially inconsistent (e.g., moving reactions paired with fixed-camera videos), which breaks 3D grounding. 📷 What we built: HRD, a spatially aligned egocentric video–reaction dataset (3,500 pairs, 32 categories), plus a spatially aligned ViMo fix for fair evaluation. (Instead of collecting expensive ground-truth motion, we employ VDM to generate the egocentric videos.) 👁️⚡🏃 Our simple yet effective pipeline: motion tokenization for compact discrete codes + an autoregressive Transformer for online, strictly-causal generation. Metric depth and head dynamics further improve 3D spatial consistency. Project Page: ArXiv: #HumanMotion #EgocentricVision #3D #ARVR #Animation #AIGC #DeepLearning #GenerativeAI #Graphics #ComputerVision #Motion

Zhiyang (Frank) Dou

11,336 Aufrufe • vor 6 Monaten

Excited to share our latest work on 🎧spatial audio-driven human motion generation. We aim to tackle a largely underexplored yet important problem of enabling virtual humans to move naturally in response to spatial audio—capturing not just what is heard, but also where the sound is coming from. To this end, we introduce the Spatial Audio-Driven Human Motion (SAM) dataset—the first comprehensive dataset featuring paired high-quality human motion and spatial audio recordings. For benchmarking, we develop a generative framework for human MOtion generation driven by SPAtial audio, termed MOSPA, which learns to synthesize realistic and diverse human motions conditioned on spatial audio input. We hope this research could provide a foundation for future research in spatial perception, virtual characters, and embodied AI. The dataset and model will be open-sourced soon. A big thank you to our intern, Shuyang Xu, for the wonderful collaboration! Congratulations, Shuyang! Project page: Paper: Video: #Animation #CG #CV #AIGC #DL #Deeplearning #Motion #Graphics #AI #GenerativeAI

Excited to share our latest work on 🎧spatial audio-driven human motion generation. We aim to tackle a largely underexplored yet important problem of enabling virtual humans to move naturally in response to spatial audio—capturing not just what is heard, but also where the sound is coming from. To this end, we introduce the Spatial Audio-Driven Human Motion (SAM) dataset—the first comprehensive dataset featuring paired high-quality human motion and spatial audio recordings. For benchmarking, we develop a generative framework for human MOtion generation driven by SPAtial audio, termed MOSPA, which learns to synthesize realistic and diverse human motions conditioned on spatial audio input. We hope this research could provide a foundation for future research in spatial perception, virtual characters, and embodied AI. The dataset and model will be open-sourced soon. A big thank you to our intern, Shuyang Xu, for the wonderful collaboration! Congratulations, Shuyang! Project page: Paper: Video: #Animation #CG #CV #AIGC #DL #Deeplearning #Motion #Graphics #AI #GenerativeAI

Zhiyang (Frank) Dou

14,610 Aufrufe • vor 1 Jahr

Check out 🌟Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry & Physics for Mesh-Free Simulation #CVPR2025, from Lingjie Liu’s lab at UPenn. Congrats to Chuhao Chen! Vid2Sim aims to achieve system identification by reconstructing geometry, appearance, and physical properties directly from video. It combines learned data priors with closed-loop optimization: a feed-forward predictor trained on physical prior, followed by fast refinement via Neural Jacobian and mesh-free simulation. The system delivers simulation-ready outputs in minutes, with strong generalization across objects and materials. 🏠Project page: #PhysicalAI #AIGC #CV #CG #simulation #graphics

Check out 🌟Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry & Physics for Mesh-Free Simulation #CVPR2025, from Lingjie Liu’s lab at UPenn. Congrats to Chuhao Chen! Vid2Sim aims to achieve system identification by reconstructing geometry, appearance, and physical properties directly from video. It combines learned data priors with closed-loop optimization: a feed-forward predictor trained on physical prior, followed by fast refinement via Neural Jacobian and mesh-free simulation. The system delivers simulation-ready outputs in minutes, with strong generalization across objects and materials. 🏠Project page: #PhysicalAI #AIGC #CV #CG #simulation #graphics

Zhiyang (Frank) Dou

12,393 Aufrufe • vor 1 Jahr

Got five papers accepted by #ECCV2024 European Conference on Computer Vision #ECCV2026 ! Huge thanks to all my collaborators! 😃 See you in Milan 🇮🇹 Summary of Selected Works (I made a fast-forward for them 😄) - [Shape Generation] Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models, ECCV 2024. - [Efficient Motion Generation] EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation, ECCV 2024. - [Controllable Motion Generation] TLControl: Trajectory and Language Control for Human Motion Synthesis, ECCV 2024. - [Avatar Generation] Disentangled Clothed Avatar Generation from Text Descriptions, ECCV 2024. Project Page: Surf-D: EMDM: TLControl: SOSMPL:

Zhiyang (Frank) Dou

18,187 Aufrufe • vor 2 Jahren

Keine weiteren Inhalte verfügbar