
Zhiyang (Frank) Dou
@frankzydou • 2,006 subscribers
PhD student @MIT_CSAIL. MPhil @HKUniversity. Ex-visiting @Penn. Dynamics Modeling, Physical AI, Robotics, Sim, Geometry, Control, AIGC. 🦋https://t.co/YpydZBLKs1
Videos

Introducing ✨RigidFormer: Learning Rigid Dynamics with Transformers - our attempt to scale learning-based physical dynamics with Transformers. RigidFormer learns rigid dynamics with Transformers. It is a mesh-free, object-centric Transformer for multi-object rigid-body contact dynamics from point clouds. Learning physics with purely neural simulators, without relying on traditional physics engines, is an important and widely studied problem. Prior SOTA methods often use graph neural networks for accuracy and generalization, but still struggle with efficient, high-fidelity simulation at scale. RigidFormer uses only point inputs, matches or outperforms mesh-based baselines on standard benchmarks, runs much faster, generalizes across point resolutions and datasets, and scales to 200+ objects. We also show a preliminary extension to command-conditioned articulated bodies by treating body parts as interacting object-level components. RigidFormer is mesh-free: it does not require mesh connectivity, SDFs, or vertex-level message passing, making it well-suited for point-cloud observations and scalable simulation. This architecture can also be adapted to learn soft-body dynamics by replacing the rigid-body module (differentiable Kabsch alignment). 🎬See our video for more details. Many thanks to my amazing collaborators: Minghao Guo Minghao Guo, Haixu Wu Haixu Wu 吴海旭, Doug Roble, Tuur Stuyck Tuur Stuyck, and Wojciech Matusik Wojciech Matusik. Project page: Paper:
Zhiyang (Frank) Dou569,421 Aufrufe • vor 25 Tagen

Excited to share that our work NeuralActuator: Neural Actuation Modeling for Robot Dynamics and External Force Perception has been accepted to #RSS2026! Your robot — even a low-cost one — can feel external forces without torque or tactile sensors. TL;DR: NeuralActuator is a neural actuator model that jointly predicts 1️⃣torque to capture the nonlinear and time-varying current–to–torque relationship of low-cost servos, 2️⃣external contact forces (and force detection gates) for sensorless force perception, 3️⃣and motor conditions that indicate each motor’s operating regime. Here is a fast-forward video clip ⬇️ We are also covering more robots like LeRobot-S101 and Franka Panda. More details coming soon.
Zhiyang (Frank) Dou40,021 Aufrufe • vor 1 Monat

Excited to share that our paper 🌊🤺 “CFC: Simulating Character–Fluid Coupling using a Two-Level World Model” has been accepted to #SIGGRAPHASIA2025! In this work, we build a two-level world model (neural physics) for rigid-body–fluid interaction and use it to train physics-based character controllers efficiently. We study: (1) learning to model highly dynamic fluid environments, (2) representing character–fluid interaction via joint-level forces as an interface, and (3) enabling supervised policy learning on the learned world model—avoiding expensive fluid simulation in the training loop. Our talk is on Monday afternoon(Dec 15)—hope to see you there! Time: Monday, 15 December 2025 5:02pm - 5:13pm GMT+8 Location: Meeting Room S221, Level 2. #SIGGRAPHASIA #SIGGRAPH
Zhiyang (Frank) Dou20,200 Aufrufe • vor 5 Monaten

We present EgoReAct: Real-time 3D human reaction generation from streaming egocentric video. 🌟Reacting to streaming egocentric video is something humans do every day. We hope EgoReAct makes human motion more human-like. 🔎 What we found: existing ego-reaction data can be spatially inconsistent (e.g., moving reactions paired with fixed-camera videos), which breaks 3D grounding. 📷 What we built: HRD, a spatially aligned egocentric video–reaction dataset (3,500 pairs, 32 categories), plus a spatially aligned ViMo fix for fair evaluation. (Instead of collecting expensive ground-truth motion, we employ VDM to generate the egocentric videos.) 👁️⚡🏃 Our simple yet effective pipeline: motion tokenization for compact discrete codes + an autoregressive Transformer for online, strictly-causal generation. Metric depth and head dynamics further improve 3D spatial consistency. Project Page: ArXiv: #HumanMotion #EgocentricVision #3D #ARVR #Animation #AIGC #DeepLearning #GenerativeAI #Graphics #ComputerVision #Motion
Zhiyang (Frank) Dou11,259 Aufrufe • vor 5 Monaten

Check out 🌟Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry & Physics for Mesh-Free Simulation #CVPR2025, from Lingjie Liu’s lab at UPenn. Congrats to Chuhao Chen! Vid2Sim aims to achieve system identification by reconstructing geometry, appearance, and physical properties directly from video. It combines learned data priors with closed-loop optimization: a feed-forward predictor trained on physical prior, followed by fast refinement via Neural Jacobian and mesh-free simulation. The system delivers simulation-ready outputs in minutes, with strong generalization across objects and materials. 🏠Project page: #PhysicalAI #AIGC #CV #CG #simulation #graphics
Zhiyang (Frank) Dou12,393 Aufrufe • vor 11 Monaten
Keine weiteren Inhalte verfügbar