🎉 Diffusion-style annealing + sampling-based MPC can surpass RL, and seamlessly adapt to task parameters, all training-free! We open-sourced DIAL-MPC, the first training-free method for whole-body torque control using full-order dynamics 🧵

172,397 views • 1 year ago • via X (Twitter)

11 Comments

Haoru Xue • 1 year ago

1/9 We surprisingly found that the formulations of sampling-based MPC and a single diffusion step are equivalent. This motivates us to do multi-step diffusion in MPC (theoretical proofs in the paper). Real-time sampling is possible! We did it thanks to recent advances in massively parallel simulation.
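For readers unfamiliar with sampling-based MPC, here is a minimal sketch of one update step in plain Python. It is a generic MPPI-style weighted average, not the DIAL-MPC code; the function name `mppi_update` and the parameters `sigma`, `temperature`, and `num_samples` are assumptions chosen for illustration. One way to read the claimed equivalence is that the weighted average acts like a denoised estimate of the control sequence given the current noisy mean; the paper has the actual proof.

```python
import math
import random


def mppi_update(mean, cost_fn, sigma, temperature, num_samples, rng=None):
    """One sampling-based MPC update (illustrative, not the DIAL-MPC code).

    mean        : list of floats, the current control sequence
    cost_fn     : maps a control sequence (list of floats) to a scalar cost
    sigma       : standard deviation of the Gaussian sampling noise
    temperature : softmin temperature; lower values favor the best samples
    """
    rng = rng or random.Random()
    # Sample candidate control sequences around the current mean.
    samples = [[m + rng.gauss(0.0, sigma) for m in mean]
               for _ in range(num_samples)]
    costs = [cost_fn(s) for s in samples]
    # Subtract the minimum cost before exponentiating for numerical stability.
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(weights)
    # The new mean is the cost-weighted average of the samples.
    return [sum(w * s[i] for w, s in zip(weights, samples)) / total
            for i in range(len(mean))]
```

In a real controller `cost_fn` would roll the controls through a simulator; here it can be any function of the control sequence.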

Haoru Xue • 1 year ago

2/9 There are two kinds of diffusion-style annealing in DIAL-MPC. Trajectory-level: within a single timestep, we iteratively re-sample from an adaptive distribution, which is equivalent to denoising in diffusion.
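The trajectory-level idea can be sketched as a loop that repeats a sampling update while shrinking the noise, moving from coarse exploration to fine refinement. This is a toy illustration under stated assumptions, not the released implementation: the geometric schedule, the helper names (`mppi_update`, `geometric_schedule`, `anneal_trajectory`), and all parameter values are hypothetical.

```python
import math
import random


def mppi_update(mean, cost_fn, sigma, temperature, num_samples, rng):
    """Generic weighted-average sampling update (illustrative only)."""
    samples = [[m + rng.gauss(0.0, sigma) for m in mean]
               for _ in range(num_samples)]
    costs = [cost_fn(s) for s in samples]
    c_min = min(costs)  # shift costs for numerical stability
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(weights)
    return [sum(w * s[i] for w, s in zip(weights, samples)) / total
            for i in range(len(mean))]


def geometric_schedule(sigma_start, sigma_end, num_steps):
    """Noise levels decaying geometrically from sigma_start to sigma_end.

    The geometric form is an assumption made for this sketch.
    """
    ratio = (sigma_end / sigma_start) ** (1.0 / max(num_steps - 1, 1))
    return [sigma_start * ratio ** i for i in range(num_steps)]


def anneal_trajectory(mean, cost_fn, sigmas, temperature, num_samples, rng):
    """Re-sample around the latest mean once per noise level, large to small."""
    for sigma in sigmas:
        mean = mppi_update(mean, cost_fn, sigma, temperature, num_samples, rng)
    return mean
```

A call such as `anneal_trajectory(u0, cost, geometric_schedule(1.0, 0.05, 8), 0.1, 256, random.Random(0))` runs eight refinement passes within one control step.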

Haoru Xue • 1 year ago

3/9 And action-level: we leverage the receding-horizon nature of MPC to re-use partially diffused actions in future steps.
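A rough sketch of the action-level idea, assuming a plan stored as a plain list of scalar actions: after executing the first action, the remaining plan is shifted forward and reused, and sampling noise is kept small for near-term actions (already refined over several past control steps) and larger for far-term ones (newly appended). The helper names and the geometric form of the per-action schedule are assumptions for illustration, not the schedule from the paper.

```python
def shift_plan(plan, fill=None):
    """Drop the executed first action and pad the tail to keep the horizon.

    By default the last action is repeated; pass `fill` to use another value.
    """
    tail = plan[-1] if fill is None else fill
    return plan[1:] + [tail]


def action_level_sigmas(horizon, sigma_near, sigma_far):
    """Per-action noise scale that grows along the horizon.

    Index 0 is the next action to execute (least noise); the last index is
    the most recently appended action (most noise).
    """
    if horizon == 1:
        return [sigma_far]
    ratio = (sigma_far / sigma_near) ** (1.0 / (horizon - 1))
    return [sigma_near * ratio ** k for k in range(horizon)]
```

At each control step one would call `shift_plan` on the previous solution and then sample with `action_level_sigmas(len(plan), ...)` instead of a single shared noise level.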

Haoru Xue • 1 year ago

4/9 Some DIAL-MPC tasks can be deployed directly to real hardware. We achieve real-time torque control on a quadruped doing versatile tasks. We can even give it a very heavy payload (6 kg 👇), and it easily adapts after we modify the mass in the model.
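To illustrate why editing a model parameter is enough for a model-based planner, here is a toy stand-in: a 1-D point mass held at a target height by vertical thrust. It is not the quadruped model; the function names, the 12 kg base mass in the usage note, and the cost are hypothetical. The point is only that the cost is rebuilt from the model with the new mass, while the sampler that minimizes it stays unchanged.

```python
G = 9.81  # gravitational acceleration in m/s^2


def rollout_height(thrusts, mass, dt=0.02, z0=0.0, vz0=0.0):
    """Integrate a 1-D point mass under vertical thrust (semi-implicit Euler)."""
    z, vz = z0, vz0
    heights = []
    for force in thrusts:
        az = force / mass - G  # net vertical acceleration
        vz += az * dt
        z += vz * dt
        heights.append(z)
    return heights


def make_hover_cost(mass, target=0.0):
    """Build a cost function for a given mass; the mass is the only knob."""
    def cost(thrusts):
        # Sum of squared height errors along the rollout.
        return sum((z - target) ** 2 for z in rollout_height(thrusts, mass))
    return cost
```

For example, `make_hover_cost(12.0)` and `make_hover_cost(12.0 + 6.0)` give two cost functions for the same planner: a thrust sequence that hovers under the first drops under the second, so re-planning against the updated cost yields larger thrusts without any retraining.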

Haoru Xue • 1 year ago

5/9 Why is DIAL-MPC better than conventional sampling-based MPC? In our paper and on our website, we include a toy experiment with a very rough cost landscape to explain the theory.

Haoru Xue • 1 year ago

6/9 Although this is not an apples-to-apples comparison, why can DIAL-MPC be better than an RL policy? 1. It is training-free: all reward designs can be instantly verified. 2. It is explicitly adaptive when facing OOD tasks like 📷

Haoru Xue • 1 year ago

7/9 We emphasize that DIAL-MPC is not meant to compete with RL; its future directions align very well with RL: 1. Adding nominal value and policy functions can shorten the horizon DIAL-MPC needs. 2. It can serve as an "RL reward visualizer" to accelerate RL reward engineering.

Haoru Xue • 1 year ago

8/9 DIAL-MPC parallels popular works that use diffusion-style annealing for robot learning. A significant distinction is that DIAL-MPC is model-based, whereas these are model-free:
- Diffusion Policy Policy Optimization
- Streaming Diffusion Policy

Haoru Xue • 1 year ago

9/9 Thanks for reading through! Check out our project website! We also open-source the code on GitHub to run all DIAL-MPC demos, including the rendering pipeline. Follow the authors @HaoruXue @ChaoyiPan @zejiyi @guannanqu @GuanyaShi for more updates!

Chen Tessler • 1 year ago

Looks amazing! Congrats! And kudos on the visuals too!

Haoru Xue • 1 year ago

thank you! we also open-sourced our Blender pipeline in the codebase, courtesy of the UMI on Legs authors
