
🎉 Diffusion-style annealing + sampling-based MPC can surpass RL, and seamlessly adapt to task parameters, all training-free! We open sourced DIAL-MPC, the first training-free method for whole-body torque control using full-order dynamics 🧵

172,457 views • 1 year ago • via X (Twitter)

11 comments

Haoru Xue • 1 year ago

1/9 We surprisingly found that the formulations of sampling-based MPC and a single diffusion step are equivalent. This motivates us to do multi-step diffusion in MPC (theoretical proofs in the paper). Real-time sampling is possible! We did it thanks to recent advances in massively parallel simulation.
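As a rough illustration of what one sampling-based MPC update looks like, here is a minimal sketch of a softmax-weighted (MPPI-style) update. It assumes that this is the form of sampling-based MPC the thread refers to; the function names, the quadratic `toy_cost`, and all parameter values are made up for the example and are not taken from the DIAL-MPC codebase.

```python
import math
import random


def mppi_update(mean, cost_fn, sigma=0.5, temperature=0.1, num_samples=256, rng=None):
    """One softmax-weighted sampling update of a control sequence (list of floats)."""
    rng = rng or random.Random(0)
    # Sample candidate control sequences around the current mean.
    samples = [[u + rng.gauss(0.0, sigma) for u in mean] for _ in range(num_samples)]
    costs = [cost_fn(s) for s in samples]
    # Subtract the minimum cost before exponentiating for numerical stability.
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(weights)
    # The new mean is the weighted average of the samples.
    return [
        sum(w * s[i] for w, s in zip(weights, samples)) / total
        for i in range(len(mean))
    ]


def toy_cost(controls):
    # Hypothetical quadratic cost with its minimum at all controls == 1.0.
    return sum((u - 1.0) ** 2 for u in controls)


mean = [0.0, 0.0, 0.0]
new_mean = mppi_update(mean, toy_cost)
print(toy_cost(mean), toy_cost(new_mean))
```

A single update like this moves the mean toward low-cost samples; repeating it with a changing sampling distribution is where the annealing idea comes in.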

Haoru Xue • 1 year ago

2/9 There are two diffusion-style annealings in DIAL-MPC. Trajectory-level: within a single timestep, we iteratively re-sample with an adaptive distribution, which is equivalent to denoising in diffusion.
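One way to picture trajectory-level annealing is to repeat a weighted sampling update several times within one control step while shrinking the sampling width. The sketch below does this on a hypothetical one-dimensional cost with ripples; the geometric decay schedule, the cost function, and every constant are assumptions chosen for illustration, not the schedule used in the paper.

```python
import math
import random


def rough_cost(u):
    # Hypothetical 1-D landscape: a quadratic bowl around u = 2 plus ripples.
    return (u - 2.0) ** 2 + 0.5 * math.sin(15.0 * u)


def weighted_update(mean, sigma, rng, num_samples=128, temperature=0.2):
    """Softmax-weighted average of samples drawn around the current mean."""
    samples = [mean + rng.gauss(0.0, sigma) for _ in range(num_samples)]
    costs = [rough_cost(s) for s in samples]
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    return sum(w * s for w, s in zip(weights, samples)) / sum(weights)


def annealed_solve(mean, sigma_start=2.0, decay=0.6, num_iters=8, seed=0):
    """Repeat the update with a shrinking sampling width (coarse to fine)."""
    rng = random.Random(seed)
    sigma = sigma_start
    history = [mean]
    for _ in range(num_iters):
        mean = weighted_update(mean, sigma, rng)
        history.append(mean)
        sigma *= decay  # narrow the sampling distribution each iteration
    return mean, history


solution, history = annealed_solve(mean=-3.0)
print(solution, rough_cost(solution))
```

Early iterations use wide sampling to escape the ripples, and later iterations refine the estimate locally.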

Haoru Xue • 1 year ago

3/9 And action-level: we leverage the receding-horizon nature of MPC to re-use partially diffused actions in future steps.
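A possible reading of action-level annealing: actions near the front of the plan have already been refined over earlier control steps, so they get less sampling noise, while freshly appended actions at the end get more. The sketch below shows such a per-action noise schedule together with the usual receding-horizon shift. The geometric schedule, the helper names, and the numbers are assumptions for illustration only.

```python
def horizon_sigmas(horizon, sigma_near=0.05, sigma_far=1.0):
    """Geometric noise schedule: small for the next action, larger for later ones."""
    if horizon == 1:
        return [sigma_near]
    ratio = (sigma_far / sigma_near) ** (1.0 / (horizon - 1))
    return [sigma_near * ratio ** k for k in range(horizon)]


def shift_plan(plan, fill_value=0.0):
    """Receding-horizon warm start: drop the executed action, append a fresh one."""
    return plan[1:] + [fill_value]


plan = [0.9, 0.7, 0.4, 0.1]          # hypothetical current plan
sigmas = horizon_sigmas(len(plan))   # per-action sampling noise
executed = plan[0]                   # action sent to the robot this step
next_plan = shift_plan(plan)         # warm start for the next control step
print(sigmas, executed, next_plan)
```

Each action thus sees progressively smaller noise as it moves toward the front of the horizon across successive control steps.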

Haoru Xue • 1 year ago

4/9 Some DIAL-MPC tasks can be deployed directly on real hardware. We achieve real-time torque control on a quadruped doing versatile tasks. We can even give it a very heavy payload (6 kg 👇), and it easily adapts after we modify the mass in the model.
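To show why a model-based sampler can adapt once the model's mass is changed, here is a toy sketch with a one-dimensional point mass that should hover. The planner re-solves with whatever mass it is given, so adding a payload only requires changing one parameter. The 12 kg base mass, the constant-thrust simplification, and all function names are hypothetical; only the 6 kg payload figure comes from the thread.

```python
import math
import random

GRAVITY = 9.81  # m/s^2
DT = 0.05       # integration step in seconds


def rollout_cost(thrusts, mass):
    """Simulate a 1-D point mass and penalize drifting away from height 0."""
    height, velocity, cost = 0.0, 0.0, 0.0
    for thrust in thrusts:
        accel = thrust / mass - GRAVITY
        velocity += accel * DT
        height += velocity * DT
        cost += height ** 2 + 0.01 * velocity ** 2
    return cost


def plan_hover_thrust(mass, horizon=10, num_iters=15, num_samples=128,
                      sigma=50.0, decay=0.7, temperature=0.01, seed=0):
    """Annealed sampling over a single thrust value held for the whole horizon."""
    rng = random.Random(seed)
    thrust = 0.0
    for _ in range(num_iters):
        samples = [thrust + rng.gauss(0.0, sigma) for _ in range(num_samples)]
        costs = [rollout_cost([s] * horizon, mass) for s in samples]
        c_min = min(costs)
        weights = [math.exp(-(c - c_min) / temperature) for c in costs]
        thrust = sum(w * s for w, s in zip(weights, samples)) / sum(weights)
        sigma *= decay  # shrink the sampling width each iteration
    return thrust


base_mass = 12.0  # hypothetical robot mass in kg
payload = 6.0     # payload mass in kg, as mentioned in the thread
thrust_empty = plan_hover_thrust(base_mass)
thrust_loaded = plan_hover_thrust(base_mass + payload)
print(thrust_empty, thrust_loaded)
```

No retraining step appears anywhere: the same planner produces a higher thrust as soon as the model mass includes the payload.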

Haoru Xue • 1 year ago

5/9 Why is DIAL-MPC better than conventional sampling-based MPC? In our paper and on our website, we include a toy experiment with a very rough cost function landscape to explain the theory.

Haoru Xue • 1 year ago

6/9 Although not an 🍎 to 📷 comparison, why can DIAL-MPC be better than RL policy? 1. It is training-free: all reward designs can be instantly verified. 2. It is explicitly adaptive: facing OOD tasks like 📷

Haoru Xue • 1 year ago

7/9 We emphasize that DIAL-MPC is not meant to compete with RL; its future directions align very well with RL: 1. Add nominal value and policy functions to shorten the horizon DIAL-MPC needs. 2. It can serve as an "RL reward visualizer" to accelerate RL reward engineering.

Haoru Xue • 1 year ago

8/9 DIAL-MPC parallels popular works that use diffusion-style annealing for robot learning. A significant distinction is that DIAL-MPC is model-based, whereas these are model-free:
- Diffusion Policy Policy Optimization
- Streaming Diffusion Policy

Haoru Xue • 1 year ago

9/9 Thanks for reading through! Check out our project website! We also open-source the code on GitHub to run all DIAL-MPC demos, including the rendering pipeline. Follow the authors @HaoruXue @ChaoyiPan @zejiyi @guannanqu @GuanyaShi for more updates!

Chen Tessler • 1 year ago

Looks amazing! Congrats! And kudos on the visuals too!

Haoru Xue • 1 year ago

thank you! we also open-sourced our blender pipeline in the codebase, courtesy of the UMI on Legs authors
