🎉 Diffusion-style annealing + sampling-based MPC can surpass RL and seamlessly adapt to task parameters, all 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴-𝗳𝗿𝗲𝗲! We open-sourced DIAL-MPC, the first training-free method for whole-body torque control using full-order dynamics 🧵

172,397 views • 1 year ago • via X (Twitter)

Comments: 11

Haoru Xue • 1 year ago

1/9 Surprisingly, we found that the formulation of sampling-based MPC and a single diffusion step are 𝗲𝗾𝘂𝗶𝘃𝗮𝗹𝗲𝗻𝘁 (theoretical proofs in the paper). This motivates us to do multi-step diffusion in MPC. 𝗥𝗲𝗮𝗹-𝘁𝗶𝗺𝗲 sampling is possible, thanks to recent advances in massively parallel simulation.
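
For readers new to sampling-based MPC, here is a minimal sketch of one generic MPPI-style update: sample control sequences around the current plan, then take a softmax-weighted average that favors low-cost samples. This is not taken from the DIAL-MPC codebase; the function name `mppi_update`, the quadratic cost, and all parameter values are assumptions chosen for illustration.

```python
import math
import random

def mppi_update(mean, cost_fn, sigma=0.5, num_samples=256, temperature=0.1, rng=None):
    """One sampling-based MPC update (illustrative, not the authors' code).

    Samples control sequences around `mean` and returns their average,
    weighted by exp(-cost / temperature).
    """
    rng = rng or random.Random(0)
    samples = [[m + rng.gauss(0.0, sigma) for m in mean] for _ in range(num_samples)]
    costs = [cost_fn(u) for u in samples]
    best = min(costs)  # subtract the minimum cost for numerical stability
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    return [
        sum(w * u[i] for w, u in zip(weights, samples)) / total
        for i in range(len(mean))
    ]

# Hypothetical cost over a 3-step control sequence, minimized at u = (1, 1, 1).
def cost(u):
    return sum((x - 1.0) ** 2 for x in u)

plan = [0.0, 0.0, 0.0]
new_plan = mppi_update(plan, cost, rng=random.Random(0))
```

A single update like this moves the plan toward lower cost. Repeating it with a changing sampling distribution is the multi-step idea the thread goes on to describe.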

Haoru Xue • 1 year ago

2/9 There are two kinds of diffusion-style annealing in DIAL-MPC. Trajectory-level: within a single timestep, we iteratively re-sample from an adaptive distribution, which is equivalent to denoising in diffusion.
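
A rough sketch of what trajectory-level annealing could look like on a 1-D toy problem: the same weighted-average update is repeated within one timestep while the sampling noise shrinks from coarse to fine. The cost function, the halving noise schedule, and the names `rough_cost` and `annealed_solve` are all hypothetical and do not come from the paper or the released code.

```python
import math
import random

def rough_cost(u):
    # Hypothetical 1-D landscape: a quadratic bowl at u = 2 plus small ripples.
    return (u - 2.0) ** 2 + 0.5 * math.sin(15.0 * u) ** 2

def weighted_mean(samples, costs, temperature):
    best = min(costs)  # subtract the minimum cost for numerical stability
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    return sum(w * s for w, s in zip(weights, samples)) / sum(weights)

def annealed_solve(mean, cost_fn, sigmas, num_samples=128, temperature=0.1, seed=0):
    """Repeat the sample-and-average update once per noise level in `sigmas`."""
    rng = random.Random(seed)
    for sigma in sigmas:  # noise shrinks from coarse to fine
        samples = [mean + rng.gauss(0.0, sigma) for _ in range(num_samples)]
        costs = [cost_fn(s) for s in samples]
        mean = weighted_mean(samples, costs, temperature)
    return mean

sigmas = [1.0 * 0.5 ** i for i in range(6)]  # 1.0, 0.5, 0.25, ...
solution = annealed_solve(mean=-1.0, cost_fn=rough_cost, sigmas=sigmas)
```

Early iterations with large noise explore across the ripples, and later iterations with small noise refine the estimate locally, which is the intuition behind comparing the loop to denoising.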

Haoru Xue • 1 year ago

3/9 And action-level: we leverage the receding-horizon nature of MPC to re-use partially diffused actions in future steps.
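
One way to picture the action-level idea, as a sketch under assumptions: actions near the end of the horizon have been refined in fewer MPC steps, so they receive more sampling noise than the imminent action, and the plan is shifted forward after every control step so earlier refinement is re-used. The geometric schedule and the helper names `horizon_sigmas` and `shift_plan` are illustrative guesses, not the schedule used by the authors.

```python
def horizon_sigmas(horizon, sigma_near=0.1, sigma_far=1.0):
    """Geometric noise schedule: small for the imminent action, large for the last."""
    if horizon == 1:
        return [sigma_near]
    ratio = (sigma_far / sigma_near) ** (1.0 / (horizon - 1))
    return [sigma_near * ratio ** t for t in range(horizon)]

def shift_plan(plan, fill=0.0):
    """Receding-horizon warm start: drop the executed action, append a fresh guess."""
    return plan[1:] + [fill]

plan = [0.3, 0.2, 0.1, 0.0]
sigmas = horizon_sigmas(len(plan))  # increases from 0.1 toward 1.0
next_plan = shift_plan(plan)        # [0.2, 0.1, 0.0, 0.0]
```

After the shift, the first entries of `next_plan` carry over values that were already partially refined, while the newly appended last entry starts from scratch and is sampled with the widest noise.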

Haoru Xue • 1 year ago

4/9 Some DIAL-MPC tasks can be deployed directly to real hardware. We achieve 𝗿𝗲𝗮𝗹-𝘁𝗶𝗺𝗲 𝘁𝗼𝗿𝗾𝘂𝗲 𝗰𝗼𝗻𝘁𝗿𝗼𝗹 on a quadruped doing versatile tasks. We can even give it a very heavy payload (6 kg 👇), and it easily adapts once we modify the mass in the model.

Haoru Xue • 1 year ago

5/9 Why is DIAL-MPC better than conventional sampling-based MPC? In our paper and on our website, we include a toy experiment with a very rough cost function landscape to explain the theory.

Haoru Xue • 1 year ago

6/9 Although not an 🍎-to-🍎 comparison, why can DIAL-MPC be better than an RL policy? 1. It is training-free: all reward designs can be instantly verified. 2. It is explicitly adaptive: facing OOD tasks like 📷

Haoru Xue • 1 year ago

7/9 We emphasize that DIAL-MPC is not meant to compete with RL. Its future directions align very well with RL: 1. Add nominal value and policy functions to shorten the horizon DIAL-MPC needs. 2. It can be an "RL reward visualizer" to accelerate RL reward engineering.

Haoru Xue • 1 year ago

8/9 DIAL-MPC parallels popular works that use diffusion-style annealing for robot learning. A significant distinction is that DIAL-MPC is model-based, whereas these are model-free:
- Diffusion Policy Policy Optimization
- Streaming Diffusion Policy

Haoru Xue • 1 year ago

9/9 Thanks for reading through! Check out our project website! We also open-source the code on GitHub to run all DIAL-MPC demos, including the rendering pipeline. Follow the authors @HaoruXue @ChaoyiPan @zejiyi @guannanqu @GuanyaShi for more updates!

Chen Tessler • 1 year ago

Looks amazing! Congrats! And kudos on the visuals too!

Haoru Xue • 1 year ago

thank you! we also open-sourced our Blender pipeline in the codebase, courtesy of the UMI on Legs authors
