Diffusion-style annealing + sampling-based MPC can surpass RL and seamlessly adapt to task parameters, all training-free! We open-sourced DIAL-MPC, the first training-free method for whole-body torque control using full-order dynamics 🧵
172,397 views • 1 year ago • via X (Twitter)

1/9 We were surprised to find that the formulations of sampling-based MPC and a single diffusion step are equivalent. This motivated us to do multi-step diffusion in MPC (theoretical proofs in the paper). Real-time sampling is possible! We achieved it thanks to recent advances in massively parallel simulation.
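To make the 1/9 claim concrete, here is a minimal sketch of one sampling-based MPC update in the MPPI style: sample perturbed action sequences, score them, and take a weighted mean with weights proportional to exp(-cost/λ). This is the kind of update the thread equates with a single denoising step; the proof is in the paper and is not reproduced here. Everything below (`mppi_step`, the quadratic stand-in cost, the parameter values) is a hypothetical illustration, not DIAL-MPC's code; a real controller would roll each candidate out in a simulator to obtain its cost.

```python
import numpy as np


def mppi_step(mu, cost_fn, sigma, n_samples, lam, rng):
    """One MPPI-style update of a nominal action sequence mu (horizon x action_dim).

    Hypothetical sketch: sample Gaussian perturbations around mu, score each
    candidate sequence with cost_fn, and return the weighted mean with
    weights proportional to exp(-cost / lam).
    """
    noise = rng.normal(0.0, sigma, size=(n_samples,) + mu.shape)
    candidates = mu[None] + noise                       # (N, H, A)
    costs = np.array([cost_fn(c) for c in candidates])  # (N,)
    # Subtracting the minimum cost leaves the normalized weights unchanged
    # but avoids overflow/underflow in exp.
    weights = np.exp(-(costs - costs.min()) / lam)
    weights /= weights.sum()
    return np.tensordot(weights, candidates, axes=1)    # (H, A)


# Usage with a stand-in cost: drive a 5-step, 2-D action sequence toward all ones.
rng = np.random.default_rng(0)
target = np.ones((5, 2))


def cost(u):
    return float(np.sum((u - target) ** 2))


mu = np.zeros((5, 2))
for _ in range(20):
    mu = mppi_step(mu, cost, sigma=0.5, n_samples=256, lam=0.1, rng=rng)
print(mu.shape, cost(mu))
```

A smaller `lam` concentrates the weights on the best few samples; a larger one averages more broadly.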

2/9 There are two kinds of diffusion-style annealing in DIAL-MPC. Trajectory-level: within a single timestep, we iteratively re-sample from an adaptive distribution, which is equivalent to denoising in diffusion.
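Trajectory-level annealing can be sketched as repeating the sampling update several times inside one control step while shrinking the noise scale, much like stepping down a diffusion noise schedule. The geometric schedule `sigma0 * decay**i`, the function names, and all parameter values are assumptions for illustration; the paper defines its own schedule.

```python
import numpy as np


def weighted_update(mu, cost_fn, sigma, n_samples, lam, rng):
    """One sampling update: Gaussian candidates around mu, exp(-cost/lam) weights."""
    candidates = mu[None] + rng.normal(0.0, sigma, size=(n_samples,) + mu.shape)
    costs = np.array([cost_fn(c) for c in candidates])
    w = np.exp(-(costs - costs.min()) / lam)
    return np.tensordot(w / w.sum(), candidates, axes=1)


def anneal_trajectory(mu, cost_fn, n_iters=8, sigma0=1.0, decay=0.6,
                      n_samples=128, lam=0.1, seed=0):
    """Trajectory-level annealing within ONE control step (hypothetical sketch).

    Each iteration re-samples around the latest estimate with a smaller noise
    scale, sigma_i = sigma0 * decay**i. The geometric schedule is an assumption.
    """
    rng = np.random.default_rng(seed)
    for i in range(n_iters):
        mu = weighted_update(mu, cost_fn, sigma0 * decay ** i, n_samples, lam, rng)
    return mu


target = np.ones((4, 2))


def cost(u):
    return float(np.sum((u - target) ** 2))


plan = anneal_trajectory(np.zeros((4, 2)), cost)
print(cost(plan))
```

Early iterations use broad noise to explore; later ones use narrow noise to refine the same plan.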

3/9 And action-level: we leverage the receding-horizon nature of MPC to reuse partially diffused actions in future steps.
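Action-level annealing builds on the receding-horizon warm start: after executing the first action, the rest of the plan is shifted forward and reused, so near-term actions have already been refined over several control cycles. Below is a hypothetical sketch of that bookkeeping plus a per-step noise profile that perturbs the well-refined front of the horizon less than the fresh tail. The function names and the linear noise profile are assumptions, not DIAL-MPC's implementation.

```python
import numpy as np


def warm_start(prev_plan, fresh_action):
    """Receding-horizon warm start (hypothetical sketch).

    The first action of the previous plan has just been executed, so drop it,
    shift the remaining, partially refined actions forward one step, and
    append a fresh action at the end of the horizon.
    """
    return np.concatenate([prev_plan[1:], fresh_action[None]], axis=0)


def horizon_noise(horizon, sigma_min=0.05, sigma_max=1.0):
    """Per-step noise scales that grow along the horizon.

    Near-term actions have been refined over more control cycles (they are
    'more denoised'), so they are perturbed less; the freshly appended tail
    is perturbed most. The linear profile is an assumption.
    """
    return np.linspace(sigma_min, sigma_max, horizon)


prev_plan = np.arange(6.0).reshape(3, 2)   # 3 steps x 2 action dims
plan = warm_start(prev_plan, np.zeros(2))  # [[2, 3], [4, 5], [0, 0]]
sigmas = horizon_noise(len(plan))          # [0.05, 0.525, 1.0]

# Sample 16 candidate plans with step-dependent noise
# (sigmas broadcast over the sample and action dimensions).
rng = np.random.default_rng(0)
candidates = plan[None] + rng.normal(size=(16,) + plan.shape) * sigmas[None, :, None]
print(plan, sigmas, candidates.shape)
```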

4/9 Some DIAL-MPC tasks can be deployed directly on the real robot. We achieve real-time torque control on a quadruped doing versatile tasks. We can even give it a very heavy payload (6 kg), and it adapts easily once we update the mass in the model.

5/9 Why is DIAL-MPC better than conventional sampling-based MPC? In our paper and on our website, we include a toy experiment with a very rough cost landscape to explain the theory.
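The intuition behind such a toy experiment can be reproduced in one dimension. This is not the experiment from the paper: the cost function, noise schedule, and parameters below are our own hypothetical choices. A quadratic bowl with a cosine ripple has many local minima; sampling with a constant small noise scale stays trapped near its starting basin, while annealing from a broad noise scale down to a narrow one can jump across basins early and then refine.

```python
import numpy as np


def rough_cost(x):
    """Hypothetical 1-D landscape: a quadratic bowl (global minimum at x = 2)
    plus a cosine ripple that creates a local minimum near every integer
    offset from 2."""
    return (x - 2.0) ** 2 + 3.0 * (1.0 - np.cos(2.0 * np.pi * (x - 2.0)))


def sample_and_average(x, sigma, n_samples, lam, rng):
    """One weighted-sampling update of a scalar decision variable."""
    cand = x + rng.normal(0.0, sigma, size=n_samples)
    cost = rough_cost(cand)
    w = np.exp(-(cost - cost.min()) / lam)
    return float(np.sum(w * cand) / np.sum(w))


def optimize(x0, sigmas, n_samples=256, lam=0.1, seed=0):
    """Apply one weighted-sampling update per noise scale in sigmas."""
    rng = np.random.default_rng(seed)
    x = x0
    for s in sigmas:
        x = sample_and_average(x, s, n_samples, lam, rng)
    return x


n_iters = 40
fixed = optimize(-2.0, [0.1] * n_iters)                               # constant small noise
annealed = optimize(-2.0, [3.0 * 0.85 ** i for i in range(n_iters)])  # shrinking noise
print(f"fixed:    x={fixed:.3f}, cost={rough_cost(fixed):.3f}")
print(f"annealed: x={annealed:.3f}, cost={rough_cost(annealed):.3f}")
```

Both runs use the same number of samples; only the noise schedule differs.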

6/9 Although this is not an apples-to-apples comparison, why can DIAL-MPC outperform an RL policy? 1. It is training-free: every reward design can be verified instantly. 2. It is explicitly adaptive, e.g., when facing out-of-distribution (OOD) tasks like…

7/9 We emphasize that DIAL-MPC is not meant to compete with RL; its future directions align very well with RL: 1. Add nominal value and policy functions to shorten the planning horizon DIAL-MPC needs. 2. Use it as an "RL reward visualizer" to accelerate RL reward engineering.

8/9 DIAL-MPC parallels popular work on using diffusion-style annealing for robot learning. A significant distinction is that DIAL-MPC is model-based, whereas these are model-free:
- Diffusion Policy Policy Optimization
- Streaming Diffusion Policy

9/9 Thanks for reading through! Check out our project website. We have also open-sourced the code on GitHub to run all DIAL-MPC demos, including the rendering pipeline. Follow the authors @HaoruXue @ChaoyiPan @zejiyi @guannanqu @GuanyaShi for more updates!

Looks amazing! Congrats! And kudos on the visuals too!

thank you! we also open-sourced our Blender pipeline in the codebase, courtesy of the UMI on Legs authors


