🎉 Diffusion-style annealing + sampling-based MPC can surpass RL, and seamlessly adapt to task parameters, all 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴-𝗳𝗿𝗲𝗲! We open sourced DIAL-MPC, the first training-free method for whole-body torque control using full-order dynamics 🧵

Haoru Xue

2,924 subscribers

172,457 views • 1 year ago • via X (Twitter)

11 comments

Haoru Xue • 1 year ago

1/9 We surprisingly found that the formulations of sampling-based MPC and a single diffusion step are 𝗲𝗾𝘂𝗶𝘃𝗮𝗹𝗲𝗻𝘁. This motivates us to do multi-step diffusion in MPC (theoretical proofs in the paper). 𝗥𝗲𝗮𝗹-𝘁𝗶𝗺𝗲 sampling is possible! We did it thanks to recent advances in massively parallel simulation.
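The sampling-based MPC update mentioned here can be sketched in a few lines. What follows is a minimal MPPI-style illustration in plain Python, not code from the DIAL-MPC repository: `mppi_update`, `toy_cost`, the noise scale, and the temperature are hypothetical names and values chosen for the example.

```python
import math
import random

def mppi_update(mean, sigma, cost_fn, num_samples=256, temperature=1.0, rng=None):
    """One sampling-based MPC update: perturb, score, and softmax-average."""
    rng = rng or random.Random(0)
    # Sample candidate control sequences around the current mean.
    samples = [[u + rng.gauss(0.0, sigma) for u in mean] for _ in range(num_samples)]
    costs = [cost_fn(s) for s in samples]
    # Subtract the minimum cost for numerical stability before exponentiating.
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(weights)
    # The new mean is the weighted average of the samples.
    return [sum(w * s[i] for w, s in zip(weights, samples)) / total
            for i in range(len(mean))]

def toy_cost(controls):
    # Hypothetical quadratic cost with its optimum at 1.0 for every control.
    return sum((u - 1.0) ** 2 for u in controls)

start = [0.0] * 4
updated = mppi_update(start, sigma=0.5, cost_fn=toy_cost, rng=random.Random(0))
```

A single update moves the mean toward low-cost samples; repeating it with a shrinking `sigma` is the multi-step idea the thread goes on to describe.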

Haoru Xue • 1 year ago

2/9 There are two diffusion-style annealings in DIAL-MPC. Trajectory-level: within a single timestep, we iteratively re-sample with an adaptive distribution, which is equivalent to denoising in diffusion.

Haoru Xue • 1 year ago

3/9 And action-level: we leverage the receding-horizon nature of MPC to re-use partially diffused actions in future steps.
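The two annealing levels can be illustrated together on a toy problem. This is a rough sketch under stated assumptions, not the authors' implementation: the 1D integrator, the geometric decay constants (`traj_decay`, `action_decay`), and all function names are invented for illustration, and the paper's actual noise schedule may differ.

```python
import math
import random

def rollout_cost(controls, x0=0.0, target=1.0, dt=0.1):
    """Toy 1D integrator x += dt * u; penalize distance to target and effort."""
    x, cost = x0, 0.0
    for u in controls:
        x += dt * u
        cost += (x - target) ** 2 + 1e-3 * u ** 2
    return cost

def annealed_plan(mean, cost_fn, rng, num_iters=4, num_samples=128,
                  base_sigma=1.0, traj_decay=0.5, action_decay=0.8,
                  temperature=0.1):
    """Trajectory-level annealing: repeat sample-and-average with shrinking noise."""
    horizon = len(mean)
    for i in range(num_iters):
        traj_scale = base_sigma * traj_decay ** i
        # Action-level schedule: near-term actions, already refined in earlier
        # control steps, get less noise than far-future ones.
        sigmas = [traj_scale * action_decay ** (horizon - 1 - t)
                  for t in range(horizon)]
        samples = [[m + rng.gauss(0.0, s) for m, s in zip(mean, sigmas)]
                   for _ in range(num_samples)]
        samples.append(list(mean))  # keep the current plan as a candidate
        costs = [cost_fn(s) for s in samples]
        c_min = min(costs)
        weights = [math.exp(-(c - c_min) / temperature) for c in costs]
        total = sum(weights)
        mean = [sum(w * s[t] for w, s in zip(weights, samples)) / total
                for t in range(horizon)]
    return mean

def shift_plan(mean):
    """Receding horizon: drop the executed action, repeat the last as warm start."""
    return mean[1:] + [mean[-1]]

# Closed-loop usage: plan, execute the first action, shift, and replan.
rng = random.Random(0)
plan, x = [0.0] * 10, 0.0
for _ in range(20):
    plan = annealed_plan(plan, lambda u: rollout_cost(u, x0=x), rng)
    x += 0.1 * plan[0]
    plan = shift_plan(plan)
```

Because `shift_plan` carries the partially refined tail of the plan into the next control step, each action is perturbed with progressively less noise as it approaches execution.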

Haoru Xue • 1 year ago

4/9 Some DIAL-MPC tasks can be deployed directly to real. We achieve 𝗿𝗲𝗮𝗹-𝘁𝗶𝗺𝗲 𝘁𝗼𝗿𝗾𝘂𝗲 𝗰𝗼𝗻𝘁𝗿𝗼𝗹 on a quadruped doing versatile tasks. We can even give it a very heavy payload (6 kg 👇), and it could easily adapt to it after modifying the mass in the model.
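To make the payload adaptation concrete, here is a toy sketch of why a model-based sampler needs no retraining when the mass changes: the planner only re-scores its samples with the updated model. Everything here is an assumption for illustration (a 1D point mass holding its height, a hypothetical 12 kg base mass, and the names `hover_cost` and `plan_force`); only the 6 kg payload comes from the thread.

```python
import math
import random

GRAVITY = 9.81  # m/s^2

def hover_cost(force, mass, steps=25, dt=0.02):
    """Roll out a 1D point mass under a constant vertical force; penalize drift."""
    z, v, cost = 0.0, 0.0, 0.0
    for _ in range(steps):
        v += (force / mass - GRAVITY) * dt
        z += v * dt
        cost += z ** 2
    return cost

def plan_force(mass, rng, num_samples=512, max_force=300.0, temperature=0.01):
    """Sample candidate forces, score them with the model, and softmax-average."""
    forces = [rng.uniform(0.0, max_force) for _ in range(num_samples)]
    costs = [hover_cost(f, mass) for f in forces]
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    return sum(w * f for w, f in zip(weights, forces)) / sum(weights)

base_mass = 12.0  # hypothetical robot mass in kg
payload = 6.0     # payload mass mentioned in the thread, in kg
rng = random.Random(0)
force_unloaded = plan_force(base_mass, rng)
# Adapting to the payload is a one-line change to the model parameter.
force_loaded = plan_force(base_mass + payload, rng)
```

In this toy setting the planned force tracks the weight `mass * GRAVITY` in both cases, because the cost is evaluated on rollouts of whichever model is supplied.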

Haoru Xue • 1 year ago

5/9 Why is DIAL-MPC better than conventional sampling-based MPC? In our paper and on our website, we include a toy experiment with a very rough cost function landscape to explain the theory.

Haoru Xue • 1 year ago

6/9 Although not an 🍎 to 📷 comparison, why can DIAL-MPC be better than RL policy? 1. It is training-free: all reward designs can be instantly verified. 2. It is explicitly adaptive: facing OOD tasks like📷

Haoru Xue • 1 year ago

7/9 We emphasize that DIAL-MPC is not meant to compete with RL - its future directions align very well with RL: 1. Add nominal value and policy functions to shorten DIAL-MPC horizon needs. 2. It can be an "RL reward visualizer" to accelerate RL reward engineering.

Haoru Xue • 1 year ago

8/9 DIAL-MPC is parallel to popular works in using diffusion-style annealing for robot learning. A significant distinction is that DIAL-MPC is model-based whereas these are model-free:
- Diffusion Policy Policy Optimization
- Streaming Diffusion Policy

Haoru Xue • 1 year ago

9/9 Thanks for reading through! Check out our project website! We also open-source the code on GitHub to run all DIAL-MPC demos, including the rendering pipeline. Follow the authors @HaoruXue @ChaoyiPan @zejiyi @guannanqu @GuanyaShi for more updates!

Chen Tessler • 1 year ago

Looks amazing! Congrats! And kudos on the visuals too!

Haoru Xue • 1 year ago

thank you! we also open-sourced our blender pipeline in the codebase, courtesy of the UMI on Legs authors

Related videos

Thanks AK! Finally, a robot can do continuous, agile, autonomous, adaptive jumping over stairs and stepping stones. Key idea: combine the pros of model-free RL and model-based control. RL (for CoM refs) + QP (for GRF) + WBC (for torque). Open-sourced:

Guanya Shi

32,155 views • 1 year ago

Check out our #ICRA2024 paper "Actor-Critic Model Predictive Control." Model-free #reinforcementlearning (RL) is known for its strong task performance and flexibility in optimizing general reward formulations. On the other hand, #ModelPredictiveControl (MPC) benefits from robustness and online replanning capabilities. We combine both approaches by introducing a new framework called Actor-Critic Model Predictive Control. The key idea is to embed a differentiable MPC within an Actor-Critic RL framework. The proposed approach leverages the short-term predictive optimization capabilities of MPC with the exploratory and end-to-end training properties of RL. The resulting policy effectively manages both short-term decisions through the MPC-based actor and long-term prediction via the critic network, unifying the benefits of both model-based control and end-to-end learning. We validate our method in simulation and the real world with a quadcopter across various high-level tasks. We show that the proposed architecture can achieve real-time control performance, learn complex behaviors via trial and error, and retain the predictive properties of the MPC to better handle out-of-distribution behavior. Paper: Full Video with more details: Kudos to Ángel Romero, Yunlong Song IEEE ICRA University of Zurich UZH Science UZH Space Hub Aerial Core AUTOASSESS European Research Council (ERC)

Davide Scaramuzza

34,874 views • 2 years ago

Check out our latest work, "Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning for Agile Flight," published in the IEEE Transactions on Robotics, where we reconcile #OptimalControl and #ReinforcementLearning, achieving the same super-human performance, but with superior generalizability, as our previous model-free deep RL! Code released! PDF: Code: Full Video: Model-free #ReinforcementLearning (RL) is known for its strong task performance and flexibility in optimizing general reward formulations. On the other hand, #ModelPredictiveControl (MPC) provides robustness, constraint handling, and powerful online replanning capabilities. In this work, we extend our previous AC-MPC paper (Romero, ICRA'24) by taking a deeper look at how both approaches can be unified. We introduce and extend Actor-Critic Model Predictive Control (AC-MPC), a framework that embeds a differentiable MPC inside an Actor-Critic RL architecture. This integration allows the MPC-based actor to perform short-term predictive optimization, while the critic facilitates long-horizon learning and exploration. We conduct a comprehensive study that highlights AC-MPC’s key advantages: - Better out-of-distribution generalization, both against unknown disturbances and changes in the quadrotor dynamics - Improved sample efficiency - A novel empirical analysis uncovering a relationship between the critic’s value function and the MPC cost function, providing deeper insight into their interplay. We validate our method in simulation and the real world on a quadcopter flying at superhuman speeds of up to 21 m/s, matching state-of-the-art model-free RL performance, and retaining the predictive structure of MPC for more reliable out-of-distribution behavior. 
Reference: Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning for Agile Flight IEEE Transactions on Robotics (T-RO), 2025 PDF: Full Video: Code: Kudos to Ángel Romero, Elie Aljalbout, Yunlong Song! University of Zurich UZH Science UZH Space Hub AUTOASSESS European Research Council (ERC) UZHai

Davide Scaramuzza

26,960 views • 5 months ago

If you have a policy that uses diffusion/flow (e.g. diffusion VLA), you can run RL where the actor chooses the noise, which is then denoised by the policy to produce an action. This method, which we call diffusion steering (DSRL), leads to a remarkably efficient RL method! 🧵👇

Sergey Levine

152,824 views • 11 months ago

It’s not scrolling... it’s crate digging. 😂 Plug in via USB-C and sample straight from your smart device to your MPC Sample. Want more tips like this? Tune into the full MPC Academy video series: #MPC #MPCSample #sampling

Akai Professional

27,317 views • 2 months ago

Remarkably lifelike motion and fluidity. BeyondMimic is a framework for training humanoid whole-body control from large mocap datasets. First, an open-source motion-tracking pipeline to reproduce diverse, highly dynamic human skills on real hardware, then distilling them into a guided state-action diffusion model for zero-shot, task-specific control. Project page:

The Humanoid Hub

58,081 views • 10 months ago

We present PeRFlow (Piecewise Rectified Flow): Fast sampling. Efficient training. Transferrable to various SD-based models/methods. Fully Open-Sourced. ➡️Website: ➡️Github: Screenshot for fast text-to-multiview with PeRFlow:

Xingchao Liu

28,800 views • 2 years ago

House music... straight out of the box. 🏠 Learn everything you need to get started on MPC Sample, including how to build a full track using only factory sounds in the 30-part MPC Sample Academy Video series - live now at #MPC #MPCSample

Akai Professional

14,919 views • 2 months ago

See Spot perform dynamic whole-body manipulation. Using a combination of reinforcement learning (RL) and sampling-based control, the robot is able to autonomously drag, roll, and stack tires weighing 15 kg (33 lb), well above its maximum arm lift capacity. Learn more about coordinating locomotion and manipulation processes:

RAI Institute

87,430 views • 8 months ago

Harness the power and elevate your sounds with Akai Professional's MPC seamlessly integrated with Native Instruments' (@NI_news) innovative tools—unlock the future of music creation. Learn more at #AkaiPro #NativeInstruments #MPC #NAMMShow

Akai Professional

35,976 views • 1 year ago

This is the first video in the MPC Web3 Wallets series. We explain how traditional and MPC wallets work, what entropy is, and why it is important. Hope you like the video.

Crypto Point Hindi 🇮🇳💎💫

16,261 views • 2 years ago

MPC Sample is here! 💥 For nearly 40 years, MPCs have shaped the sound and culture of modern music production. The new MPC Sample carries this legacy forward by making the art of sampling more accessible and approachable than ever before. Learn more at

Akai Professional

45,757 views • 2 months ago

A humanoid robot must recover from any fall or stabilize itself using arms and torso while tripping. MPC and sim-to-real methods often struggle with this. A new study by Tsinghua University researchers tackles uncertain contact scenarios using a rigid-body simulator and RL.

The Humanoid Hub

129,671 views • 1 year ago

Scump watch party reacts to Shotzzy breaking GA’s and using the MPC: Scump: “Wait is that an MPC.. GA’s are out of the window.” BoZe: “Oh sh*t he’s breaking the GA.” Methodz: “We don’t give a f**k.” 🗣️😭

CoD Clipped

91,526 views • 5 months ago

So you’ve trained your favorite diffusion/flow based policy, but it’s just not good enough 0-shot. Worry not, in our new work DSRL - we show how to *steer* pre-trained diffusion policies with off-policy RL, improving behavior efficiently enough for direct training in the real world! DSRL retains nice exploration from the base policy, but allows for quick improvement beyond this base policy with RL. The method is frustratingly simple, and super easy to throw on top of your favorite pretrained policy (VLA/diffusion policy, etc). Let’s think about how it works, 🧵 (1/10)

Abhishek Gupta

19,035 views • 1 year ago

Discover the power of $MPC! The native token from #PartisiaBlockchain can be used for governance, security, collateralization, & insurance. Stake to secure the network, get incentives, & safeguard private data and cross-chain bridges. Explore ➪ 🚀 #MPC

Partisia Blockchain

72,714 views • 2 years ago

Presto! Distilling Steps and Layers for Accelerating Music Generation Despite advances in diffusion-based text-to-music (TTM) methods, efficient, high-quality generation remains a challenge. We introduce Presto!, an approach to inference acceleration for score-based diffusion transformers via reducing both sampling steps and cost per step. To reduce steps, we develop a new score-based distribution matching distillation (DMD) method for the EDM-family of diffusion models, the first GAN-based distillation method for TTM. To reduce the cost per step, we develop a simple, but powerful improvement to a recent layer distillation method that improves learning via better preserving hidden state variance. Finally, we combine our step and layer distillation methods together for a dual-faceted approach. We evaluate our step and layer distillation methods independently and show each yield best-in-class performance. Our combined distillation method can generate high-quality outputs with improved diversity, accelerating our base model by 10-18x (230/435ms latency for 32 second mono/stereo 44.1kHz, 15x faster than comparable SOTA) -- the fastest high-quality TTM to our knowledge.

AK

30,430 views • 1 year ago

We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.

Physical Intelligence

430,406 views • 3 months ago

Karma has his take on Shotzzy using the MPC -25 👀 "It's basically the MSMC, it feels more like a sub, Ant was definitely forcing the gun" "He said 'on Den either side with the Dravec I can rip them, with the MPC no chance', we scrimmed all week with it, the gun isn't op" 🗣️

CDL Hater Central

126,244 views • 5 months ago