Video wird geladen...

Video konnte nicht geladen werden

Zur Startseite

How robust can model predictive control be if we can solve each trajectory optimization to global optimality? On the contact-rich push-T problem, we show that model-based global optimization is so robust that it never fails, even if the model is not even correct! We achieve global optimality via sparse...

28,660 Aufrufe • vor 1 Jahr •via X (Twitter)

0 Kommentare

Keine Kommentare verfügbar

Kommentare vom Original-Post werden hier angezeigt

Ähnliche Videos

Model-Free Reinforcement Learning (MFRL) has been alluring, especially with supercharged compute with physics on GPU. However, the methods use 0-th order gradients, and are often not the best optimizers. Can we do better than PPO in continuous control for robotics? Turns out yes! 🥳 tl;dr: Faster, better RL than PPO in continuous control 💪 The answer lies in using more information from the simulation. We are juicing the simulation on GPU as it is, why not use it for gradients as well? This has been a driving question in a series of our works. We first studied this problem in ICLR 2022 paper on Short Horizon Actor Critic Naive gradient based methods are stuck in local minima and have exploding/vanishing gradients. SHAC solved this problem truncated rollouts and model based value estimation, where the model is Differentiable Sim. This boosted sample efficiency and wall-clock time immensely especially in high dimensional systems such as humanoids Yet, given enough compute PPO often caught up. Our follow up paper on on Adaptive Horizon Actor Critic at ICML 2024 discovers the cause and provides a fix. However, we find that even when given ground-truth dynamics, not all gradients are useful due to sample error. 1st-Order Model-Based Reinforcement Learning methods employing differentiable simulation provide gradients with reduced variance but are susceptible to bias in scenarios involving stiff dynamics, such as physical contact. We find that back-propagating through contact and long trajectories drastically reduces gradient accuracy. Using this insight, we propose AHAC to dynamically adapt its roll-out horizon to avoid differentiating through stiff contact. AHAC is a first-order model-based RL algorithm that learns high-dimensional tasks in minutes (wall clock) and outperforms PPO by 40%, even in the limit of data provided to PPO. This work is led by Ignat Georgiev alongside Krishnan Srinivasan, Jie Xu, Eric Heiden and ample assistance from warp team at NVIDIA Robotics (Miles Macklin)

Animesh Garg

52,279 Aufrufe • vor 2 Jahren