Animesh Garg's banner
Animesh Garg's profile picture

Animesh Garg

@animesh_garg34,536 subscribers

Foundation Models for Generalizable Autonomy in Robotics. Reinforcement Learning. Faculty in AI Robotics @GeorgiaTech. Prev @nvidia @UCberkeley @stanford

Shorts

Model-Free Reinforcement Learning (MFRL) has been alluring, especially with supercharged compute with physics on GPU. However, the methods use 0-th order gradients, and are often not the best optimizers. Can we do better than PPO in continuous control for robotics? Turns out yes! 🥳 tl;dr: Faster, better RL than PPO in continuous control 💪 The answer lies in using more information from the simulation. We are juicing the simulation on GPU as it is, why not use it for gradients as well? This has been a driving question in a series of our works. We first studied this problem in ICLR 2022 paper on Short Horizon Actor Critic Naive gradient based methods are stuck in local minima and have exploding/vanishing gradients. SHAC solved this problem truncated rollouts and model based value estimation, where the model is Differentiable Sim. This boosted sample efficiency and wall-clock time immensely especially in high dimensional systems such as humanoids Yet, given enough compute PPO often caught up. Our follow up paper on on Adaptive Horizon Actor Critic at ICML 2024 discovers the cause and provides a fix. However, we find that even when given ground-truth dynamics, not all gradients are useful due to sample error. 1st-Order Model-Based Reinforcement Learning methods employing differentiable simulation provide gradients with reduced variance but are susceptible to bias in scenarios involving stiff dynamics, such as physical contact. We find that back-propagating through contact and long trajectories drastically reduces gradient accuracy. Using this insight, we propose AHAC to dynamically adapt its roll-out horizon to avoid differentiating through stiff contact. AHAC is a first-order model-based RL algorithm that learns high-dimensional tasks in minutes (wall clock) and outperforms PPO by 40%, even in the limit of data provided to PPO. This work is led by Ignat Georgiev alongside Krishnan Srinivasan, Jie Xu, Eric Heiden and ample assistance from warp team at NVIDIA Robotics (Miles Macklin)

Model-Free Reinforcement Learning (MFRL) has been alluring, especially with supercharged compute with physics on GPU. However, the methods use 0-th order gradients, and are often not the best optimizers. Can we do better than PPO in continuous control for robotics? Turns out yes! 🥳 tl;dr: Faster, better RL than PPO in continuous control 💪 The answer lies in using more information from the simulation. We are juicing the simulation on GPU as it is, why not use it for gradients as well? This has been a driving question in a series of our works. We first studied this problem in ICLR 2022 paper on Short Horizon Actor Critic Naive gradient based methods are stuck in local minima and have exploding/vanishing gradients. SHAC solved this problem truncated rollouts and model based value estimation, where the model is Differentiable Sim. This boosted sample efficiency and wall-clock time immensely especially in high dimensional systems such as humanoids Yet, given enough compute PPO often caught up. Our follow up paper on on Adaptive Horizon Actor Critic at ICML 2024 discovers the cause and provides a fix. However, we find that even when given ground-truth dynamics, not all gradients are useful due to sample error. 1st-Order Model-Based Reinforcement Learning methods employing differentiable simulation provide gradients with reduced variance but are susceptible to bias in scenarios involving stiff dynamics, such as physical contact. We find that back-propagating through contact and long trajectories drastically reduces gradient accuracy. Using this insight, we propose AHAC to dynamically adapt its roll-out horizon to avoid differentiating through stiff contact. AHAC is a first-order model-based RL algorithm that learns high-dimensional tasks in minutes (wall clock) and outperforms PPO by 40%, even in the limit of data provided to PPO. This work is led by Ignat Georgiev alongside Krishnan Srinivasan, Jie Xu, Eric Heiden and ample assistance from warp team at NVIDIA Robotics (Miles Macklin)

52,279 просмотров

How can robots reliably place objects in diverse real-world tasks? 🤖🔍 Placement is tough—objects vary in shape and placement modes (such as stacking, hanging, and insertion), making it a challenging problem. We introduce AnyPlace, a two-stage method trained purely on synthetic data to predict diverse placement poses of unseen objects for real-world tasks. Read on for more👇

How can robots reliably place objects in diverse real-world tasks? 🤖🔍 Placement is tough—objects vary in shape and placement modes (such as stacking, hanging, and insertion), making it a challenging problem. We introduce AnyPlace, a two-stage method trained purely on synthetic data to predict diverse placement poses of unseen objects for real-world tasks. Read on for more👇

24,662 просмотров

Videos

Больше нет контента для загрузки