Excited to present FastTD3: a simple, fast, and capable off-policy RL algorithm for humanoid control -- with open-source code to run your own humanoid RL experiments in no time! Thread below 🧵
130,935 views • 1 year ago • via X (Twitter)

[1/N] FastTD3 is fast -- it solves humanoid tasks faster than PPO and other off-policy RL algorithms. But it’s simple: just a TD3 agent plus massively parallel simulation, a giant batch size, and a distributional critic (like PQL (@softraeh), but even simpler).
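The distributional critic can be pictured as a categorical (C51-style) value distribution. The thread does not say how FastTD3 parameterizes it, so this minimal sketch assumes a fixed support of atoms between `v_min` and `v_max`, projects the Bellman-shifted target distribution back onto that support, and trains the critic with cross-entropy. `project_categorical`, `categorical_loss`, and all sizes are hypothetical illustrations, not the FastTD3 code.

```python
import numpy as np

def project_categorical(next_probs, rewards, dones, gamma, v_min, v_max):
    """Project r + gamma * Z(s', a') onto the fixed atom support (C51-style).

    next_probs: (B, N) target-critic probabilities over the N atoms
    rewards, dones: (B,) arrays; returns (B, N) projected target probabilities.
    """
    batch, n_atoms = next_probs.shape
    support = np.linspace(v_min, v_max, n_atoms)           # fixed atoms z_j
    delta_z = (v_max - v_min) / (n_atoms - 1)
    # Shift and shrink every atom: Tz_j = r + gamma * (1 - done) * z_j
    tz = rewards[:, None] + gamma * (1.0 - dones[:, None]) * support[None, :]
    tz = np.clip(tz, v_min, v_max)
    # Fractional atom index, clipped to guard against floating-point overshoot
    b = np.clip((tz - v_min) / delta_z, 0, n_atoms - 1)
    lower = np.floor(b).astype(np.int64)
    upper = np.ceil(b).astype(np.int64)
    # Split each atom's mass linearly between its two neighbours; if b lands
    # exactly on an atom (lower == upper), that atom receives all of the mass.
    same = lower == upper
    w_lower = np.where(same, 1.0, upper - b)
    w_upper = np.where(same, 0.0, b - lower)
    target = np.zeros_like(next_probs)
    rows = np.repeat(np.arange(batch), n_atoms)            # row index per atom
    np.add.at(target, (rows, lower.ravel()), (next_probs * w_lower).ravel())
    np.add.at(target, (rows, upper.ravel()), (next_probs * w_upper).ravel())
    return target

def categorical_loss(pred_logits, target_probs):
    """Mean cross-entropy between projected targets and the critic's logits."""
    shifted = pred_logits - pred_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-(target_probs * log_probs).sum(axis=-1).mean())
```

In a TD3-style agent, `next_probs` would presumably come from the target critic evaluated at the (noise-smoothed) target action, and the loss would be averaged over the large replay batch the thread mentions.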

[2/N] FastTD3 isn’t just fast -- it’s capable. It solves challenging HumanoidBench tasks with dexterous hands in under 3 hours, and solves tasks from MuJoCo Playground and IsaacLab with ease.

[3/N] We even did zero-shot sim-to-real transfer to the Booster T1 humanoid -- directly from the amazingly easy-to-use MuJoCo Playground! (@kevin_zakka, @boosterobotics)

[4/N] Why does this work? (i) Parallel simulation offsets TD3’s exploration weakness by supplying diverse data; (ii) large batches plus a distributional critic stabilize value learning. Then the deterministic policy gradient just does its job: exploit.
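The "exploit" step can be illustrated with a toy deterministic policy gradient update. This is a minimal sketch, not FastTD3's code: the linear policy `a = W s`, the fixed quadratic critic `Q(s, a) = -||a - K s||^2` (standing in for the learned critic), `dpg_step`, and the batch of 1024 states (standing in for parallel environments) are all hypothetical. TD3's twin critics, target policy smoothing, and delayed actor updates are omitted.

```python
import numpy as np

def dpg_step(W, K, states, lr):
    """One deterministic-policy-gradient ascent step for a linear policy a = W s.

    Uses a hypothetical fixed critic Q(s, a) = -||a - K s||^2, whose best
    action is a* = K s, so the effect of the update is easy to check.
    """
    actions = states @ W.T                        # (B, act_dim): a = pi(s)
    grad_a = -2.0 * (actions - states @ K.T)      # dQ/da evaluated at a = pi(s)
    # Chain rule through a = W s: dJ/dW = mean over batch of outer(dQ/da, s)
    grad_W = grad_a.T @ states / len(states)      # (act_dim, obs_dim)
    return W + lr * grad_W                        # ascend the critic's value

rng = np.random.default_rng(0)
obs_dim, act_dim = 8, 3
K = rng.normal(size=(act_dim, obs_dim))           # hypothetical optimal gain
W = np.zeros((act_dim, obs_dim))                  # initial policy
num_envs = 1024                                   # hypothetical parallel env count
for _ in range(200):
    states = rng.normal(size=(num_envs, obs_dim)) # fresh, diverse batch per step
    W = dpg_step(W, K, states, lr=0.1)
print("max |W - K|:", np.abs(W - K).max())
```

Because the policy is deterministic, the update simply follows the critic's action gradient; with a well-conditioned batch of diverse states the policy converges to the critic's greedy action.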

[5/N] Our goal is not to claim novelty -- we deliberately built a simple algorithm that just works without bells and whistles. Excited to see what the community builds on top of it! Check below as well:

[6/N] This work wouldn’t have been possible without my collaborators @carlo_sferrazza @HaoranGeng2 @mic_nau @zhaohengyin @pabbeel Page: arXiv:


I liked this work, congrats on the release!

Thank you, Seohong! Let's see which of these observations will also hold in offline RL

do you intend to PR this to any larger RL libs / would you support others' efforts to do so if not? (with appropriate citation of course)

love the robotic moonwalk hahah
