Excited to present FastTD3: a simple, fast, and capable off-policy RL algorithm for humanoid control -- with open-source code to run your own humanoid RL experiments in no time! Thread below 🧵

130,935 views • 1 year ago • via X (Twitter)

11 Comments

Younggyo Seo • 1 year ago

[1/N] FastTD3 is fast -- it solves humanoid tasks faster than PPO and other off-policy RL algorithms. But it’s simple: just a TD3 agent + massively parallel simulation, a giant batch size, and a distributional critic (like PQL (@softraeh), but even simpler)
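
For readers who have not seen TD3 before, the target computation at its core can be sketched as follows. This is the textbook TD3 target (target policy smoothing plus clipped double Q), not code from the FastTD3 release. The function name `td3_target`, the scalar action, the hyperparameter defaults, and the convention of passing the target critics as callables are all assumptions made for illustration.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, done, next_action, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, rng=random):
    """Textbook TD3 bootstrap target for one transition (scalar action).

    next_action: action proposed by the target policy at the next state.
    q1_target, q2_target: hypothetical callables mapping an action to the
        value each target critic assigns at the next state.
    """
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    smoothed = clip(next_action + noise, -1.0, 1.0)
    # Clipped double Q: take the more pessimistic of the two target critics.
    q_next = min(q1_target(smoothed), q2_target(smoothed))
    # No bootstrapping past a terminal state.
    return reward + gamma * (1.0 - done) * q_next


# Usage with two constant stand-in critics.
y = td3_target(reward=1.0, done=0.0, next_action=0.3,
               q1_target=lambda a: 1.0, q2_target=lambda a: 2.0, gamma=0.5)
```

In a massively parallel, large-batch setting the same formula would be applied to whole tensors of transitions at once; the scalar version here only shows the arithmetic.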

Younggyo Seo • 1 year ago

[2/N] FastTD3 isn’t just fast -- it’s capable. It solves challenging HumanoidBench tasks with dexterous hands in under 3 hours, and solves tasks from MuJoCo Playground and IsaacLab with ease.

Younggyo Seo • 1 year ago

[3/N] We even did zero-shot sim-to-real transfer to the Booster T1 humanoid -- directly from the amazingly easy-to-use MuJoCo Playground! (@kevin_zakka, @boosterobotics)

Younggyo Seo • 1 year ago

[4/N] Why does this work? (i) Parallel simulation offsets TD3’s exploration weakness via diverse data. (ii) Large batches + a distributional critic stabilize value learning. Then the deterministic policy gradient just does its job: exploit.
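
To make the "distributional critic" in point (ii) concrete, here is a minimal sketch of one common form: a categorical critic over a fixed support of return values, whose Bellman target is projected back onto that support (the C51-style projection). The thread does not say which distributional form FastTD3 uses, so the categorical choice, the function names, and the support bounds `v_min` and `v_max` are assumptions for illustration only.

```python
import math


def project_categorical(probs, reward, done, gamma, v_min, v_max):
    """Project the shifted/scaled return distribution onto a fixed support.

    probs: probabilities the target critic assigns to evenly spaced atoms
        between v_min and v_max at the next state-action pair.
    Returns the target probabilities on the same atoms.
    """
    n = len(probs)
    delta = (v_max - v_min) / (n - 1)
    out = [0.0] * n
    for j, p in enumerate(probs):
        z = v_min + j * delta
        # Apply the Bellman backup to the atom, then clamp to the support.
        tz = min(max(reward + gamma * (1.0 - done) * z, v_min), v_max)
        # Fractional index of the backed-up atom, guarded against float drift.
        b = min(max((tz - v_min) / delta, 0.0), float(n - 1))
        lo, hi = int(math.floor(b)), int(math.ceil(b))
        if lo == hi:
            out[lo] += p  # Lands exactly on an atom.
        else:
            out[lo] += p * (hi - b)  # Split mass between the two neighbors.
            out[hi] += p * (b - lo)
    return out


def expected_value(probs, v_min, v_max):
    """Mean of a categorical distribution over the evenly spaced support."""
    delta = (v_max - v_min) / (len(probs) - 1)
    return sum(p * (v_min + j * delta) for j, p in enumerate(probs))


# Usage: all mass on the top atom of the support [-1, 0, 1].
target = project_categorical([0.0, 0.0, 1.0], reward=0.0, done=0.0,
                             gamma=0.5, v_min=-1.0, v_max=1.0)
```

The critic would then be trained with a cross-entropy loss against `target`. Because the loss is defined over a bounded set of atoms rather than an unbounded scalar, this form is often found to be less sensitive to outlier returns than a plain squared-error regression.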

Younggyo Seo • 1 year ago

[5/N] Our goal is not to claim novelty -- we deliberately built a simple algorithm that just works without bells and whistles. Excited to see what the community builds on top of it! Check below as well:

Younggyo Seo • 1 year ago

[6/N] This work wouldn’t have been possible without my collaborators @carlo_sferrazza @HaoranGeng2 @mic_nau @zhaohengyin @pabbeel Page: Arxiv:

Seohong Park • 1 year ago

I liked this work, congrats on the release!

Younggyo Seo • 1 year ago

Thank you Seohong! Let's see which of these observations will also hold in offline RL

Kyle🤖🚀🦭 • 1 year ago

do you intend to PR this to any larger RL libs / would you support others' efforts to do so if not? (with appropriate citation of course)

Gracjan Góral • 1 year ago

love the robotic moonwalk hahah
