Excited to present FastTD3: a simple, fast, and capable off-policy RL algorithm for humanoid control -- with open-source code to run your own humanoid RL experiments in no time! Thread below 🧵

Younggyo Seo

1,607 subscribers

130,935 views • 1 year ago • via X (Twitter)

Science & Technology Education

11 Comments

Younggyo Seo • 1 year ago

[1/N] FastTD3 is fast -- it solves humanoid tasks faster than PPO and other off-policy RL algorithms. But it’s simple: just TD3 agent + massively parallel simulation, a giant batch size, and a distributional critic (like PQL (@softraeh), but even simpler)
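
To make the "distributional critic" ingredient concrete, here is a minimal pure-Python sketch of one common variant, the categorical (C51-style) target projection. The thread does not say which distributional form FastTD3 uses, so the function name `project_categorical_target`, the support bounds, and the atom count are illustrative assumptions rather than the released implementation.

```python
import math


def project_categorical_target(next_probs, reward, done, gamma, v_min, v_max):
    """Project the distribution of r + gamma * z onto a fixed, evenly spaced support.

    next_probs: probabilities over the support atoms for the next state-action.
    Returns a list of the same length that sums to the same total mass.
    All argument names here are hypothetical, chosen for illustration.
    """
    n = len(next_probs)
    delta = (v_max - v_min) / (n - 1)  # spacing between neighbouring atoms
    target = [0.0] * n
    for j, p in enumerate(next_probs):
        z_j = v_min + j * delta
        # Bellman-shift the atom; a terminal transition keeps only the reward.
        tz = reward + (0.0 if done else gamma * z_j)
        tz = min(max(tz, v_min), v_max)  # clamp onto the support range
        b = (tz - v_min) / delta  # fractional atom index
        lo = min(int(math.floor(b)), n - 1)
        hi = min(lo + 1, n - 1)
        w_hi = b - lo
        # Split the mass between the two nearest atoms.
        target[lo] += p * (1.0 - w_hi)
        target[hi] += p * w_hi
    return target


def expected_value(probs, v_min, v_max):
    """Scalar Q-value implied by a categorical distribution on the support."""
    n = len(probs)
    delta = (v_max - v_min) / (n - 1)
    return sum(p * (v_min + j * delta) for j, p in enumerate(probs))
```

With an 11-atom support on [0, 10], all next-state mass on value 5, reward 1, and discount 0.5, the shifted value 3.5 falls between atoms 3 and 4, so each receives half of the mass. The critic would then be trained with a cross-entropy loss against this projected target.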

Younggyo Seo • 1 year ago

[2/N] FastTD3 isn’t just fast -- it’s capable. It solves challenging HumanoidBench tasks with dexterous hands in under 3 hours, and solves tasks from MuJoCo Playground and IsaacLab with ease.

Younggyo Seo • 1 year ago

[3/N] We even did zero-shot sim-to-real transfer to the Booster T1 humanoid -- directly from the amazingly easy-to-use MuJoCo Playground! (@kevin_zakka, @boosterobotics)

Younggyo Seo • 1 year ago

[4/N] Why does this work? (i) Parallel simulation offsets TD3’s exploration weakness via diverse data. (ii) Large batches + distributional critic stabilize value learning. Then the deterministic policy gradient just does its job: exploit.
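
Point (i) can be illustrated with a small sketch: the same deterministic policy output, perturbed with a different Gaussian noise scale in each parallel environment, yields varied data from a single policy. The per-environment scale range, the function names, and the [-1, 1] action bounds are assumptions for illustration, not details stated in the thread.

```python
import random


def sample_noise_scales(num_envs, sigma_min, sigma_max, rng):
    """Draw one exploration noise scale per parallel environment (hypothetical scheme)."""
    return [rng.uniform(sigma_min, sigma_max) for _ in range(num_envs)]


def explore(actions, noise_scales, rng, low=-1.0, high=1.0):
    """Add Gaussian noise to deterministic actions, one noise scale per environment.

    actions: list of per-environment action vectors from the deterministic policy.
    Returns noisy actions clipped to the assumed bounds [low, high].
    """
    noisy = []
    for action, sigma in zip(actions, noise_scales):
        noisy.append(
            [min(max(a + rng.gauss(0.0, sigma), low), high) for a in action]
        )
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    num_envs = 4
    # Pretend the deterministic policy outputs the same action in every environment.
    policy_actions = [[0.2, -0.1] for _ in range(num_envs)]
    scales = sample_noise_scales(num_envs, 0.01, 0.4, rng)
    for sigma, act in zip(scales, explore(policy_actions, scales, rng)):
        print(f"sigma={sigma:.3f} action={act}")
```

Because every environment perturbs the policy output differently, a very large number of parallel environments covers a wider region of state-action space per step than a single noisy rollout would.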

Younggyo Seo • 1 year ago

[5/N] Our goal is not to claim novelty -- we deliberately built a simple algorithm that just works without bells and whistles. Excited to see what the community builds on top of it! Check below as well:

Younggyo Seo • 1 year ago

[6/N] This work wouldn’t have been possible without my collaborators @carlo_sferrazza @HaoranGeng2 @mic_nau @zhaohengyin @pabbeel Page: Arxiv:

Seohong Park • 1 year ago

I liked this work, congrats on the release!

Younggyo Seo • 1 year ago

Thank you Seohong! Let's see which of these observations will also hold in offline RL.

Kyle🤖🚀🦭 • 1 year ago

do you intend to PR this to any larger RL libs / would you support others' efforts to do so if not? (with appropriate citation of course)

Gracjan Góral • 1 year ago

love the robotic moonwalk hahah
