Tongzhou Mu 🤖🦾🦿

@tongzhou_mu · 3,154 subscribers

Scaling video world models for 🤖 @RhodaAI | Built 1st version of ManiSkill https://t.co/NuQOXaNhud | Prev @UCSanDiego @1x_tech @NVIDIA @MSFTResearch

Shorts

🤔 How to fine-tune an Imitation Learning policy (e.g., Diffusion Policy, ACT) with RL? As an RL practitioner, I’ve been struggling with this problem for a while. Here’s why it’s tough:
1️⃣ Special designs in modern IL models (usually for multimodal action distributions) make them non-trivial to fine-tune with RL.
2️⃣ Large policy models + RL's poor sample efficiency = a nightmare.
But finally, we figured out a simple solution that works for any model architecture!
🌟 Check out our #ICLR2025 paper: “Policy Decorator: Model-Agnostic Online Refinement for Large Policy Models”, led by my amazing mentee Xiu Yuan. 🔗
🧵 Read more below!

16,923 views
