Tongzhou Mu 🤖🦾🦿

@tongzhou_mu • 3,188 subscribers

Scaling video world models for 🤖 @RhodaAI | Built 1st version of ManiSkill https://t.co/NuQOXaNhud | Prev @UCSanDiego @1x_tech @NVIDIA @MSFTResearch

Shorts

🤔 How do you fine-tune an Imitation Learning policy (e.g., Diffusion Policy, ACT) with RL? As an RL practitioner, I’ve been struggling with this problem for a while. Here’s why it’s tough:
1️⃣ Special designs in modern IL models (usually for modeling multimodal action distributions) make them non-trivial to fine-tune with RL.
2️⃣ Large policy models + RL's poor sample efficiency = a nightmare.
But we finally figured out a simple solution that works for any model architecture! 🌟 Check out our #ICLR2025 paper: “Policy Decorator: Model-Agnostic Online Refinement for Large Policy Models”, led by my amazing mentee Xiu Yuan. 🔗 🧵 Read more below!

16,959 views

Videos

Proud to share what I’ve been working on with my colleagues at Rhoda AI: Direct Video-Action Models (DVA).
TL;DR:
- We pre-train causal video models from scratch to control robots.
- They handle complex production tasks for hours without intervention.
- They use only ~10 hours of robot data.
How? 🧵👇

Tongzhou Mu 🤖🦾🦿

19,102 views • 4 months ago