
Paul Zhou
@zhiyuan_zhou_ • 1,415 subscribers
RL & robots. PhD @berkeley_ai, prev @physical_int

Do you ever find that finetuning a VLA overfits it to the target task, to the point where its generalist abilities are lost and even minor deviations from the SFT data break the policy? We found an extremely simple solution: directly merge the base and finetuned policies in weight space 🤯 👇🧵
Paul Zhou • 126,621 views • 5 months ago
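The post doesn't say which merge rule is used. The simplest reading of "merge in weight space" is linear interpolation between the two checkpoints, and a minimal sketch of that is below. It assumes both policies share the same architecture and parameter names. Checkpoints are modeled as plain dicts of float lists standing in for real tensors, and `merge_weights` and `alpha` are hypothetical names chosen for illustration.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def merge_weights(base: Params, finetuned: Params, alpha: float) -> Params:
    """Linearly interpolate two checkpoints with identical architecture.

    alpha = 0.0 returns the base policy, alpha = 1.0 the finetuned one.
    Assumption: linear interpolation is an illustrative choice; the post
    does not state the exact merge rule.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if base.keys() != finetuned.keys():
        raise ValueError("checkpoints must have the same parameter names")
    merged: Params = {}
    for name, base_vals in base.items():
        ft_vals = finetuned[name]
        if len(base_vals) != len(ft_vals):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise: (1 - alpha) * base + alpha * finetuned
        merged[name] = [(1.0 - alpha) * b + alpha * f for b, f in zip(base_vals, ft_vals)]
    return merged


base = {"layer.weight": [0.0, 1.0], "layer.bias": [2.0]}
finetuned = {"layer.weight": [1.0, 3.0], "layer.bias": [0.0]}
print(merge_weights(base, finetuned, 0.5))
# prints {'layer.weight': [0.5, 2.0], 'layer.bias': [1.0]}
```

With real checkpoints, the same loop would run over a framework's state dict instead of Python lists. The post doesn't say how the merge coefficient is chosen, so treat `alpha` here as an assumed tuning knob.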