
Paul Zhou
@zhiyuan_zhou_ • 1,415 subscribers
RL & robots. phd @berkeley_ai, prev @physical_int
Videos

Do you ever find that finetuning a VLA overfits to the target task, to the point where generalist ability is lost and even minor deviations beyond the SFT data break the policy? We found an extremely simple solution: directly merge the base and finetuned policies in weight space 🤯 👇🧵
Paul Zhou • 126,621 views • 5 months ago
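
To make "merge in weight space" concrete, here is a minimal sketch that assumes the merge is a plain linear interpolation between matching parameters. The post does not spell out the merging rule, so the function name `merge_weights`, the coefficient `alpha`, and the toy parameter dicts are all hypothetical and for illustration only.

```python
def merge_weights(base, finetuned, alpha=0.5):
    """Interpolate two parameter dicts: (1 - alpha) * base + alpha * finetuned.

    ASSUMPTION: linear interpolation is one common way to merge in weight
    space; the post does not state which rule or coefficient was used.
    """
    if base.keys() != finetuned.keys():
        raise ValueError("parameter names differ between the two policies")
    merged = {}
    for name, base_values in base.items():
        ft_values = finetuned[name]
        if len(base_values) != len(ft_values):
            raise ValueError(f"shape mismatch for {name}")
        # Elementwise blend of the two checkpoints for this parameter.
        merged[name] = [
            (1 - alpha) * b + alpha * f for b, f in zip(base_values, ft_values)
        ]
    return merged


# Toy stand-ins for real checkpoints (hypothetical values).
base = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
finetuned = {"layer.weight": [1.0, 4.0], "layer.bias": [3.0]}
merged = merge_weights(base, finetuned, alpha=0.5)
```

With `alpha=0` the result is the base policy and with `alpha=1` it is the finetuned one, so under this assumption a value in between trades off generalist ability against target-task performance.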