🚀 First step to unlocking Generalist Robots! Introducing 🤖LAPA🤖, a new SOTA open-sourced 7B VLA pretrained without using action labels. 💪SOTA VLA trained with Open X (outperforming OpenVLA on cross- and multi-embodiment tasks) 😯LAPA enables learning from human videos, unlocking potential for robotic foundation models ❗Over 30x pretraining efficiency...
33,239 views • 1 year ago • via X (Twitter)

LAPA consists of two stages: 1) Latent Action Quantization and 2) Latent Pretraining. The first stage learns quantized latent actions from visual deltas between frames. In the second stage, a pretrained VLM (LWM) is trained to predict these quantized latent actions. During finetuning, we map the latent actions to real robot actions.
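A minimal toy sketch of the two stages, not the LAPA code: it assumes each frame is already encoded as a flat feature vector, and it uses a fixed, hand-picked codebook where LAPA learns one. Stage 1 is shown as nearest-codebook quantization of the visual delta; stage 2 as the cross-entropy loss a model would minimize when predicting that discrete latent-action index. All names (`CODEBOOK`, `frame_delta`, `quantize`, `latent_pretraining_loss`) and values are hypothetical.

```python
import math

# Toy codebook: 4 hypothetical latent actions in a 2-D feature space.
# In LAPA the codebook is learned; these values are hand-picked for illustration.
CODEBOOK = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]


def frame_delta(prev_feat, next_feat):
    """Visual delta: element-wise difference between two frame feature vectors."""
    return [b - a for a, b in zip(prev_feat, next_feat)]


def quantize(delta, codebook):
    """Stage 1 (sketch): index of the codebook entry nearest to delta (squared L2)."""
    best_idx, best_dist = -1, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((d - c) ** 2 for d, c in zip(delta, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def latent_pretraining_loss(logits, target_idx):
    """Stage 2 (sketch): cross-entropy of a model's logits over codebook
    indices against the quantized latent action (no robot action labels)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_idx]


# Usage: label a frame pair with a latent action, then score a prediction.
latent = quantize(frame_delta([0.2, 0.5], [0.3, 1.4]), CODEBOOK)
print(latent)  # prints 2
print(latent_pretraining_loss([0.1, -0.3, 2.0, 0.0], latent))
```

The point of the sketch is that the pretraining target is a discrete token derived only from video, so no action labels are needed until finetuning. The finetuning step that maps latent actions to real actions is not shown.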

LAPA beats OpenVLA across cross- and multi-embodiment tasks, all without using action labels during pretraining! 🚀 A step forward in robust, generalizable robot learning.

We can build LAPA from 220K human videos, where action labels do not exist and the embodiment gap is huge. Still, LAPA (Human Videos) outperforms OpenVLA (Bridge) 😯

What do latent actions mean? Latent actions correspond to 🌟semantic 🌟actions across different robot embodiments. Interestingly, the same latent action maps to similar movements even on different robot embodiments. This suggests latent actions form a ‘shared’ representation space, much like images or language.
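One way to probe the ‘shared’ space claim, sketched under assumptions: given movement vectors from each embodiment, each tagged with the latent action it was assigned, average the movement per latent code and compare embodiments with cosine similarity (values near 1 mean the code moves both robots in a similar direction). This is not the analysis from the LAPA work; the record format, function names, and toy data below are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def mean_movement_per_code(records):
    """records: iterable of (embodiment, latent_code, movement_vector).
    Returns {(embodiment, latent_code): mean movement vector}."""
    sums = {}
    counts = defaultdict(int)
    for emb, code, move in records:
        key = (emb, code)
        if key not in sums:
            sums[key] = [0.0] * len(move)
        sums[key] = [s + m for s, m in zip(sums[key], move)]
        counts[key] += 1
    return {k: [s / counts[k] for s in v] for k, v in sums.items()}


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cross_embodiment_agreement(records, emb_a, emb_b):
    """For each latent code seen in both embodiments, the cosine similarity
    of their mean movements."""
    means = mean_movement_per_code(records)
    codes_a = {c for (e, c) in means if e == emb_a}
    codes_b = {c for (e, c) in means if e == emb_b}
    return {c: cosine(means[(emb_a, c)], means[(emb_b, c)])
            for c in sorted(codes_a & codes_b)}


# Usage with hypothetical toy records (not real LAPA data).
toy = [("robot_a", 0, [1.0, 0.0]), ("robot_b", 0, [3.0, 0.0])]
print(cross_embodiment_agreement(toy, "robot_a", "robot_b"))  # {0: 1.0}
```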

We also perform closed-loop rollouts of LAPA for analysis (rollout vs. ground truth). Given the instruction ‘pick up the broccoli from pot,’ it successfully picks up the object, which then disappears. This highlights LAPA’s potential as an emerging 'world model' with impressive predictive abilities! 🌍

Everything is open-sourced! Code 💻: Hugging Face 🤗: Website 🌐:

Co-led with @jang_yoel, with wonderful collaborators: Byeongguk Jeon, @joocjun, @jw2yang4ai, Baolin Peng, @AjayMandlekar, Reuben Tan, Yu-Wei Chao, @billyuchenlin, Lars Liden, and advisors: @kimin_le2, @JianfengGao0217, @LukeZettlemoyer, Dieter Fox, @seo_minjoon, from @kaist_ai, @UW, @Microsoft, @nvidia, @allen_ai

Firm step towards scaling up VLA models! Excellent job!

Incredible job! I view it as the first practical evidence that unsupervised latent actions can work well at such a large scale.
