✨ Introducing OpenVLA, an open-source vision-language-action model for robotics!
- SOTA generalist policy
- 7B params
- outperforms Octo, RT-2-X on zero-shot evals 🦾
- trained on 970k episodes from OpenX dataset
- fully open: model/code/data all online
🧵
226,922 views • 2 years ago • via X (Twitter)
11 comments

🧵[2/9] OpenVLA generalizes better overall and shows stronger language grounding than prior SOTA generalist models (RT-1-X, Octo, and even closed-source RT-2-X) across a suite of 17 WidowX robot tasks + 12 Google robot tasks.

🧵[3/9] OpenVLA can also be fully fine-tuned on new robot setups/tasks with just 10-150 demos and outperform from-scratch Diffusion Policy on diverse multi-instruction tasks with distractor objects in the scene.

🧵[4/9] Additionally, OpenVLA can be fine-tuned via PEFT (LoRA) on a single 48GB GPU, training only 1.4% of the parameters but still matching full fine-tuning performance on Franka Panda fine-tuning tasks.
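
To see why LoRA trains so few parameters, the arithmetic for a single dense layer can be sketched as follows. This is a minimal illustration, not the OpenVLA training code: the layer size (4096 x 4096) and rank (32) are hypothetical values chosen for the example, and `lora_trainable_fraction` is a made-up helper name.

```python
def lora_trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of one dense layer's weights that LoRA trains.

    LoRA freezes the (d_out x d_in) weight matrix W and learns a low-rank
    update B @ A, where B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    full_params = d_in * d_out            # frozen weights in W
    lora_params = rank * (d_in + d_out)   # trainable weights in A and B
    return lora_params / full_params


# Hypothetical layer size and rank, for illustration only.
fraction = lora_trainable_fraction(d_in=4096, d_out=4096, rank=32)
print(f"{fraction:.2%}")
```

With these assumed values the adapter adds about 1.56% of the layer's weights as trainable parameters, the same order of magnitude as the 1.4% quoted in the post. The exact figure for a whole model depends on which layers receive adapters and at what rank.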

🧵[5/9] Further, by using 4-bit quantization at inference time, the OpenVLA model can be loaded with less than half the normal required GPU memory and complete BridgeData V2 WidowX tasks without compromising performance.
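
A back-of-the-envelope estimate shows where the memory savings come from. This sketch counts weight storage only and assumes a 16-bit baseline (the post does not state the default precision). It ignores activations, quantization constants, and any layers kept at higher precision, so real usage will be higher than these numbers. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory needed to store model weights, in gigabytes (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7e9  # 7B parameters, as stated in the thread

# Assumption: the unquantized baseline stores weights in 16 bits.
baseline_gb = weight_memory_gb(N_PARAMS, 16)
quantized_gb = weight_memory_gb(N_PARAMS, 4)

print(f"16-bit weights: {baseline_gb:.1f} GB")
print(f"4-bit weights:  {quantized_gb:.1f} GB")
```

Under these assumptions the weights shrink from roughly 14 GB to roughly 3.5 GB, which is consistent with the "less than half" claim once runtime overhead is added back.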

🧵[6/9] How does OpenVLA work? TL;DR: We take a 7B-parameter Prismatic VLM, with a fused DinoV2-SigLIP vision encoder and a Llama 2 LLM backbone, and fine-tune it on a ton of robot action data.
- nearly 1M robot episodes
- almost 30 robotic manipulation datasets

🧵[7/9] Unlike prior SOTA VLA model RT-2-X, we open-source our model, training & inference code, and OpenX training data mixture! See all this and more info at our website!

🧵[8/9] OpenVLA is the *first* open-source VLM-based robotic foundation model trained on large-scale real-world robot manipulation data. We hope that our model and training frameworks are useful resources to the robot learning community that help advance embodied AI research!

🧵[9/9] Huge thanks to project co-leads, @KarlPertsch and @siddkaramcheti, for making this project possible! ❤️ Also, so grateful for all my collaborators from @Stanford, @UCBerkeley, @MIT, @ToyotaResearch, @GoogleDeepMind, and @physical_int.

Very impressive work! You might not realize that we have an open-source 3D-VLA, published at ICML this year. Code:

Very cool, congrats!

@hausman_k Thank you!!
