✨ Introducing 𝐎𝐩𝐞𝐧𝐕𝐋𝐀, an open-source vision-language-action model for robotics! 👐
- SOTA generalist policy
- 7B params
- outperforms Octo, RT-2-X on zero-shot evals 🦾
- trained on 970k episodes from OpenX dataset 🤖
- fully open: model/code/data all online 🤗
🧵👇

Moo Jin Kim

2,167 subscribers

226,922 views • 2 years ago • via X (Twitter)

Education, Science & Technology

11 comments

Moo Jin Kim • 2 years ago

🧵[2/9] OpenVLA generalizes better overall and shows stronger language grounding than prior SOTA generalist models — RT-1-X, Octo, and even closed-source RT-2-X — across a suite of 17 WidowX robot tasks + 12 Google robot tasks.

Moo Jin Kim • 2 years ago

🧵[3/9] OpenVLA can also be fully fine-tuned on new robot setups/tasks with just 10-150 demos and outperform from-scratch Diffusion Policy on diverse multi-instruction tasks with distractor objects in the scene.

Moo Jin Kim • 2 years ago

🧵[4/9] Additionally, OpenVLA can be fine-tuned via PEFT (LoRA) on a single 48GB GPU — training only 1.4% of the parameters but still matching full fine-tuning performance on Franka Panda fine-tuning tasks.
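To make the "1.4% of the parameters" figure in [4/9] more concrete, here is a minimal sketch of how a LoRA adapter's trainable fraction can be counted for a single linear layer. LoRA keeps the original weight frozen and trains two low-rank factors, so an adapter of rank r on a d_out x d_in weight adds r * (d_in + d_out) parameters. The layer width, the rank, and the helper name `lora_trainable_fraction` are illustrative assumptions, not OpenVLA's actual configuration.

```python
def lora_trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of trainable parameters when one frozen linear layer gets a LoRA adapter.

    The frozen weight has d_in * d_out parameters; the adapter adds two
    low-rank factors, A (rank x d_in) and B (d_out x rank).
    """
    frozen = d_in * d_out
    adapter = rank * (d_in + d_out)
    return adapter / (frozen + adapter)


# Hypothetical 4096 x 4096 projection with rank 32 (illustrative values only).
fraction = lora_trainable_fraction(d_in=4096, d_out=4096, rank=32)
print(f"trainable fraction: {fraction:.2%}")
```

With these made-up numbers the adapter is a low single-digit percentage of the layer, the same order of magnitude as the figure quoted in the thread. The real percentage depends on which modules are adapted and on the rank chosen.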

Moo Jin Kim • 2 years ago

🧵[5/9] Further, by using 4-bit quantization at inference time, the OpenVLA model can be loaded with less than half the normal required GPU memory and complete BridgeData V2 WidowX tasks without compromising performance.
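As a rough back-of-the-envelope check on the memory claim in [5/9], the sketch below estimates the memory taken by the weights alone at different precisions. It assumes a round 7e9 parameters and a 16-bit baseline, and it ignores activations, caches, and the extra scale/zero-point data that real 4-bit schemes store, so actual savings will be smaller than the ideal ratio. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # "7B params" from the thread, treated as a round number

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # assumed 16-bit baseline
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # idealized 4-bit weights

print(f"16-bit weights: {bf16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
print(f"ratio: {int4_gb / bf16_gb:.2f}")
```

Under these assumptions the weights shrink to a quarter of their 16-bit size, which is consistent with the thread's more conservative "less than half" once runtime overhead is included.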

Moo Jin Kim • 2 years ago

🧵[6/9] How does OpenVLA work? TL;DR: We take a 7B-parameter Prismatic VLM – with a fused DinoV2-SigLIP vision encoder and a Llama 2 LLM backbone – and fine-tune it on a ton of robot action data. - nearly 1M robot episodes - almost 30 robotic manipulation datasets
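The thread only says the vision encoder is "fused". One common way to fuse two encoders, assumed here purely for illustration, is to run the same image patches through both and concatenate the two feature vectors for each patch. The toy sketch below uses plain lists in place of real DinoV2 and SigLIP outputs; the function name and the feature sizes are hypothetical.

```python
from typing import List

Vector = List[float]


def fuse_patch_features(feats_a: List[Vector], feats_b: List[Vector]) -> List[Vector]:
    """Concatenate two encoders' features patch by patch (channel-wise fusion)."""
    if len(feats_a) != len(feats_b):
        raise ValueError("both encoders must produce the same number of patches")
    return [a + b for a, b in zip(feats_a, feats_b)]


# Toy stand-ins for encoder outputs: 3 patches, with 2-dim and 4-dim features.
dino_like = [[0.1, 0.2] for _ in range(3)]
siglip_like = [[1.0, 2.0, 3.0, 4.0] for _ in range(3)]

fused = fuse_patch_features(dino_like, siglip_like)
print(len(fused), len(fused[0]))
```

The fused sequence keeps one vector per patch, with a width equal to the sum of the two encoders' widths. In a full model, a projection layer would then map these vectors into the language model's embedding space.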

Moo Jin Kim • 2 years ago

🧵[7/9] Unlike prior SOTA VLA model RT-2-X, we open-source our model, training & inference code, and OpenX training data mixture! 🤗 See all this and more info at our website! 👉 👈

Moo Jin Kim • 2 years ago

🧵[8/9] OpenVLA is the *first* open-source VLM-based robotic foundation model trained on large-scale real-world robot manipulation data. We hope that our model and training frameworks are useful resources to the robot learning community that help advance embodied AI research!

Moo Jin Kim • 2 years ago

🧵[9/9] Huge thanks to project co-leads, @KarlPertsch and @siddkaramcheti, for making this project possible! ❤️ Also, so grateful for all my collaborators – from @Stanford, @UCBerkeley, @MIT, @ToyotaResearch, @GoogleDeepMind, and @physical_int. 🙏

Chuang Gan • 2 years ago

Very impressive work! You might not realize that we have an open-source 3D-VLA, published at ICML this year 😀. Code:

Karol Hausman • 2 years ago

Very cool, congrats!

Moo Jin Kim • 2 years ago

@hausman_k Thank you!!

Related videos

3 mo. ago we released the Open X-Embodiment dataset, today we’re doing the next step: Introducing Octo 🐙, a generalist robot policy, trained on 800k robot trajectories, stronger than RT-1X, flexible observation + action spaces, fully open source! 💻: /🧵

Karl Pertsch

126,658 views • 2 years ago

🚀 First step to unlocking Generalist Robots! Introducing 🤖LAPA🤖, a new SOTA open-sourced 7B VLA pretrained without using action labels. 💪SOTA VLA trained with Open X (outperforming OpenVLA on cross and multi embodiment) 😯LAPA enables learning from human videos, unlocking potential for robotic foundation model ❗Over 30x pretraining efficiency for VLA training 🤗Code and checkpoints are all open-sourced!

Seonghyeon Ye

33,239 views • 1 year ago

Introducing TraceVLA: a fully open-source Vision-Language-Action model reimagining spatial-temporal awareness: ✨ 3.5x gains on real robots, SOTA in simulation 💡 Fine-tunes on just 150K trajectories ⚡ Compact 4B model = 7B performance

Yongyuan Liang

39,500 views • 1 year ago

Thrilled to announce Octo 🐙, an open-source robot foundation model! Octo is a sota generalist robot policy based on transformer+diffusion. Most importantly, you can finetune Octo *today* with flexible observation and action spaces on your robot setup!

Oier Mees

44,944 views • 2 years ago

1/5 🚀 Thrilled to open-source OSCAR 🤖 — an action-conditioned world model for robotics, led by the visiting student in my group Zhuoyuan Wu! It generalizes across different robot embodiments with precise action controllability. All trained on a single GH200 GPU, and outperforms existing open-sourced baselines, which have larger model capacity and need more compute. Everything is public, including training data. 📄 Paper: 🌐 Project: 💻 Code: 🤗 Robot data: 🤗 Human data: 🤗 Weights: #Robotics #WorldModels #AI #OpenSource

Jun Gao

102,999 views • 16 days ago

Introducing Ψ₀, an open foundation model for universal humanoid loco-manipulation. 🏆 Outperforms GR00T N1.6 by 40%+ overall success rate 📉 Uses only ~10% of the pre-training data 📦 Fully open-source: model, data, code, and deployment pipeline 1/10

Yue Wang

19,190 views • 3 months ago

📢 First contact between a frontier model and robots! Gemini Robotics is a SOTA generalist Vision-Language-Action model bringing frontier model intelligence to the physical world. It's an extremely capable model enabling dexterous, steerable, and general robot control. 🧵⬇️

Ted Xiao

152,413 views • 1 year ago

Today, a step forward in open robotics - our results show that sim-to-real zero shot transfer for manipulation is possible. MolmoBot is our open model suite for robotics, trained entirely in simulation on MolmoSpaces.🧵

Ai2

64,466 views • 3 months ago

Most capable generalist robotics models today are closed or at best, open weights. But robotics won’t reach its ChatGPT moment without real openness. That GPT moment was built on years of open tools and datasets such as Python, PyTorch, ImageNet and more, that let researchers inspect, reproduce, and build. Today, we’re introducing MolmoAct 2: a fully open-source action reasoning model for real-world robotics. We rethought and reshaped everything! 🧵👇

Jiafei Duan

105,282 views • 1 month ago

Introducing a new, fully open robotics dataset! - 76k episodes - 564 unique scenes - 100 contributors - 13 labs/institutions - 3 continents A short 🧵 on the backstory

Chelsea Finn

98,616 views • 2 years ago

Introducing OFT—an Optimized Fine-Tuning recipe for VLAs! Fine-tuning OpenVLA w/ OFT, we see: -25-50x faster inference ⚡️ -SOTA 97.1% avg SR in LIBERO 💪 -high-freq control w/ 7B model on real bimanual robot -outperforms π₀, RDT-1B, DiT Policy, MDT, Diffusion Policy, ACT 🧵👇

Moo Jin Kim

84,187 views • 1 year ago

We’re releasing the RoboArena today!🤖🦾 Fair & scalable evaluation is a major bottleneck for research on generalist policies. We’re hoping that RoboArena can help! We provide data, model code & sim evals for debugging! Submit your policies today and join the leaderboard! :) 🧵

Karl Pertsch

111,325 views • 1 year ago

💥 A 450M model just beat bigger VLAs on real robot tasks, and it’s 100% open source [📍 bookmark for later]
Came across SmolVLA, a new vision-language-action model for robotics that’s compact, fast, and trained entirely on open community datasets from LeRobot via Hugging Face. What stood out to me is how it matches or outperforms much larger models like ACT using noisy, real-world community data instead of giant private datasets.
Why it’s worth a look
✅ 26% performance boost from pretraining on open-source data
✅ Runs on consumer hardware, even a MacBook
✅ 30% faster responses with async inference and smart architecture tweaks
✅ Strong results across Meta-World, LIBERO, SO100, and SO101
✅ Fully open source: weights, code, training pipeline, eval stack
They also introduced smart efficiency tricks like using fewer visual tokens, pulling outputs from mid-layer, and separating perception from action to make it all run fast. SmolVLA is a strong case for what can happen when the robotics community shares data and builds in the open. Definitely worth keeping an eye on.

Ilir Aliu - eu/acc

17,353 views • 10 months ago

Introducing the Open Deep Research app! Generate detailed reports on any topic with open source LLMs. Free & fully open source. We’re releasing everything: evaluation dataset, code, app, and blog.🔥

Together AI

28,338 views • 1 year ago

✨✨ Announcing Open Source Dataset from excavators in real construction sites retrofitted by Flywheel (YC S25)! ✨✨ This 100hrs of observation+action data enables training autonomy models for excavators. We were able to train a small task model from 6 hours of dataset on a Kubota U17 on Y Combinator demo day! Link in comments!

Jash Mota

36,495 views • 9 months ago

Today we're releasing WildDet3D—an open model for monocular 3D object detection in the wild. It works with text, clicks, or 2D boxes, and on zero-shot evals it nearly doubles the best prior scores. 🧵

Ai2

85,809 views • 2 months ago

MolmoAct 2 artifacts have been downloaded 400K+ times in under 1 month. Today we're opening up the full code & training data: everything you need to fine-tune or build on our fully open robotics foundation model. 🧵

Ai2

17,719 views • 29 days ago

1/ Introducing Glider - the smallest model to beat GPT-4o-mini on eval tasks ⚡🚀 - Open source, open weights, open code - Explainable evaluations by nature - Trained on 183 criteria and 685 domains Try it out for free at 🔥

PatronusAI

14,842 views • 1 year ago