📢 First contact between a frontier model and robots! Gemini Robotics is a SOTA generalist Vision-Language-Action model bringing frontier model intelligence to the physical world. It's an extremely capable model enabling dexterous, steerable, and general robot control. 🧵⬇️

152,413 views • 1 year ago • via X (Twitter)

Comments: 11

Ted Xiao • 1 year ago

Solving robotics requires comprehensively understanding the physical world. Multimodal models are a critical piece of this puzzle, so we start from the strongest one -- Gemini 2.0. We verify Gemini's real-world knowledge with our new Embodied Reasoning QA (ERQA) benchmark. (2/9)

Ted Xiao • 1 year ago

Gemini Robotics-ER (Embodied Reasoning) is an extension of Gemini 2.0 with enhanced spatial and temporal understanding. This unlocks fundamental capabilities like pointing, multiview and 3D understanding, and grasp prediction, ready for direct use in robot applications. (3/9)

Ted Xiao • 1 year ago

Building on top of embodied reasoning capabilities, we introduce Gemini Robotics, a new SOTA Vision-Language-Action (VLA) model. It's a very strong generalist model out of the box, able to perform very dexterous tasks while still following instructions and generalizing 🚀 (4/9)

Ted Xiao • 1 year ago

Gemini Robotics is already a breakthrough pre-trained generalist VLA, but that's not all! We also show how it can easily be specialized for long-horizon dexterity, advanced reasoning, fast adaptation to new tasks, and transfer to entirely new robot embodiments like humanoids. (5/9)

Ted Xiao • 1 year ago

It's hard to convey how magical it is to experience the combination of a general VLA action policy, robust embodied reasoning world knowledge, and native multimodality from Gemini 2.0 -- the interactivity and cohesive integration of these just make sense. (6/9)

Ted Xiao • 1 year ago

Explore the technical details in our report, and check out the blog post and our open-sourced ERQA benchmark! (7/9) 📄 Tech report: 📰 Blog post: 📊 ERQA Benchmark: 🧵 Thread:

Ted Xiao • 1 year ago

This project was a huge team effort, a year and a half in the making! It's been so much fun across the whole stack: from fundamental frontier model multimodal capabilities, to advancing embodied reasoning, all the way to on-robot low-level control. (8/9)

Ted Xiao • 1 year ago

Too many amazing collaborators to tag everyone, but special shoutout to @xf1280 @jackyliang42 @ColinearDevin @Stacormed @shahdhruv_ @alexleegk @du_yilun @TianliDing @claudiofantacci @ashwinb96 @SudeepDasari @TonyWentaoYuan @lgraesser3 @SeanKirmani @montseglz @keerthanpg @Sumeet_Robotics @DorsaSadigh @JieTan42707141 @Kanishka_Rao @vikassindhwani (9/9)

ARK Electronics • 2 years ago

Excited about the latest tech for your drone product? Our NDAA-compliant, US-made flight controllers are designed to accelerate your path to market and provide a solid platform for developing your autonomous software. Check them out! #Drones #UAV #UAS #Robotics #MadeInUSA

Remi Cadene • 1 year ago

Really cool! Do you plan to open source the data or model?

Ted Xiao • 1 year ago

A lot of really exciting updates are coming… best is yet to come, stay tuned 🚀
