📢 First contact between a frontier model and robots! Gemini Robotics is a SOTA generalist Vision-Language-Action model bringing frontier model intelligence to the physical world. It's an extremely capable model enabling dexterous, steerable, and general robot control. 🧵⬇️

152,413 views • 1 year ago • via X (Twitter)

11 Comments

Ted Xiao • 1 year ago

Solving robotics requires comprehensively understanding the physical world. Multimodal models are a critical piece of this puzzle, so we start from the strongest one -- Gemini 2.0. We verify Gemini's real-world knowledge with our new Embodied Reasoning QA (ERQA) benchmark. (2/9)

Ted Xiao • 1 year ago

Gemini Robotics-ER (Embodied Reasoning) is an extension of Gemini 2.0 with enhanced spatial and temporal understanding. This unlocks fundamental capabilities like pointing, multiview and 3D understanding, and grasp prediction, ready for direct use in robot applications. (3/9)

Ted Xiao • 1 year ago

Building on top of embodied reasoning capabilities, we introduce Gemini Robotics, a new SOTA Vision-Language-Action (VLA) model. It's a very strong generalist model out of the box, able to perform very dexterous tasks while still following instructions and generalizing 🚀 (4/9)

Ted Xiao • 1 year ago

Gemini Robotics is already a breakthrough pre-trained generalist VLA, but that's not all! We also show how it can easily be specialized for long-horizon dexterity, advanced reasoning, fast adaptation to new tasks, and transfer to entirely new robot embodiments like humanoids. (5/9)

Ted Xiao • 1 year ago

It's hard to convey how magical it is to experience the combination of a general VLA action policy, robust embodied reasoning world knowledge, and native multimodality from Gemini 2.0 -- the interactivity and cohesive integration of these just make sense. (6/9)

Ted Xiao • 1 year ago

Explore the technical details in our report, check out the blog post, and our open-sourced ERQA benchmark! (7/9) 📄 Tech report: 📰 Blog post: 📊 ERQA Benchmark: 🧵 Thread:

Ted Xiao • 1 year ago

This project was a huge team effort, a year and a half in the making! It's been so much fun across the whole stack: from fundamental frontier model multimodal capabilities, to advancing embodied reasoning, all the way to on-robot low-level control. (8/9)

Ted Xiao • 1 year ago

Too many amazing collaborators to tag everyone, but special shoutout to @xf1280 @jackyliang42 @ColinearDevin @Stacormed @shahdhruv_ @alexleegk @du_yilun @TianliDing @claudiofantacci @ashwinb96 @SudeepDasari @TonyWentaoYuan @lgraesser3 @SeanKirmani @montseglz @keerthanpg @Sumeet_Robotics @DorsaSadigh @JieTan42707141 @Kanishka_Rao @vikassindhwani (9/9)

Remi Cadene • 1 year ago

Really cool! Do you plan to open-source the data or model?

Ted Xiao • 1 year ago

A lot of really exciting updates are coming… the best is yet to come, stay tuned 🚀
