📢 First contact between a frontier model and robots! Gemini Robotics is a SOTA generalist Vision-Language-Action model bringing frontier model intelligence to the physical world. It's an extremely capable model enabling dexterous, steerable, and general robot control. 🧵⬇️

152,413 views • 1 year ago • via X (Twitter)

11 Comments

Ted Xiao • 1 year ago

Solving robotics requires comprehensively understanding the physical world. Multimodal models are a critical piece of this puzzle, so we start from the strongest one -- Gemini 2.0. We verify Gemini's real-world knowledge with our new Embodied Reasoning QA (ERQA) benchmark. (2/9)

Ted Xiao • 1 year ago

Gemini Robotics-ER (Embodied Reasoning) is an extension of Gemini 2.0 with enhanced spatial and temporal understanding. This unlocks fundamental capabilities like pointing, multiview and 3D understanding, and grasp prediction, ready for direct use in robot applications. (3/9)

Ted Xiao • 1 year ago

Building on top of embodied reasoning capabilities, we introduce Gemini Robotics, a new SOTA Vision-Language-Action (VLA) model. It's a very strong generalist model out of the box, able to perform very dexterous tasks while still following instructions and generalizing 🚀 (4/9)

Ted Xiao • 1 year ago

Gemini Robotics is already a breakthrough pre-trained generalist VLA, but that's not all! We also show how it can easily be specialized for long-horizon dexterity, advanced reasoning, fast adaptation to new tasks, and transfer to entirely new robot embodiments like humanoids. (5/9)

Ted Xiao • 1 year ago

It's hard to convey how magical it is to experience the combination of a general VLA action policy, robust embodied reasoning world knowledge, and native multimodality from Gemini 2.0 -- the interactivity and cohesive integration of these just make sense. (6/9)

Ted Xiao • 1 year ago

Explore the technical details in our report, check out the blog post, and our open-sourced ERQA benchmark! (7/9) 📄 Tech report: 📰 Blog post: 📊 ERQA Benchmark: 🧵 Thread:

Ted Xiao • 1 year ago

This project was a huge team effort, a year and a half in the making! It's been so much fun across the whole stack: from fundamental frontier model multimodal capabilities, to advancing embodied reasoning, all the way to on-robot low-level control. (8/9)

Ted Xiao • 1 year ago

Too many amazing collaborators to tag everyone, but special shoutout to @xf1280 @jackyliang42 @ColinearDevin @Stacormed @shahdhruv_ @alexleegk @du_yilun @TianliDing @claudiofantacci @ashwinb96 @SudeepDasari @TonyWentaoYuan @lgraesser3 @SeanKirmani @montseglz @keerthanpg @Sumeet_Robotics @DorsaSadigh @JieTan42707141 @Kanishka_Rao @vikassindhwani (9/9)

Remi Cadene • 1 year ago

Really cool! Do you plan to open source the data or model?

Ted Xiao • 1 year ago

A lot of really exciting updates are coming… best is yet to come, stay tuned 🚀
