Sergey has a new habit. He talks to Gemini Live while driving, discussing things like data center power and cost. It’s classic Google dogfooding — obsessively testing your own product. It reminds me of Bill Gates removing his car radio so he could think about Microsoft nonstop. Every founder...

652,645 views • 6 months ago • via X (Twitter)

Related videos

Learning from Human Demonstrations: Show the Robot How to Act!

The pipeline is very similar to older experiments using Gemini & pi0 with LeRobot. Pi-zero runs locally, while Gemini Flash generates the affordances and the high-level task. (More details are in the thread.)

The new component is learning from demonstrations via Gemini 2.5 Pro. I capture a video while demoing & take one of the last frames. Gemini 2.5 Pro then extracts the instructions & passes them to Gemini Flash to process the scene.

The fun part is that there's no fancy insight that came from me, other than the days spent figuring out the right prompts. It's the bitter lesson hitting you in the face -> enhanced Gemini capabilities make this possible. For example, Gemini Flash cannot do Russian doll stacking, but Gemini 2.5 Pro can do it consistently.

The current limitation is low-level manipulation:
- As you can see, I'm aligning the objects so they are easy to grasp using the same technique from the training data. I couldn't get Gemini Flash to consistently output an accurate grasping angle, and Gemini 2.5 Pro was too expensive and slow for real-time deployment.
- Getting a symmetrical gripper should also help a lot. Adding rubber to the tips would probably also help prevent objects from slipping.

Collecting & curating the data was the most time-consuming & labor-intensive part.

Next, to improve low-level manipulation and make the system more real-time, I'm shifting to focus more on sims & synthetic data. This aligns better with my core competence. I'm open to tips and suggestions.
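The demonstration-to-plan flow described in the post can be sketched roughly as follows. This is a minimal illustration of the data flow only, not the author's code: `pick_late_frame`, `plan_from_demo`, `Plan`, and the two callables are hypothetical names, and the model calls (Gemini 2.5 Pro for instruction extraction, Gemini Flash for scene processing) are passed in as plain functions rather than real API calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = bytes  # assumption: a frame is an encoded image, e.g. JPEG bytes


@dataclass
class Plan:
    instruction: str        # high-level task extracted from the demonstration
    affordances: List[str]  # scene-level targets handed to the low-level policy


def pick_late_frame(frames: Sequence[Frame], offset_from_end: int = 1) -> Frame:
    """Return one of the last frames of the demonstration video."""
    if not frames:
        raise ValueError("demonstration video has no frames")
    if offset_from_end < 1:
        raise ValueError("offset_from_end must be at least 1")
    # Clamp to the first frame if the offset is longer than the video.
    index = max(0, len(frames) - offset_from_end)
    return frames[index]


def plan_from_demo(
    demo_frames: Sequence[Frame],
    scene_frame: Frame,
    extract_instruction: Callable[[Frame], str],             # stand-in for the larger model
    propose_affordances: Callable[[Frame, str], List[str]],  # stand-in for the faster model
    offset_from_end: int = 1,
) -> Plan:
    """Turn a human demonstration into a plan for the current scene."""
    demo_frame = pick_late_frame(demo_frames, offset_from_end)
    instruction = extract_instruction(demo_frame)
    affordances = propose_affordances(scene_frame, instruction)
    return Plan(instruction=instruction, affordances=affordances)
```

Keeping the two model calls as injected functions means the slower, more capable model runs once per demonstration, while the faster model can be called again on each new scene frame. The resulting `Plan` would then be handed to the locally running low-level policy.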

Shreyas Gite

22,480 views • 1 year ago