Chain-of-thought reasoning is a powerful tool to enable language models to work through complex problems. Can we use this with robots? With embodied chain-of-thought, vision-language-action (VLA) models can think through perception and planning! A 🧵👇

30,388 views • 1 year ago • via X (Twitter)

9 comments

Sergey Levine • 1 year ago

Embodied chain of thought allows the VLA (in this case, a finetuned OpenVLA model) to work through a complex task by reasoning over subtasks, detecting objects, and making step-by-step plans. When generating an action, the VLA works through these steps automatically.

Sergey Levine • 1 year ago

How do we train OpenVLA for embodied chain of thought? We distill a variety of other foundation models, such as Gemini and Grounding DINO, into synthetic examples that can teach the VLA to perform embodied chain of thought.
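To make the idea of a synthetic example concrete, here is a minimal sketch of how one training target might be assembled, with the reasoning serialized before the action. Everything in it is an assumption for illustration: the `ReasoningAnnotation` container, the `build_target` helper, and the `TASK`/`PLAN`/`SUBTASK`/`OBJECTS`/`ACTION` tags are hypothetical and not the format used in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ReasoningAnnotation:
    """Hypothetical container for one timestep's synthetic annotations."""
    task: str
    plan: list[str]
    subtask: str
    # Object name -> (x_min, y_min, x_max, y_max) in pixels,
    # e.g. as produced by an open-vocabulary detector.
    objects: dict[str, tuple[int, int, int, int]] = field(default_factory=dict)


def build_target(ann: ReasoningAnnotation, action_tokens: list[int]) -> str:
    """Serialize the reasoning first and the action last.

    Putting the action at the end means an autoregressive model has to
    generate the plan, subtask, and object locations before it acts.
    """
    plan = " ".join(f"{i + 1}. {step}" for i, step in enumerate(ann.plan))
    objects = "; ".join(f"{name} {list(box)}" for name, box in ann.objects.items())
    action = " ".join(str(t) for t in action_tokens)
    return (
        f"TASK: {ann.task}\n"
        f"PLAN: {plan}\n"
        f"SUBTASK: {ann.subtask}\n"
        f"OBJECTS: {objects}\n"
        f"ACTION: {action}"
    )
```

In this sketch the plan and subtask text would come from a language model such as Gemini and the boxes from a detector such as Grounding DINO, while the action tokens come from the robot demonstration itself.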

Sergey Levine • 1 year ago

The resulting model can solve complex tasks that require multi-stage inference. It generalizes more effectively to novel objects, performs longer tasks, and understands more sophisticated instructions.

Sergey Levine • 1 year ago

The resulting VLA can even interpret human corrections and interventions, incorporating them into the embodied chain-of-thought process!

Sergey Levine • 1 year ago

While our main experiments use the Bridge v2 setup, we also tested on a variety of other embodiments from OXE.

Sergey Levine • 1 year ago

This was a really fun collaboration with @MiZawalski, @verityw_, @KarlPertsch, @oier_mees, @chelseabfinn Website: Paper:

Sergey Levine • 1 year ago

For more, check out these posts by Michal and Will:

Alex 📚 PromptLeo • 1 year ago

Thanks for sharing 👍 How do you create videos like these?

Joanne Mercado • 1 year ago

You’re making a lot of progress in robotics
