🎤🎤 Excited to introduce COME-robot🤖🤖, Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V. It is the first closed-loop framework utilizing the vision-language foundation model for open-ended reasoning and adaptive planning in real-world scenarios. COME-robot demonstrates a significant improvement in task success rate (~25%) compared to SOTA methods. Project: Arxiv:

22,291 views • 2 years ago • via X (Twitter)

6 comments

Siyuan Huang • 2 years ago

(1/4) Given a task instruction, COME-robot employs GPT-4V for reasoning and generates a code-based plan. Through feedback obtained from the robot's execution and interaction with the environment, it iteratively updates the subsequent plan or recovers from failures, ultimately accomplishing the given task.
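
The loop described in (1/4) can be sketched roughly as follows. This is only an illustrative toy, not COME-robot's actual code: `run_closed_loop`, `toy_planner`, `make_flaky_executor`, and `Feedback` are hypothetical names, the planner is a hard-coded stand-in for the GPT-4V call, and the executor simulates a single grasp failure instead of driving a real robot.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Feedback:
    """Result of executing one step (hypothetical structure)."""
    success: bool
    message: str


History = List[Tuple[str, Feedback]]


def run_closed_loop(
    task: str,
    plan_fn: Callable[[str, History], Optional[str]],
    execute_fn: Callable[[str], Feedback],
    max_rounds: int = 10,
) -> Tuple[bool, History]:
    """Plan one step, execute it, feed the result back, and repeat."""
    history: History = []
    for _ in range(max_rounds):
        step = plan_fn(task, history)  # stand-in for the VLM planning call
        if step is None:               # planner reports the task is finished
            return True, history
        feedback = execute_fn(step)    # stand-in for robot execution
        history.append((step, feedback))
    return False, history              # gave up after max_rounds


def toy_planner(task: str, history: History) -> Optional[str]:
    """Assumed planner: returns the first step that has not yet succeeded."""
    steps = ["navigate to table", "grasp cup", "place cup in sink"]
    done = {step for step, fb in history if fb.success}
    for step in steps:
        if step not in done:
            return step  # a failed step is simply proposed again
    return None


def make_flaky_executor() -> Callable[[str], Feedback]:
    """Assumed executor: the first grasp attempt fails, everything else works."""
    attempts = {}

    def execute(step: str) -> Feedback:
        attempts[step] = attempts.get(step, 0) + 1
        if step == "grasp cup" and attempts[step] == 1:
            return Feedback(False, "grasp failed: gripper is empty")
        return Feedback(True, "ok")

    return execute


if __name__ == "__main__":
    ok, history = run_closed_loop(
        "put the cup in the sink", toy_planner, make_flaky_executor()
    )
    for step, fb in history:
        print(f"{step}: {fb.message}")
    print("task completed:", ok)
```

In this sketch the failed grasp shows up in `history`, so the next planning call sees it and retries; a real system would presumably pass images and richer feedback to the model rather than a success flag.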

Siyuan Huang • 2 years ago

(2/4) The unique properties of COME-robot: Active Perception, Situated Commonsense Reasoning, and Recovery from Failure.

Siyuan Huang • 2 years ago

(3/4) Some trials of mobile and tabletop manipulation, including ones that recover from failures. The objects on the table are randomly rearranged after each trial.

Siyuan Huang • 2 years ago

(4/4) The VLM can provide helpful feedback on visual feedback errors, grasp failures, wrong detections, etc. The following are some examples.

jack • 2 years ago

Cool, did you build the robot yourself or buy it?

Markus Heimerl • 2 years ago

I think this is the direction we are heading. Foundation models are incredible generalizers, and it does not make sense to train a robotic perception model or develop a planning algorithm yourself if a visual foundation model is one API call away. Nice work!
