We now have full end-to-end manipulation autonomy for more complicated tasks. Here's a demonstration of Apollo leveraging AI to make a fresh cup of juice in an end-to-end manner for #GTC2024. A thread 🧵
18,053 views • 2 years ago • via X (Twitter)
6 Comments

Breaking down what you see in the video. Behaviors are learned from a teleoperated human demonstrator. We process the camera, hand pose, and grasp state to train a library of closed-loop dexterous manipulation behaviors. A task executor then decides which behavior to run.
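
To make the "library of behaviors plus task executor" idea concrete, here is a minimal Python sketch. It is an illustration only, not the system shown in the video: the `Observation` fields mirror the inputs named above, while `TaskExecutor`, the fixed `plan` list, and the `behavior_done` flag are hypothetical names and simplifications introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Observation:
    """Hypothetical per-step input: the thread lists camera, hand pose, grasp state."""
    camera: List[float]      # stand-in for image features
    hand_pose: List[float]   # e.g. position of the hand
    grasp_closed: bool


# A behavior is any closed-loop policy mapping an observation to a command.
Behavior = Callable[[Observation], List[float]]


class TaskExecutor:
    """Runs a fixed sequence of named behaviors (an assumed, simplistic executor)."""

    def __init__(self, library: Dict[str, Behavior], plan: List[str]) -> None:
        missing = [name for name in plan if name not in library]
        if missing:
            raise KeyError(f"behaviors not in library: {missing}")
        self.library = library
        self.plan = list(plan)
        self.index = 0

    def current(self) -> Optional[str]:
        """Name of the active behavior, or None once the plan is finished."""
        return self.plan[self.index] if self.index < len(self.plan) else None

    def step(self, obs: Observation, behavior_done: bool) -> Optional[List[float]]:
        """Move on when the active behavior reports done, then query the active policy."""
        if behavior_done and self.index < len(self.plan):
            self.index += 1
        name = self.current()
        return None if name is None else self.library[name](obs)
```

In a real system each library entry would be a trained network and the executor would likely decide from perception rather than a fixed list; the sketch only shows the separation between choosing a behavior and running it.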

All behaviors are driven by NN visuomotor policies. Lower-level motion planners and controllers are then used to track the policy, handle transitions, and maintain posture. This strong behavioral baseline is a foundational need for more general purpose logic and task execution.
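
One simple way to picture a lower-level layer "tracking the policy" is a rate-limited follower: the policy proposes a target, and the controller moves toward it by a bounded amount each tick. The Python sketch below assumes that scheme purely for illustration; `track_target` and `max_step` are hypothetical names, and real planners and controllers handle far more (dynamics, collisions, posture).

```python
from typing import List


def track_target(current: List[float], target: List[float], max_step: float) -> List[float]:
    """Move each coordinate toward the policy's target by at most max_step per tick."""
    if len(current) != len(target):
        raise ValueError("current and target must have the same length")
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    next_pose = []
    for c, t in zip(current, target):
        delta = max(-max_step, min(max_step, t - c))  # clamp the per-tick motion
        next_pose.append(c + delta)
    return next_pose
```

Calling `track_target` repeatedly with the latest policy output keeps the commanded motion smooth even if the policy's target jumps between steps.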

Astute viewers may notice that we don't need to return to a "home" position with the arms to start a new action. Returning home is something you may notice on other robots. Instead, our lower-level planning/control handles blending between behaviors to minimize human demonstrations per task.
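
As a rough illustration of blending between behaviors without a return to "home", the Python sketch below crossfades linearly from the outgoing behavior's last command to the incoming behavior's first command. Linear interpolation is an assumption chosen for simplicity, not the method described in the thread, and `blend_commands` and `transition` are hypothetical helper names.

```python
from typing import List


def blend_commands(outgoing: List[float], incoming: List[float], alpha: float) -> List[float]:
    """Linear crossfade: alpha=0 gives the outgoing command, alpha=1 the incoming one."""
    if len(outgoing) != len(incoming):
        raise ValueError("commands must have the same length")
    a = min(1.0, max(0.0, alpha))  # keep the blend weight inside [0, 1]
    return [(1.0 - a) * o + a * i for o, i in zip(outgoing, incoming)]


def transition(outgoing: List[float], incoming: List[float], steps: int) -> List[List[float]]:
    """Commands for a crossfade lasting `steps` ticks, ending exactly on `incoming`."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [blend_commands(outgoing, incoming, k / steps) for k in range(1, steps + 1)]
```

Because the new behavior starts from wherever the previous one ended, demonstrations do not each need to begin and end at a shared home pose.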

As we already saw with @Figure_robot, you can do some incredible tasks when you have language models to interpret action execution. While we don't demo voice/text interactions, we are closer to it than some may give us credit for, thanks to the recent progress in multimodal LLMs.

To that end, we look forward to seeing how partnering with @nvidia for Project GR00T will help us continue our goal of creating general purpose AI.

Maybe try teaching robots how to use chopsticks next time? Many Asian parents struggle to teach that to their young children. Haha
