Gen2Act: Casting language-conditioned manipulation as *human video generation* followed by *closed-loop policy execution conditioned on the generated video* enables solving diverse real-world tasks unseen in the robot dataset! 1/n

71,132 views • 1 year ago • via X (Twitter)

Comments: 10

Homanga Bharadhwaj • 1 year ago

We opt for generating human videos because we find that the current best video models (e.g. VideoPoet) are already good at generating human videos *zero-shot* given an image of a scene and a language description of a task. This doesn't require any fine-tuning/adaptation! 2/n

Homanga Bharadhwaj • 1 year ago

The video model generalizes well to new scenarios by virtue of web-scale training. The policy also generalizes to tasks beyond those in the robot data, as it is tasked with the much simpler job of translating the generated video to actions by following motion cues from the video. 3/n

Homanga Bharadhwaj • 1 year ago

We can also chain Gen2Act for long-horizon activities with multiple tasks by sequentially rolling out video generation and policy execution conditioned on the generated video. 4/n
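The chaining described above can be sketched roughly as follows. This is a toy illustration only, not the authors' implementation: `ToyEnv`, `generate_human_video`, `policy`, and `run_chain` are all hypothetical stand-ins (strings replace images, videos, and actions), and the fixed `horizon` is an assumption made to keep the loop simple.

```python
from dataclasses import dataclass, field
from typing import List, Sequence

Image = str        # stand-in for a camera frame
Video = List[str]  # stand-in for a list of generated frames


@dataclass
class ToyEnv:
    """Hypothetical robot environment: counts steps and logs actions."""
    steps: int = 0
    log: List[str] = field(default_factory=list)

    def observe(self) -> Image:
        return f"scene@{self.steps}"

    def step(self, action: str) -> None:
        self.steps += 1
        self.log.append(action)


def generate_human_video(image: Image, instruction: str, n_frames: int = 3) -> Video:
    """Placeholder for a zero-shot video model (scene image + text -> video)."""
    return [f"{instruction}|{image}|frame{i}" for i in range(n_frames)]


def policy(video: Video, observation: Image, t: int) -> str:
    """Placeholder for a policy conditioned on the generated video."""
    frame = video[min(t, len(video) - 1)]
    task = frame.split("|")[0]
    return f"act({task}, {observation})"


def run_chain(env: ToyEnv, instructions: Sequence[str], horizon: int = 3) -> List[Video]:
    """Alternate video generation and policy execution, one task at a time."""
    videos = []
    for instruction in instructions:
        # Generate from the current scene, which reflects the previous task's outcome.
        video = generate_human_video(env.observe(), instruction)
        videos.append(video)
        for t in range(horizon):
            # Closed loop: re-observe the scene before every action.
            env.step(policy(video, env.observe(), t))
    return videos


env = ToyEnv()
videos = run_chain(env, ["open drawer", "place cup in drawer"], horizon=2)
```

The point of the sketch is the ordering: each new video is generated only after the previous task has finished executing, so it is conditioned on the scene as the robot actually left it.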

Homanga Bharadhwaj • 1 year ago

Following prior works, we categorize results with respect to different levels of generalization. Gen2Act achieves non-trivial success rates (30-60%) for even the challenging categories of motion-type and object-type generalization 5/n

Homanga Bharadhwaj • 1 year ago

This was a fun project w/ @debidatta @gupta_abhinav_ @shubhtuls @CarlDoersch @shahdhruv_ @xiao_ted @SeanKirmani @xf1280 @DorsaSadigh @GoogleDeepMind @CMU_Robotics @StanfordAILab More details: Video: n/n

Samarth Sinha • 1 year ago

Congrats Homanga!!

Jason Ma • 1 year ago

Excited to see this out, congrats Homanga!

Paweł Budzianowski • 1 year ago

Great to see the first video-based model employed! This opens up a completely new category of possibilities!

Jay Vakil • 1 year ago

Amazing work @mangahomanga

Rui Chen • 1 year ago

Great work! Human video is a useful and unlimited source for manipulation.
