Gen2Act: Casting language-conditioned manipulation as *human video generation* followed by *closed-loop policy execution conditioned on the generated video* enables solving diverse real-world tasks unseen in the robot dataset! 1/n

71,132 views • 1 year ago • via X (Twitter)

10 Comments

Homanga Bharadhwaj • 1 year ago

We opt for generating human videos because we find that the current best video models (e.g. VideoPoet) are already good at generating human videos *zero-shot* given an image of a scene and a language description of a task. This doesn't require any fine-tuning/adaptation! 2/n

Homanga Bharadhwaj • 1 year ago

The video model generalizes well to new scenarios by virtue of web-scale training. The policy also generalizes to tasks beyond those in the robot data, as it is tasked with the much simpler job of translating the generated video to actions by following motion cues from the video. 3/n

Homanga Bharadhwaj • 1 year ago

We can also chain Gen2Act for long-horizon activities with multiple tasks by sequentially rolling out video generation and policy execution conditioned on the generated video. 4/n
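The chaining described above can be sketched roughly as follows. This is only an illustrative outline, not the Gen2Act implementation: `run_chain`, `toy_generate_video`, `toy_policy`, and `toy_step` are hypothetical names, the scene is a plain string standing in for a camera image, and the assumption that each new video is generated from the scene left by the previous task is ours.

```python
from typing import Callable, List, Sequence, Tuple

Scene = str          # stand-in for a camera image (assumption)
Video = List[str]    # stand-in for generated human-video frames (assumption)


def run_chain(
    scene: Scene,
    tasks: Sequence[str],
    generate_video: Callable[[Scene, str], Video],
    policy: Callable[[Scene, Video], str],
    step_env: Callable[[Scene, str], Scene],
    max_steps: int = 2,
) -> Tuple[Scene, List[Tuple[str, str]]]:
    """Alternate video generation and closed-loop execution, one task at a time."""
    log: List[Tuple[str, str]] = []
    for task in tasks:
        # Generate a human video for this task from the *current* scene.
        video = generate_video(scene, task)
        for _ in range(max_steps):
            # Closed loop: the policy sees the latest scene at every step.
            action = policy(scene, video)
            scene = step_env(scene, action)
            log.append((task, action))
    return scene, log


# Toy stand-ins so the sketch runs end to end (all hypothetical).
def toy_generate_video(scene: Scene, task: str) -> Video:
    return [f"{task}:frame{i}" for i in range(2)]


def toy_policy(scene: Scene, video: Video) -> str:
    task = video[0].split(":")[0]
    return f"act({task})"


def toy_step(scene: Scene, action: str) -> Scene:
    return scene + "|" + action


final_scene, log = run_chain(
    "s0", ["open drawer", "place cup"], toy_generate_video, toy_policy, toy_step
)
```

In a real system the stand-ins would be replaced by a video generation model, a learned policy, and the robot environment; the loop structure is the only part this sketch is meant to convey.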

Homanga Bharadhwaj • 1 year ago

Following prior works, we categorize results with respect to different levels of generalization. Gen2Act achieves non-trivial success rates (30-60%) for even the challenging categories of motion-type and object-type generalization 5/n

Homanga Bharadhwaj • 1 year ago

This was a fun project w/ @debidatta @gupta_abhinav_ @shubhtuls @CarlDoersch @shahdhruv_ @xiao_ted @SeanKirmani @xf1280 @DorsaSadigh @GoogleDeepMind @CMU_Robotics @StanfordAILab More details: Video: n/n

Samarth Sinha • 1 year ago

Congrats Homanga!!

Jason Ma • 1 year ago

Excited to see this out, congrats Homanga!

Paweł Budzianowski • 1 year ago

Great to see the first video-based model employed! This opens up a completely new category of possibilities!

Jay Vakil • 1 year ago

Amazing work @mangahomanga

Rui Chen • 1 year ago

Great work! Human video is a useful and unlimited source for manipulation.
