Gen2Act: Casting language-conditioned manipulation as *human video generation* followed by *closed-loop policy execution conditioned on the generated video* enables solving diverse real-world tasks unseen in the robot dataset! 1/n

71,127 views • 1 year ago • via X (Twitter)

10 Comments

Homanga Bharadhwaj • 1 year ago

We opt for generating human videos because we find that the current best video models (e.g. VideoPoet) are already good at generating human videos *zero-shot* given an image of a scene and a language description of a task. This doesn't require any fine-tuning/adaptation! 2/n

Homanga Bharadhwaj • 1 year ago

The video model generalizes well to new scenarios by virtue of web-scale training. The policy also generalizes to tasks beyond those in the robot data, as it is tasked with the much simpler job of translating the generated video into actions by following motion cues from the video. 3/n

Homanga Bharadhwaj • 1 year ago

We can also chain Gen2Act for long-horizon activities with multiple tasks by sequentially rolling out video generation and policy execution conditioned on the generated video. 4/n
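To make the chaining idea concrete, here is a minimal toy sketch of the loop described in the thread: generate a video for the current scene and instruction, run a closed-loop policy conditioned on that video, then repeat from the resulting scene for the next instruction. Everything here is an assumption for illustration: the names `run_task`, `run_chain`, `toy_generate_video`, `toy_policy`, `toy_step`, and `TARGETS` are hypothetical, the "scene" is a single number, and the "video" is a list of numbers. It is not the authors' implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins for illustration only, not the Gen2Act codebase.
State = float          # toy "scene observation"
Video = List[State]    # toy "generated human video": a list of frames


def run_task(obs: State,
             instruction: str,
             generate_video: Callable[[State, str], Video],
             policy: Callable[[State, Video], float],
             step_env: Callable[[State, float], State],
             max_steps: int = 10) -> State:
    """One task: generate a video once, then act in closed loop on it."""
    video = generate_video(obs, instruction)
    for _ in range(max_steps):
        action = policy(obs, video)    # conditioned on current obs + video
        obs = step_env(obs, action)    # new observation feeds the next step
    return obs


def run_chain(obs: State,
              instructions: Sequence[str],
              generate_video: Callable[[State, str], Video],
              policy: Callable[[State, Video], float],
              step_env: Callable[[State, float], State]) -> State:
    """Long-horizon activity: each task starts from the previous end scene."""
    for instruction in instructions:
        obs = run_task(obs, instruction, generate_video, policy, step_env)
    return obs


# Toy components (assumed): each instruction maps to a target position.
TARGETS = {"open drawer": 3.0, "place cup": 1.0}


def toy_generate_video(obs: State, instruction: str) -> Video:
    # Five "frames" interpolating from the current scene to the target.
    target = TARGETS[instruction]
    return [obs + (target - obs) * i / 4 for i in range(5)]


def toy_policy(obs: State, video: Video) -> float:
    # Follow the video's motion cue: move toward its last frame, clipped.
    return max(-1.0, min(1.0, video[-1] - obs))


def toy_step(obs: State, action: float) -> State:
    return obs + action


final = run_chain(0.0, ["open drawer", "place cup"],
                  toy_generate_video, toy_policy, toy_step)
print(final)  # 1.0
```

The point of the sketch is only the control flow: the video is generated once per task from the latest scene, while the policy is queried at every step.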

Homanga Bharadhwaj • 1 year ago

Following prior works, we categorize results with respect to different levels of generalization. Gen2Act achieves non-trivial success rates (30-60%) for even the challenging categories of motion-type and object-type generalization 5/n

Homanga Bharadhwaj • 1 year ago

This was a fun project w/ @debidatta @gupta_abhinav_ @shubhtuls @CarlDoersch @shahdhruv_ @xiao_ted @SeanKirmani @xf1280 @DorsaSadigh @GoogleDeepMind @CMU_Robotics @StanfordAILab More details: Video: n/n

Samarth Sinha • 1 year ago

Congrats Homanga!!

Jason Ma • 1 year ago

Excited to see this out, congrats Homanga!

Paweł Budzianowski • 1 year ago

Great to see the first video-based model employed! This opens up a completely new category of possibilities!

Jay Vakil • 1 year ago

Amazing work @mangahomanga

Rui Chen • 1 year ago

Great work! Human video is a useful and unlimited source for manipulation.
