
Gen2Act: Casting language-conditioned manipulation as *human video generation* followed by *closed-loop policy execution conditioned on the generated video* enables solving diverse real-world tasks unseen in the robot dataset! 1/n

71,132 views • 1 year ago • via X (Twitter)

10 comments

Homanga Bharadhwaj • 1 year ago

We opt for generating human videos because we find that the current best video models (e.g. VideoPoet) are already good at generating human videos *zero-shot*, given an image of a scene and a language description of a task. This doesn't require any fine-tuning or adaptation! 2/n

Homanga Bharadhwaj • 1 year ago

The video model generalizes well to new scenarios by virtue of web-scale training. The policy also generalizes to tasks beyond those in the robot data, because it has a much simpler job: translating the generated video into actions by following motion cues from the video. 3/n

Homanga Bharadhwaj • 1 year ago

We can also chain Gen2Act for long-horizon activities with multiple tasks by sequentially rolling out video generation and policy execution conditioned on the generated video. 4/n
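
The chaining described in 4/n can be pictured as a simple loop. Below is a minimal Python sketch under stated assumptions. `generate_video` and `execute_policy` are hypothetical callables, not the Gen2Act code or any real API. `generate_video` maps the current scene image and a task description to a human video. `execute_policy` stands in for the whole closed-loop robot rollout conditioned on that video, and it returns the final observation. The toy stubs only pass strings around.

```python
from dataclasses import dataclass
from typing import Callable, List

Frame = str  # placeholder for an image observation (assumption for illustration)


@dataclass
class Rollout:
    task: str
    generated_video: List[Frame]
    final_observation: Frame


def chain_gen2act(
    tasks: List[str],
    initial_observation: Frame,
    generate_video: Callable[[Frame, str], List[Frame]],  # hypothetical video model
    execute_policy: Callable[[Frame, List[Frame]], Frame],  # hypothetical policy rollout
) -> List[Rollout]:
    """For each task in order, generate a video, then execute the policy.

    The observation reached after one task becomes the scene image
    used to generate the video for the next task.
    """
    observation = initial_observation
    history: List[Rollout] = []
    for task in tasks:
        video = generate_video(observation, task)
        observation = execute_policy(observation, video)
        history.append(Rollout(task, video, observation))
    return history


# Toy stubs (hypothetical) so the loop can run end to end.
def fake_generate(obs: Frame, task: str) -> List[Frame]:
    return [f"{obs}|{task}|frame{i}" for i in range(3)]


def fake_policy(obs: Frame, video: List[Frame]) -> Frame:
    # Pretend the robot ends up where the last generated frame shows.
    return video[-1]


rollouts = chain_gen2act(
    ["open drawer", "put cup in drawer"], "scene0", fake_generate, fake_policy
)
```

The key design point the sketch illustrates is that each video is generated from the scene as it exists *after* the previous task, rather than from the initial image.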

Homanga Bharadhwaj • 1 year ago

Following prior work, we categorize results by level of generalization. Gen2Act achieves non-trivial success rates (30-60%) even for the challenging categories of motion-type and object-type generalization. 5/n

Homanga Bharadhwaj • 1 year ago

This was a fun project w/ @debidatta @gupta_abhinav_ @shubhtuls @CarlDoersch @shahdhruv_ @xiao_ted @SeanKirmani @xf1280 @DorsaSadigh @GoogleDeepMind @CMU_Robotics @StanfordAILab More details: Video: n/n

Samarth Sinha • 1 year ago

Congrats Homanga!!

Jason Ma • 1 year ago

Excited to see this out, congrats Homanga!

Paweł Budzianowski • 1 year ago

Great to see the first video-based model employed! This opens up a completely new category of possibilities!

Jay Vakil • 1 year ago

Amazing work @mangahomanga

Rui Chen • 1 year ago

Great work! Human video is a useful and unlimited source of data for manipulation.
