Can we synthesize 3D human-scene interactions without learning from any 3D data? Yes! Check out Lei Li's GenZI, a novel zero-shot approach to generating 3D interactions by distilling priors from large vision-language models.
106,850 views • 2 years ago • via X (Twitter)
10 comments

Michael Black • 2 years ago
@craigleili Very creative! Love it.

Dan Casas • 2 years ago
@craigleili Great idea and super well presented. Love it!

ScottieFox • 2 years ago
@craigleili There must exist a vector for the opposite as well. Since the paper clearly shows an inpainting mask of human 2D interactions, then one could assume a "place this actor in a scene" - via the same text encoding.

Hongwei Yi • 2 years ago
@craigleili The idea and the results are super nice!!! Can't wait to use.

Thiemo Alldieck • 2 years ago
@craigleili creative idea!

Chenfanfu Jiang • 2 years ago
@craigleili Inspiring

Dávid Komorowicz • 2 years ago
@craigleili Oh no, don't sit on the Guzheng😰

Chris Han • 2 years ago
@craigleili @memdotai mem it

Leo • 2 years ago
@craigleili so cool

Naureen Mahmood • 2 years ago
@craigleili I really like the method presented here, not to mention the lovely video! Very nice work.
