Can we synthesize 3D human-scene interactions without learning from any 3D data? Yes! Check out Lei Li's GenZI, a novel zero-shot approach to generating 3D interactions by distilling priors from large vision-language models.
106,850 views • 2 years ago • via X (Twitter)
10 Comments

Michael Black • 2 years ago
@craigleili Very creative! Love it.

Dan Casas • 2 years ago
@craigleili Great idea and super well presented. Love it!

ScottieFox • 2 years ago
@craigleili There must exist a vector for the opposite as well. Since the paper clearly shows an inpainting mask of 2D human interactions, one could assume a "place this actor in a scene" via the same text encoding.

Hongwei Yi • 2 years ago
@craigleili The idea and the results are super nice!!! Can't wait to use.

Thiemo Alldieck • 2 years ago
@craigleili creative idea!

Chenfanfu Jiang • 2 years ago
@craigleili Inspiring

Dávid Komorowicz • 2 years ago
@craigleili Oh no, don't sit on the Guzheng😰

Chris Han • 2 years ago
@craigleili @memdotai mem it

Leo • 2 years ago
@craigleili so cool

Naureen Mahmood • 2 years ago
@craigleili I really like the method presented here, not to mention the lovely video! Very nice work.
