Can we synthesize 3D human-scene interactions without learning from any 3D data? Yes! Check out Lei Li's GenZI, a novel zero-shot approach to generating 3D interactions by distilling priors from large vision-language models.
106,850 views • 2 years ago • via X (Twitter)
10 comments

Michael Black • 2 years ago
@craigleili Very creative! Love it.

Dan Casas • 2 years ago
@craigleili Great idea and super well presented. Love it!

ScottieFox • 2 years ago
@craigleili There must exist a vector for the opposite as well. Since the paper clearly shows an inpainting mask of human 2D interactions, then one could assume a "place this actor in a scene" - via the same text encoding.

Hongwei Yi • 2 years ago
@craigleili The idea and the results are super nice!!! Can't wait to use.

Thiemo Alldieck • 2 years ago
@craigleili creative idea!

Chenfanfu Jiang • 2 years ago
@craigleili Inspiring

Dávid Komorowicz • 2 years ago
@craigleili Oh no, don't sit on the Guzheng😰

Chris Han • 2 years ago
@craigleili @memdotai mem it

Leo • 2 years ago
@craigleili so cool

Naureen Mahmood • 2 years ago
@craigleili I really like the method presented here, not to mention the lovely video! Very nice work.
