Can we synthesize 3D human-scene interactions without learning from any 3D data? Yes! Check out Lei Li's GenZI, a novel zero-shot approach to generating 3D interactions by distilling priors from large vision-language models.
106,850 views • 2 years ago • via X (Twitter)
Comments: 10

Michael Black • 2 years ago
@craigleili Very creative! Love it.

Dan Casas • 2 years ago
@craigleili Great idea and super well presented. Love it!

ScottieFox • 2 years ago
@craigleili There must exist a vector for the opposite as well. Since the paper clearly shows an inpainting mask of human 2D interactions, then one could assume a "place this actor in a scene" - via the same text encoding.

Hongwei Yi • 2 years ago
@craigleili The idea and the results are super nice!!! Can't wait to use.

Thiemo Alldieck • 2 years ago
@craigleili creative idea!

Chenfanfu Jiang • 2 years ago
@craigleili Inspiring

Dávid Komorowicz • 2 years ago
@craigleili Oh no, don't sit on the Guzheng😰

Chris Han • 2 years ago
@craigleili @memdotai mem it

Leo • 2 years ago
@craigleili so cool

Naureen Mahmood • 2 years ago
@craigleili I really like the method presented here, not to mention the lovely video! Very nice work.
