Can we synthesize 3D human-scene interactions without learning from any 3D data? Yes! Check out Lei Li's GenZI, a novel zero-shot approach to generating 3D interactions by distilling priors from large vision-language models.
106,850 views • 2 years ago • via X (Twitter)
10 Comments

Michael Black • 2 years ago
@craigleili Very creative! Love it.

Dan Casas • 2 years ago
@craigleili Great idea and super well presented. Love it!

ScottieFox • 2 years ago
@craigleili There must exist a vector for the opposite as well. Since the paper clearly shows an inpainting mask of human 2D interactions, then one could assume a "place this actor in a scene" - via the same text encoding.

Hongwei Yi • 2 years ago
@craigleili The idea and the results are super nice!!! Can't wait to use.

Thiemo Alldieck • 2 years ago
@craigleili creative idea!

Chenfanfu Jiang • 2 years ago
@craigleili Inspiring

Dávid Komorowicz • 2 years ago
@craigleili Oh no, don't sit on the Guzheng😰

Chris Han • 2 years ago
@craigleili @memdotai mem it

Leo • 2 years ago
@craigleili so cool

Naureen Mahmood • 2 years ago
@craigleili I really like the method presented here, not to mention the lovely video! Very nice work.
