Diffusion Transformers (DiTs) aren't just generative models; they are also powerful multimodal encoders. ConceptAttention uses DiT representations to create rich heatmaps that localize text concepts in images. It even works on real images and can be applied to tasks like segmentation! Demo 👇
24,380 views • 1 year ago • via X (Twitter)
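To illustrate how concept heatmaps can turn into a segmentation map, here is a minimal sketch. It assumes per-patch image features and per-concept text features have already been extracted from a DiT and live in the same embedding space. It scores each patch against each concept by dot product, normalizes with a softmax across concepts, and labels each patch with its highest-scoring concept. The function names `concept_heatmaps` and `segment` and the toy vectors are hypothetical, and this generic similarity scheme is not the ConceptAttention implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def concept_heatmaps(patch_feats, concept_feats):
    """Return heat[c][p]: softmax-over-concepts score of concept c at patch p.

    Hypothetical helper: assumes patch and concept features share one embedding space.
    """
    num_concepts = len(concept_feats)
    heat = [[0.0] * len(patch_feats) for _ in range(num_concepts)]
    for p, patch in enumerate(patch_feats):
        # Dot-product similarity between this patch and every concept.
        scores = [sum(a * b for a, b in zip(patch, concept)) for concept in concept_feats]
        # Normalize across concepts so each patch's scores sum to 1.
        for c, prob in enumerate(softmax(scores)):
            heat[c][p] = prob
    return heat


def segment(heatmaps):
    """Assign each patch the index of its highest-scoring concept."""
    num_patches = len(heatmaps[0])
    return [
        max(range(len(heatmaps)), key=lambda c: heatmaps[c][p])
        for p in range(num_patches)
    ]


# Toy example (hypothetical data): 3 patches, 2 concepts (say "cat" and "sky").
patches = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
concepts = [[1.0, 0.0], [0.0, 1.0]]
heat = concept_heatmaps(patches, concepts)
labels = segment(heat)
print(labels)  # [0, 1, 0]
```

On a real image, the flat patch list would be reshaped back to the patch grid to display each concept's heatmap.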
