Diffusion Transformers aren't just generative models; they are also powerful multi-modal encoders. ConceptAttention creates rich heatmaps of text concepts in images from DiT representations. This even works on real images, and can be applied to tasks like segmentation! Demo 👇
24,380 views • 1 year ago • via X (Twitter)
