
Introducing ConceptAttention, an approach to interpreting diffusion transformer models! Write a prompt, choose some concepts, generate an image, and get high-quality heatmaps of text concepts. Our method outperforms existing methods like cross attention. Link to demo 👇

36,631 views • 1 year ago • via X (Twitter)

11 Comments

Alec Helbling · 1 year ago

We have a live interactive demo hosted on Hugging Face Spaces:

Alec Helbling · 1 year ago

Check out the code here:

Alec Helbling · 1 year ago

We repurpose the parameters of multimodal DiT models (e.g., Flux), without any training, to create rich contextualized embeddings of text concepts. This allows us to create high-quality saliency maps. We wrote a paper about our method:
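A minimal sketch of the last step described above, assuming (this is our reading of the method, not the authors' code) that each concept's heatmap comes from a dot product between every image-patch output vector and every concept output vector, normalized with a softmax over concepts. Extracting those vectors from Flux's attention layers is not shown; `concept_saliency` and the toy vectors are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def concept_saliency(image_tokens, concept_tokens, height, width):
    """Hypothetical helper: return per-concept heatmaps indexed [concept][row][col].

    image_tokens: height * width patch vectors (row-major), each of length d.
    concept_tokens: C concept vectors, each of length d.
    """
    if len(image_tokens) != height * width:
        raise ValueError("expected exactly one image token per patch")
    maps = [[[0.0] * width for _ in range(height)] for _ in concept_tokens]
    for idx, patch in enumerate(image_tokens):
        # Score this patch against every concept with a dot product.
        scores = [sum(p * c for p, c in zip(patch, concept)) for concept in concept_tokens]
        # Softmax over concepts: each patch distributes a total mass of 1.
        probs = softmax(scores)
        row, col = divmod(idx, width)
        for k, prob in enumerate(probs):
            maps[k][row][col] = prob
    return maps
```

With a 2×2 grid, concepts `[[1.0, 0.0], [0.0, 1.0]]` (say "dog" and "background") and patches `[[3.0, 0.0], [0.0, 3.0], [3.0, 0.0], [0.0, 3.0]]`, the first map is roughly `[[0.95, 0.05], [0.95, 0.05]]`. The softmax over concepts, rather than over patches, is what makes the concepts compete for each patch.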

Rishi · 1 year ago

Very nice idea! Explainable AI as a field is not that well explored, so it's nice to see good work in that domain.

Rishi · 1 year ago

Will this work for non-diffusion-based models?

Minh Nhat Nguyen · 1 year ago

I am... going to see how well this works for video

007 · 1 year ago

Cool

Julien Blanchon 🇺🇦 · 1 year ago

Curious what your intuition is about the entanglement between dog and cat features?

Alec Helbling · 1 year ago

Absolutely. Our observation has been that this approach works very well for discerning distinct features (like dog and background) but struggles with examples like the one you show, where two concepts are very similar. There is clearly some more complex mechanism that allows the model to differentiate between these concepts, which unfortunately our approach alone cannot discern. It is worth noting that this limitation also affects cross-attention mechanisms, and poor object-attribute assignment is a known limitation of current diffusion models.
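A toy illustration of the limitation described above, assuming saliency is scored as a dot product between a patch vector and each concept vector followed by a softmax over concepts (our reading, not the authors' code). When two concept vectors are nearly parallel, a patch scores almost identically against both, so the softmax splits its mass between them and neither heatmap is sharp. All vectors here are invented for illustration; whether real Flux concept embeddings separate "dog" from "cat" is not something this sketch tests.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Made-up vectors: "dog" and "cat" are nearly identical, "background" is distinct.
concepts = {
    "dog": [1.0, 0.0],
    "cat": [0.95, 0.05],
    "background": [0.0, 1.0],
}
patch = [2.0, 1.0]  # hypothetical output vector for a patch lying on the animal

names = list(concepts)
probs = dict(zip(names, softmax([dot(patch, concepts[n]) for n in names])))
# The patch is clearly "not background", but its mass is split almost
# evenly between "dog" and "cat", so the two heatmaps look alike.
```

Here `probs` comes out to roughly 0.43 for "dog", 0.41 for "cat", and 0.16 for "background": the animal is easy to separate from the background, but the two similar concepts are nearly tied.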

Latent Spacer · 1 year ago

I learned a lot from the paper, great work 👏
