Built a token-wise likelihood visualizer for GPT-2 over the weekend. There are some interesting patterns and behaviors you can easily pick up from a visualization like this, like induction heads and which kinds of words/grammar LMs like to guess.
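For readers who want to see what "token-wise likelihood" means in practice, here is a minimal sketch. It is not the code behind the visualizer. It assumes you already have the model's raw logits at each position (for example from a GPT-2 forward pass) and reads off the log-probability the model gave to the token that actually came next. The function names and the toy numbers are made up for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_logprobs(logits_per_position, token_ids):
    """Log-probability assigned to each token that actually came next.

    logits_per_position[i] holds the logits predicting token_ids[i + 1],
    so the first token in the sequence gets no score.
    """
    scores = []
    for i in range(len(token_ids) - 1):
        log_probs = log_softmax(logits_per_position[i])
        scores.append(log_probs[token_ids[i + 1]])
    return scores

# Toy example: a 3-token vocabulary with made-up logits (not real GPT-2 output).
toy_logits = [[2.0, 0.5, 0.1], [0.1, 0.2, 3.0]]
toy_ids = [0, 0, 2]
scores = token_logprobs(toy_logits, toy_ids)
```

Each entry in `scores` is the number a visualizer like this would turn into a color for the corresponding token.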

223,864 views • 3 years ago • via X (Twitter)

10 Comments

Linus • 3 years ago

A particularly interesting example to run through this is source code, where because of the regular structure the LM does much better (much lower perplexity). Indentations and punctuation are particularly easy wins for GPT-2.
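As a rough illustration of the perplexity comparison, here is a small sketch using the standard definition: the exponential of the negative mean log-probability per token. The two lists of log-probabilities are invented to show the shape of the effect and are not measurements from GPT-2.

```python
import math

def perplexity(token_logprobs):
    """exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Made-up values for illustration only, not real model output.
prose_like = [-4.0, -2.5, -6.0, -3.5]   # harder-to-guess tokens
code_like = [-0.2, -0.1, -1.5, -0.2]    # indentation and punctuation are easy

prose_ppl = perplexity(prose_like)
code_ppl = perplexity(code_like)
```

Lower perplexity means the model was, on average, less surprised by each token.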

Linus • 3 years ago

You can also use this viz to probe GPT-2 for what it thinks about different topics, which is kind of fun. You can imagine extensions of this "fill in the blank" UX becoming useful for writing workflows.

Shawn Simister • 3 years ago

I built something like that as well but for code. Learned a lot about the model. Did you know a GPT-3 token can represent half of a character?
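One way to see how a token can cover only part of a character: GPT-2 and GPT-3 use byte-level BPE, which works on UTF-8 bytes rather than characters. The sketch below does not run a real tokenizer. It assumes a hypothetical split that puts each byte of a two-byte character into its own token, and checks that neither piece is valid text by itself.

```python
text = "é"
raw = text.encode("utf-8")  # two bytes for one character

# Hypothetical split: pretend the tokenizer put each byte in its own token.
pieces = [raw[:1], raw[1:]]

def decodes_alone(piece):
    """True if this byte string is valid UTF-8 without its neighbors."""
    try:
        piece.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

halves_ok = [decodes_alone(p) for p in pieces]   # neither half is valid text
joined = b"".join(pieces).decode("utf-8")        # together they decode again
```

Whether a given character is actually split this way depends on the tokenizer's learned merges.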

Alexander Cai • 3 years ago

Have you heard of circuitsvis? It's a great open source library that also does this. Would also strongly recommend TransformerLens and

Linus • 3 years ago

Yes, TransformerLens and Neel are fantastic <3

Simon Willison • 2 years ago

This is so useful! Any thoughts on what it would take to turn something like this into an interactive web page people could try out for themselves? I wonder if one of the LLMs-compiled-to-WebAssembly could handle this.

Linus • 2 years ago

You could definitely do this with transformers.js and a small model like gpt2-small, since the model needn't be large to have the pedagogical effect. I currently just have a demo that runs GPT2-xl on the server. One of the many things I haven't yet had time to make public 🫠

Chris J. Wallace • 3 years ago

I’d love your thoughts on mapping a color space to probability. I once prototyped something similar and found the huge variance in likelihoods for different words made that a bit tricksy, but this looks really good.

Linus • 3 years ago

My coloring algorithm is roughly:

min, max = mean(log_probs) ± 2.5 * stddev(log_probs)
hue = token_logprob.clamp(min, max).scale(0, 150)
color = f"hsl({hue}deg 60% 85%)"

Key is to scale probs to hue in the log space, and then clamp at µ ± 2.5 stddev.
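The pseudocode above can be turned into runnable Python along these lines. This is one reading of it, not the author's actual code. It assumes population standard deviation (the comment does not say which kind), that the clamp minimum maps to hue 0 and the maximum to hue 150, and that a mid-range hue is a sensible fallback when every token has the same log-probability. The function and parameter names are hypothetical.

```python
import statistics

def token_colors(log_probs, width=2.5, max_hue=150.0):
    """Map per-token log-probabilities to CSS hsl() colors.

    Clamps each value to mean ± width * stddev, then scales linearly
    so the low end maps to hue 0 and the high end to max_hue.
    """
    mean = statistics.fmean(log_probs)
    std = statistics.pstdev(log_probs)  # assumption: population stddev
    lo, hi = mean - width * std, mean + width * std
    colors = []
    for lp in log_probs:
        if hi == lo:
            hue = max_hue / 2  # assumption: all tokens equal, use the midpoint
        else:
            clamped = min(max(lp, lo), hi)
            hue = (clamped - lo) / (hi - lo) * max_hue
        colors.append(f"hsl({hue:.0f}deg 60% 85%)")
    return colors

# One extreme outlier among many likely tokens is clamped to the low end
# instead of compressing every other token into nearly the same hue.
colors = token_colors([0.0] * 99 + [-100.0])
```

Working in log space and clamping is what keeps a handful of very unlikely tokens from flattening the color range for everything else.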

Glavin Wiechert👨‍💻 • 3 years ago

Open-source? I was thinking of building a similar UI, as I’m sure many have. Would love to contribute. Awesome work!
