Built a token-wise likelihood visualizer for GPT-2 over the weekend. There are some interesting patterns and behaviors you can easily pick up from a visualization like this, like induction heads and which kinds of words/grammar LMs like to guess.
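For readers wondering what a token-wise likelihood boils down to, here is a minimal sketch. It is not the code behind the visualizer in the video. It assumes you already have one row of next-token logits per position, and the function names and toy numbers are made up for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_logprobs(step_logits, token_ids):
    """Log-probability the model assigned to each token it had to predict.

    step_logits[i] is assumed to hold the logits produced after reading
    token_ids[:i + 1], so it scores token_ids[i + 1]. The first token has
    no prediction and is skipped.
    """
    return [
        log_softmax(logits)[next_id]
        for logits, next_id in zip(step_logits, token_ids[1:])
    ]

# Toy 3-entry vocabulary with invented logits (not real GPT-2 output).
toy_ids = [0, 2, 1]
toy_logits = [[0.0, 0.0, 2.0], [1.0, 3.0, 0.0], [0.5, 0.5, 0.5]]
toy_logprobs = token_logprobs(toy_logits, toy_ids)
```

Each value in `toy_logprobs` is the number a visualizer like this would turn into a color for the corresponding token.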

Linus

40,589 subscribers

223,864 views • 3 years ago • via X (Twitter)

10 Comments

Linus • 3 years ago

A particularly interesting example to run through this is source code, where because of the regular structure the LM does much better (much lower perplexity). Indentations and punctuation are particularly easy wins for GPT-2.
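As a rough illustration of "much lower perplexity": perplexity is the exponential of the negative mean log-probability per token, so a run of easy tokens pulls it down quickly. The sketch below assumes natural-log probabilities, and the two sample lists are invented numbers, not measurements from GPT-2.

```python
import math

def perplexity(logprobs):
    """exp of the negative mean natural-log probability per token."""
    if not logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(logprobs) / len(logprobs))

# Invented values for illustration only, not measured GPT-2 output.
code_like = [-0.1, -0.05, -0.2, -0.1]   # highly predictable tokens
prose_like = [-2.3, -4.1, -0.7, -3.0]   # less predictable tokens

print(perplexity(code_like) < perplexity(prose_like))
```

A model that assigned probability 0.5 to every token would score a perplexity of 2, which is a handy sanity check for the formula.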

Linus • 3 years ago

You can also use this viz to probe GPT-2 for what it thinks about different topics, which is kind of fun. You can imagine extensions of this "fill in the blank" UX becoming useful for writing workflows.

Shawn Simister • 3 years ago

I built something like that as well but for code. Learned a lot about the model. Did you know a GPT-3 token can represent half of a character?
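One way to see how a token can hold only part of a character: the GPT-2 and GPT-3 tokenizers use byte-level BPE, so tokens are made of UTF-8 bytes rather than whole characters. The sketch below does not call a real tokenizer. The split point is a hypothetical token boundary chosen for illustration.

```python
# U+1FAE0 (melting face) takes four bytes in UTF-8.
text = "\U0001FAE0"
raw = text.encode("utf-8")

# Hypothetical token boundary in the middle of the character.
first, second = raw[:2], raw[2:]

def decodes_alone(chunk):
    """True if the byte chunk is valid UTF-8 on its own."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(len(raw), decodes_alone(first), decodes_alone(second))
```

Neither half is displayable by itself, which is why a per-token visualization of code or emoji sometimes has to merge neighboring tokens before rendering them.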

Alexander Cai • 3 years ago

Have you heard of circuitsvis? It's a great open source library that also does this. Would also strongly recommend TransformerLens and

Linus • 3 years ago

Yes, TransformerLens and Neel are fantastic <3

Simon Willison • 2 years ago

This is so useful! Any thoughts on what it would take to turn something like this into an interactive web page people could try out for themselves? I wonder if one of the LLMs-compiled-to-WebAssembly could handle this

Linus • 2 years ago

You could definitely do this with transformers.js and a small model like gpt2-small, since the model needn't be large to have the pedagogical effect. I currently just have a demo that runs GPT2-xl on the server. One of the many things I haven't yet had time to make public 🫠

Chris J. Wallace • 3 years ago

I’d love your thoughts on mapping a color space to probability. I once prototyped something similar and found the huge variance in likelihoods for different words made that a bit tricksy, but this looks really good.

Linus • 3 years ago

My coloring algorithm is roughly:

min, max = mean(log_probs) ± 2.5 * stddev(log_probs)
hue = token_logprob.clamp(min, max).scale(0, 150)
color = f"hsl({hue}deg 60% 85%)"

Key is to scale probs to hue in the log space, and then clamp at µ±2.5stddev.
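A runnable reading of that pseudocode might look like the sketch below. The function name, the choice of population standard deviation, and the midpoint fallback when all values are equal are assumptions and do not come from the original demo.

```python
import statistics

def token_colors(log_probs, width=2.5, max_hue=150.0):
    """Map each token log-probability to an HSL color string.

    Bounds are mean ± width * stddev in log space. Values are clamped to
    those bounds and scaled linearly onto hues 0..max_hue.
    """
    mean = statistics.fmean(log_probs)
    std = statistics.pstdev(log_probs)  # population stddev (an assumption)
    lo, hi = mean - width * std, mean + width * std
    colors = []
    for lp in log_probs:
        clamped = min(max(lp, lo), hi)
        # If every token has the same log-prob, use the middle of the range.
        frac = (clamped - lo) / (hi - lo) if hi > lo else 0.5
        colors.append(f"hsl({frac * max_hue:.0f}deg 60% 85%)")
    return colors

print(token_colors([-5.0, -1.0]))
```

With the default hue range, unlikely tokens land near red (hue 0) and likely tokens near green (hue 150). The clamp keeps a single extreme outlier from flattening the colors of every other token.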

Glavin Wiechert👨‍💻 • 3 years ago

Open-source? I was thinking of building a similar UI, as I’m sure many have. Would love to contribute. Awesome work!
