Transformer Explainer: a really cool interactive tool for learning about the inner workings of a Transformer model. Apparently, it runs a GPT-2 instance locally in the user's browser and lets you experiment with your own inputs. This is a nice tool to learn more about the different components inside...

elvis

309,314 subscribers

121,921 views • 1 year ago • via X (Twitter)

Education Science & Technology

10 Comments

elvis • 1 year ago

Here is a short video going over the tool:

Eli Luong, MD, DABA • 1 year ago

beautiful tool 🥰

Manny • 1 year ago

This is great thanks for sharing.

GPT.Biz • 1 year ago

This looks like an amazing tool to deepen your understanding of AI models in a hands-on way!

Duen Horng "Polo" Chau • 1 year ago

Thanks for sharing our work! Congrats @cho_aeree @gracekimcy @alexkarpekov @alec_helbling @SeongminLeee @Jay4w @Ben_Hoov, all students at @gtcomputing!

Tim Hulse • 1 year ago

Thank you for not using “The cat sat on the ___”

Subba Reddy • 1 year ago

Transformer Explainer: should be mandatory reading for all ML undergrad courses. It raises the bar for interactive explanations of a tool's inner workings. We need similar interactive visual tools for "code repos" to grok THE FLOW from UI -> Auth -> server -> various layers -> DB.

Nathanael • 1 year ago

the easiest way to understand transformers. Great work!

micke • 1 year ago

thanks elvis

RUH-ROH • 1 year ago

Thanks for the share!

Related Videos

Transformer Explainer: Interactive Learning of Text-Generative Models. Transformers have revolutionized machine learning, yet their inner workings remain opaque to many. We present Transformer Explainer, an interactive visualization tool designed for non-experts to learn about Transformers through the GPT-2 model. Our tool helps users understand complex Transformer concepts by integrating a model overview and enabling smooth transitions across abstraction levels of mathematical operations and model structures. It runs a live GPT-2 instance locally in the user's browser, empowering users to experiment with their own input and observe in real-time how the internal components and parameters of the Transformer work together to predict the next tokens. Our tool requires no installation or special hardware, broadening the public's education access to modern generative AI techniques.

AK

90,798 views • 1 year ago

New short course: Build Long-Context AI Apps with Jamba. Learn about state space models (SSMs), which have emerged as an alternative to transformers! Specifically, Jamba is a hybrid transformer-Mamba architecture that combines strengths of the transformer with ideas from SSMs. This course is built with AI21 Labs and taught by Chen Wang and Chen Almagor.

The transformer architecture is computationally expensive when handling very long input contexts. But there's an alternative called Mamba, a selective state space model that can process very long contexts with a much lower computational cost. However, researchers found that the pure Mamba architecture underperforms in understanding the context, and gives lower-quality responses. To overcome this, AI21 developed the Jamba model, which combines Mamba's computational efficiency with the transformer's attention mechanism to help with the output quality.

In this course, you’ll learn about how state space models, and Jamba, work. You’ll also learn how to prompt Jamba, use it to process long documents, and build long-context RAG apps.

- Learn how Jamba combines transformer and state space model architectures to achieve high performance and quality
- Use the AI21 SDK, with an example of prompting over a large 200k-token annual financial report of Nvidia
- Use Jamba for tool-calling, with hands-on examples from calling simple arithmetic calculations to a function that returns quarterly company financial reports.
- Learn how training for long context is done, and the metrics used for its evaluation
- Create a RAG app using the AI21 Conversational RAG tool and build your own RAG pipeline that uses Jamba and LangChain.

By the end of this course, you'll learn how to build applications that can handle context as long as an entire book. Please sign up here:

Andrew Ng

77,792 views • 1 year ago

Announcing How Transformer LLMs Work, created with Jay Alammar and Maarten Grootendorst, co-authors of the beautifully illustrated book, “Hands-On Large Language Models.” This course offers a deep dive into the inner workings of the transformer architecture that powers large language models (LLMs).

The transformer architecture revolutionized generative AI; in fact, the "GPT" in ChatGPT stands for "Generative Pre-Trained Transformer." Originally introduced in the Google Brain team's groundbreaking 2017 paper "Attention Is All You Need," by Vaswani and others, transformers were a highly scalable model for machine translation tasks. Variants of this architecture now power today’s LLMs such as those from OpenAI, Google, Meta, Cohere, Anthropic and DeepSeek.

In this course, you’ll learn in detail how LLMs process text. You'll also work through code examples that illustrate the transformer's individual components. In detail, you’ll learn:

- How the representation of language has evolved, from Bag-of-Words to Word2Vec embeddings to the transformer architecture that captures a word's meanings taking into account the context of other words in the input.
- How inputs are broken down into tokens before they are sent to the language model.
- The details of a transformer's main stages: tokenization and embedding, the stack of transformer blocks, and the language model head.
- The inner workings of the transformer block, including attention, which calculates relevance scores, and the feedforward layer, which incorporates stored information learned in training.
- How cached calculations make transformers faster.
- Some of the most recent ideas in the latest models, such as Mixture-of-Experts (MoE), which uses multiple sub-models and a router on each layer to improve the quality of LLMs.

By the end of this course, you’ll have a deep understanding of how LLMs actually process text and be able to read through papers describing the latest models and understand the details. Gaining this intuition will improve your approach to building LLM applications. Please sign up here:

Andrew Ng

259,421 views • 1 year ago

"Money is a tool for reducing our uncertainty about the future. Undermining that tool raises social anxiety. Getting it back allows us to... ponder the possibilities of what we can do with the future. And it is thinking about the future that helps build civilization." 🎯

Guy Swann

10,890 views • 11 months ago

EasyRain is a tool that allows artists to create highly realistic rain using a Blueprint system and harnessing Niagara particles 🌧️ Developed by VFX Artist and Unreal Engine Content Creator William Faucher, the tool can be purchased on Fab. Learn more about EasyRain here:

Unreal Engine

25,869 views • 1 year ago

OpenAI shipped a new speech-to-speech model today: gpt-realtime-2. This is the first speech-to-speech model good enough to use in my voice agents that do "real work." Or real play, for that matter. Here's gpt-realtime-2 as the brain of the ship AI in Gradient Bang. The voice-to-voice response and tool calling times here are unedited, so you can see exactly what the interaction with the model is like in an agent with a very complex system instruction and frequent tool calls. (I did clip out the subagent task execution segments, after gpt-realtime-2 starts a subagent via a tool call. Subagents in this config used gpt-5.2 "medium" effort.)

kwindla

54,912 views • 2 months ago

The Nostromo Wreckage has a unique tool for players to leverage in Chase 👀💨 Learn more about the Map's Steam Pipes.

Dead by Daylight

299,189 views • 2 years ago

John Allan Namu: AI is a powerful tool, but if you use it badly then you will get a bad outcome despite the power of the tool. Now that we are deploying this model across the most critical sectors in our economy, if we fail to get a richer understanding of the tool and fail to interrogate the intent with which we are using this tool, then we are going to end up in the situation that we are in now. #citizenexplainer

Citizen TV Kenya

11,775 views • 2 months ago

🇬🇭 “After fixing the transformer, you didn’t close it.” — An individual recorded the aftermath of ECG officials fixing a transformer in his community but failing to close it, warning that rainwater could enter and cause damage.

THE STATE NEWS

39,665 views • 2 months ago

Given ISOLDE is a model-fitting tool, it was about time I gave it a way to, you know, actually validate the fit of a model to a map. As such, in consultation with Greg Pintille and Tom Goddard I've implemented the Q-Score algorithm in ChimeraX. (1/2)

Tristan Croll

30,210 views • 3 years ago

Transformer by hand ✍️ in Excel ~ I just released my first-ever "Full-Stack" implementation of the Transformer model. 👇Download xlsx to give it a try!

Tom Yeh

2,997,651 views • 1 year ago

OmniParser, the new screen parsing tool from Microsoft (and #1 trending model on Hugging Face), can now run 100% locally in your browser with Transformers.js! 🤯 Who's going to be the first to turn this into a browser extension? 👀 Endless possibilities! Demo & code below! 👇

Xenova

64,560 views • 1 year ago

i built a tool that lets you clone ANY video with AI

saw a movie scene you want to recreate? a commercial? a music video?

just upload it to this tool and get the exact structured prompt to generate it in sora 2

you no longer need to guess how to describe what you want to generate. show the tool the reference and it engineers the prompt for you.

the tool analyzes:
- camera movements and angles
- lighting and color grading
- scene composition and timing

then converts everything into sora-optimized prompt structure

this is how you recreate any visual style without prompt engineering experience

RT + reply 'TOOL' and i'll send you access (must follow so i can dm)

Miko

42,909 views • 9 months ago

This is the coolest tool you'll see today! Introducing Drawdata! 🚀 It allows you to draw a 2-D dataset of any shape in a jupyter notebook. A very handy tool for learning & understanding the behaviour of ML algorithms! Check this out👇

Akshay 🚀

112,197 views • 2 years ago

Introducing Generative AI by Getty Images – a new tool that pairs our best-in-class creative content with the latest AI technology for a commercially safe generative AI tool! Trained on Getty Images’ world-class creative content, the tool works seamlessly with our expansive library of authentic and compelling creative visuals and Custom Content solutions, allowing customers to elevate their entire end-to-end creative process to find the right visual content for any need. To learn more about the tool and how to get access, along with Getty Images’ stance on responsible AI practices, visit:

Getty Images

672,983 views • 2 years ago

Now I learn the proper way to open up a bag of zip ties. I have zip ties that keep coming out of the package in my tool bag. This is a tip I will definitely use in the future. How about you?

🌸𝓐𝓾𝓭𝓻𝓮𝔂🌸

132,432 views • 2 months ago

🌐 The integrated browser in Visual Studio Code now has pinch-to-zoom and zoom shortcuts on macOS, better agent tool descriptions, and more updates. Learn more →

Visual Studio Code

56,203 views • 3 months ago

[Transformer] by Hand ✍️📺 5-minute Video Tutorial. Anna Rahn made this short video to explain the Transformer exercise for my Computer Vision course last spring. In 5 minutes, she demonstrates the key calculations of the Transformer by hand with pen and paper! Anna is a fantastic student. I am lucky to have her in my lab!

Tom Yeh

133,460 views • 2 years ago