
Transformer Explainer

Really cool interactive tool to learn about the inner workings of a Transformer model. Apparently, it runs a GPT-2 instance locally in the user's browser and allows you to experiment with your own inputs. This is a nice tool to learn more about the different components inside...

121,894 views • 1 year ago • via X (Twitter)

10 Comments

elvis • 1 year ago

Here is a short video going over the tool:

Eli Luong, MD, DABA • 1 year ago

beautiful tool 🥰

Manny • 1 year ago

This is great thanks for sharing.

GPT.Biz • 1 year ago

This looks like an amazing tool to deepen your understanding of AI models in a hands-on way!

Duen Horng "Polo" Chau • 1 year ago

Thanks for sharing our work! Congrats @cho_aeree @gracekimcy @alexkarpekov @alec_helbling @SeongminLeee @Jay4w @Ben_Hoov, all students at @gtcomputing!

Tim Hulse • 1 year ago

Thank you for not using “The cat sat on the ___”

Subba Reddy • 1 year ago

Transformer Explainer: should be mandatory reading for all ML undergrad courses. A bar raiser for interactively showing a tool's inner workings. We need similar interactive visual tools for "code repos" to grok THE FLOW from UI -> Auth -> server -> various layers -> DB.

Nathanael • 1 year ago

the easiest way to understand transformers. Great work!

micke • 1 year ago

thanks elvis

RUH-ROH • 1 year ago

Thanks for the share!

Related Videos

New short course: Build Long-Context AI Apps with Jamba. Learn about state space models (SSMs), which have emerged as an alternative to transformers! Specifically, Jamba is a hybrid transformer-Mamba architecture that combines strengths of the transformer with ideas from SSMs. This course is built with AI21 Labs and taught by Chen Wang and Chen Almagor.

The transformer architecture is computationally expensive when handling very long input contexts. But there's an alternative called Mamba, a selective state space model that can process very long contexts with a much lower computational cost. However, researchers found that the pure Mamba architecture underperforms in understanding the context, and gives lower-quality responses. To overcome this, AI21 developed the Jamba model, which combines Mamba's computational efficiency with the transformer's attention mechanism to help with the output quality.

In this course, you’ll learn about how state space models, and Jamba, work. You’ll also learn how to prompt Jamba, use it to process long documents, and build long-context RAG apps.

- Learn how Jamba combines transformer and state space model architectures to achieve high performance and quality
- Use the AI21 SDK, with an example of prompting over a large 200k-token annual financial report of Nvidia
- Use Jamba for tool-calling, with hands-on examples from calling simple arithmetic calculations to a function that returns quarterly company financial reports.
- Learn how training for long context is done, and the metrics used for its evaluation
- Create a RAG app using the AI21 Conversational RAG tool and build your own RAG pipeline that uses Jamba and LangChain.

By the end of this course, you'll learn how to build applications that can handle context as long as an entire book. Please sign up here:

Andrew Ng

75,692 views • 1 year ago

Announcing How Transformer LLMs Work, created with Jay Alammar and Maarten Grootendorst, co-authors of the beautifully illustrated book, “Hands-On Large Language Models.” This course offers a deep dive into the inner workings of the transformer architecture that powers large language models (LLMs).

The transformer architecture revolutionized generative AI; in fact, the "GPT" in ChatGPT stands for "Generative Pre-Trained Transformer." Originally introduced in the Google Brain team's groundbreaking 2017 paper "Attention Is All You Need," by Vaswani and others, transformers were a highly scalable model for machine translation tasks. Variants of this architecture now power today’s LLMs such as those from OpenAI, Google, Meta, Cohere, Anthropic and DeepSeek.

In this course, you’ll learn in detail how LLMs process text. You'll also work through code examples that illustrate the transformer's individual components. In detail, you’ll learn:

- How the representation of language has evolved, from Bag-of-Words to Word2Vec embeddings to the transformer architecture that captures a word's meanings taking into account the context of other words in the input.
- How inputs are broken down into tokens before they are sent to the language model.
- The details of a transformer's main stages: Tokenization and embedding, the stack of transformer blocks, and the language model head.
- The inner workings of the transformer block, including attention, which calculates relevance scores, and the feedforward layer, which incorporates stored information learned in training.
- How cached calculations make transformers faster.
- Some of the most recent ideas in the latest models, such as Mixture-of-Experts (MoE), which uses multiple sub-models and a router on each layer to improve the quality of LLMs.

By the end of this course, you’ll have a deep understanding of how LLMs actually process text and be able to read through papers describing the latest models and understand the details. Gaining this intuition will improve your approach to building LLM applications. Please sign up here:

Andrew Ng

252,150 views • 1 year ago
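
The course outline above describes attention as the step that "calculates relevance scores." As a rough illustration only, and not taken from the course or from Transformer Explainer, here is a minimal pure-Python sketch of causal scaled dot-product attention. The toy vectors are made up, and reusing the same vectors as queries, keys, and values is a simplifying assumption; a real model derives each from learned projections.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Causal scaled dot-product attention over lists of vectors."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Relevance score of token i against itself and every earlier token.
        scores = [
            sum(qc * kc for qc, kc in zip(q, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # The output is the weighted average of the visible value vectors.
        out = [
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-dimensional vectors for three tokens (made-up numbers).
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(toy, toy, toy)
for i, row in enumerate(weights):
    print(i, [round(w, 3) for w in row])
```

Each printed row shows how much one token attends to itself and to the tokens before it. The first token can only see itself, so its single weight is 1.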