正在加载视频...

视频加载失败

Transformer Explainer Interactive Learning of Text-Generative Models discuss: Transformers have revolutionized machine learning, yet their inner workings remain opaque to many. We present Transformer Explainer, an interactive visualization tool designed for non-experts to learn about Transformers through the GPT-2 model. Our tool helps users understand complex Transformer concepts by...

90,723 次观看 • 1 年前 •via X (Twitter)

5 条评论

Duen Horng "Polo" Chau 的头像
Duen Horng "Polo" Chau1 年前

Thanks @_akhaliq for sharing Transformer Explainer! Hope everyone enjoys playing with the tool!

Duen Horng "Polo" Chau 的头像
Duen Horng "Polo" Chau1 年前

👏Congrats to Transformer Explainer co-leads @cho_aeree @gracekimcy @alexkarpekov ; and @Jay4w @SeongminLeee @alec_helbling @Ben_Hoov at @gtcomputing !

Michael Buloichyk 的头像
Michael Buloichyk1 年前

That’s can be a huge addition to existing @karpathy tutorial on building GPT2 from scratch I presume

Joseph David Huelbig 的头像
Joseph David Huelbig1 年前

I was hoping for the other kind of transformer.. @transformers

OSDev 的头像
OSDev1 年前

rst @readwise save thread

相关视频

Announcing How Transformer LLMs Work, created with Jay Alammar and Maarten Grootendorst, co-authors of the beautifully illustrated book, “Hands-On Large Language Models.” This course offers a deep dive into the inner workings of the transformer architecture that powers large language models (LLMs). The transformer architecture revolutionized generative AI; in fact, the "GPT" in ChatGPT stands for "Generative Pre-Trained Transformer." Originally introduced in the Google Brain team's groundbreaking 2017 paper "Attention Is All You Need," by Vaswani and others, transformers were a highly scalable model for machine translation tasks. Variants of this architecture now power today’s LLMs such as those from OpenAI, Google, Meta, Cohere, Anthropic and DeepSeek. In this course, you’ll learn in detail how LLMs process text. You'll also work through code examples that illustrate that transformer's individual components. In details, you’ll learn: - How the representation of language has evolved, from Bag-of-Words to Word2Vec embeddings to the transformer architecture that captures a word's meanings taking into account the context of other words in the input. - How inputs are broken down into tokens before they are sent to the language model. - The details of a transformer's main stages: Tokenization and embedding, the stack of transformer blocks, and the language model head. - The inner workings of the transformer block, including attention, which calculates relevance scores, and the feedforward layer, which incorporates stored information learned in training. - How cached calculations make transformers faster. - Some of the most recent ideas in the latest models such as Mixture-of-Experts (MoE) which uses multiple sub-models and a router on each layer to improve the quality of LLMs. By the end of this course, you’ll have a deep understanding of how LLMs actually process text and be able to read through papers describing the latest models and understand the details. Gaining this intuition will improve your approach to building LLM applications. Please sign up here:

Andrew Ng

252,150 次观看 • 1 年前

New short course: Build Long-Context AI Apps with Jamba. Learn about state space models (SSMs), which have emerged as an alternative to transformers! Specifically, Jamba is a hybrid transformer-Mamba architecture that combines strengths of the transformer with ideas from SSMs. This course is built with AI21 Labs and taught by Chen Wang and Chen Almagor. The transformer architecture is computationally expensive when handling very long input contexts. But there's an alternative called Mamba, a selective state space model that can process very long contexts with a much lower computational cost. However, researchers found that the pure Mamba architecture underperforms in understanding the context, and gives lower-quality responses. To overcome this, AI21 developed the Jamba model, which combines Mamba's computational efficiency with the transformer's attention mechanism to help with the output quality. In this course, you’ll learn about how state space models, and Jamba, work. You’ll also learn how to prompt Jamba, use it to process long documents, and build long-context RAG apps. - Learn how Jamba combines transformer and state space model architectures to achieve high performance and quality - Use the AI21 SDK, with an example of prompting over a large 200k-token annual financial report of Nvidia - Use Jamba for tool-calling, with hands-on examples from calling simple arithmetic calculations to a function that returns quarterly company financial reports. - Learn how training for long context is done, and the metrics used for its evaluation - Create a RAG app using the AI21 Conversational RAG tool and build your own RAG pipeline that uses Jamba and LangChain. By the end of this course, you'll learn how to build applications that can handle context as long as an entire book. Please sign up here:

Andrew Ng

75,692 次观看 • 1 年前

New short course: Attention in Transformers: Concepts and Code in PyTorch. Last week we released a course on how LLM transformers work. This week, go deeper and learn about the technical ideas behind the attention mechanism, and see how to code it in PyTorch. This course is built with Joshua Starmer, Founder and CEO of StatQuest. The attention mechanism was a breakthrough that led to transformers, the architecture powering large language models like ChatGPT. Transformers, introduced in the 2017 paper: "Attention is All You Need" by Viswani and others, took off because of its highly scalable design. In this course, you’ll learn how the attention mechanism, a key element of transformer-based LLMs, works and implement it in PyTorch. You'll develop deep intuition about building reliable, functional, and scalable AI applications. What you will do: - Understand the evolution of the attention mechanism, a key breakthrough that led to transformers. - Learn the relationships between word embeddings, positional embeddings, and attention. - Learn about the Query, Key, and Value matrices, and how to produce and use them in attention. - Walk through the math required to calculate self-attention and masked self-attention to learn why and how they work. - Understand the difference between self-attention and masked self-attention and how one is used in the encoder to build context-aware embeddings and the other is used in the decoder for generative outputs. - Learn the details of the encoder-decoder architecture, cross-attention, and multi-head attention and how they are all incorporated into a transformer. - Use PyTorch to code a class that implements self-attention, masked self-attention, and multi-head attention. There're lots of exciting technical details in this course. Please sign up here:

Andrew Ng

132,135 次观看 • 1 年前