Transformer Explainer

Really cool interactive tool to learn about the inner workings of a Transformer model. Apparently, it runs a GPT-2 instance locally in the user's browser and allows you to experiment with your own inputs. This is a nice tool to learn more about the different components inside...

121,867 views • 1 year ago • via X (Twitter)

Comments: 10

elvis • 1 year ago

Here is a short video going over the tool:

Eli Luong, MD, DABA • 1 year ago

beautiful tool 🥰

Manny • 1 year ago

This is great, thanks for sharing.

GPT.Biz • 1 year ago

This looks like an amazing tool to deepen your understanding of AI models in a hands-on way!

Duen Horng "Polo" Chau • 1 year ago

Thanks for sharing our work! Congrats @cho_aeree @gracekimcy @alexkarpekov @alec_helbling @SeongminLeee @Jay4w @Ben_Hoov, all students at @gtcomputing!

Tim Hulse • 1 year ago

Thank you for not using “The cat sat on the ___”

Subba Reddy • 1 year ago

Transformer Explainer should be mandatory reading for all ML undergrad courses. It raises the bar for interactive explanations of a tool's inner workings. We need similar interactive visual tools for "code repos" to grok THE FLOW from UI -> Auth -> server -> various layers -> DB.

Nathanael • 1 year ago

The easiest way to understand transformers. Great work!

micke • 1 year ago

Thanks, elvis!

RUH-ROH • 1 year ago

Thanks for the share!

Related videos

New short course: Build Long-Context AI Apps with Jamba. Learn about state space models (SSMs), which have emerged as an alternative to transformers! Specifically, Jamba is a hybrid transformer-Mamba architecture that combines strengths of the transformer with ideas from SSMs. This course is built with AI21 Labs and taught by Chen Wang and Chen Almagor.

The transformer architecture is computationally expensive when handling very long input contexts. But there's an alternative called Mamba, a selective state space model that can process very long contexts at a much lower computational cost. However, researchers found that the pure Mamba architecture underperforms at understanding context and gives lower-quality responses. To overcome this, AI21 developed the Jamba model, which combines Mamba's computational efficiency with the transformer's attention mechanism to improve output quality.

In this course, you'll learn how state space models and Jamba work. You'll also learn how to prompt Jamba, use it to process long documents, and build long-context RAG apps.

- Learn how Jamba combines transformer and state space model architectures to achieve high performance and quality.
- Use the AI21 SDK, with an example of prompting over Nvidia's 200k-token annual financial report.
- Use Jamba for tool calling, with hands-on examples ranging from simple arithmetic calculations to a function that returns quarterly company financial reports.
- Learn how training for long context is done and the metrics used to evaluate it.
- Create a RAG app using the AI21 Conversational RAG tool, and build your own RAG pipeline that uses Jamba and LangChain.

By the end of this course, you'll know how to build applications that can handle context as long as an entire book.

Please sign up here:

Andrew Ng

75,692 views • 1 year ago
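The Jamba course description contrasts attention with state space models that process long contexts cheaply. As a rough illustration of why, here is a minimal sketch of a plain (non-selective) discrete linear SSM scan: a fixed-size hidden state is updated once per input step, so work grows linearly with sequence length. This is not Jamba's or Mamba's implementation. Mamba makes its parameters input-dependent ("selective") and uses an optimized parallel scan. The function name `ssm_scan` and the toy matrices are hypothetical, chosen only for illustration.

```python
from typing import List


def ssm_scan(xs: List[float], A: List[List[float]], B: List[float], C: List[float]) -> List[float]:
    """Run a time-invariant discrete linear state space model over a 1-D sequence.

    h_t = A h_{t-1} + B x_t
    y_t = C . h_t
    (Hypothetical helper: a plain SSM, not Mamba's selective variant.)
    """
    n = len(B)
    h = [0.0] * n  # hidden state: its size does not depend on sequence length
    ys = []
    for x in xs:
        # State update: h <- A h + B x
        h = [sum(A[i][j] * h[j] for j in range(n)) + B[i] * x for i in range(n)]
        # Readout: y = C . h
        ys.append(sum(C[i] * h[i] for i in range(n)))
    return ys


# Toy example: an impulse input decays through two channels at different rates.
ys = ssm_scan([1.0, 0.0, 0.0], [[0.5, 0.0], [0.0, 0.9]], [1.0, 1.0], [1.0, 1.0])
print([round(y, 6) for y in ys])  # [2.0, 1.4, 1.06]
```

Each step touches only the fixed-size state `h`, whereas full self-attention compares every token with every earlier token, which is the cost difference the course description refers to.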

Announcing How Transformer LLMs Work, created with Jay Alammar and Maarten Grootendorst, co-authors of the beautifully illustrated book “Hands-On Large Language Models.” This course offers a deep dive into the inner workings of the transformer architecture that powers large language models (LLMs).

The transformer architecture revolutionized generative AI; in fact, the "GPT" in ChatGPT stands for "Generative Pre-Trained Transformer." Originally introduced in the Google Brain team's groundbreaking 2017 paper "Attention Is All You Need" by Vaswani and others, the transformer was a highly scalable model for machine translation tasks. Variants of this architecture now power today's LLMs, such as those from OpenAI, Google, Meta, Cohere, Anthropic, and DeepSeek.

In this course, you'll learn in detail how LLMs process text. You'll also work through code examples that illustrate the transformer's individual components. Specifically, you'll learn:

- How the representation of language has evolved, from Bag-of-Words to Word2Vec embeddings to the transformer architecture, which captures a word's meaning by taking into account the context of the other words in the input.
- How inputs are broken down into tokens before they are sent to the language model.
- The details of a transformer's main stages: tokenization and embedding, the stack of transformer blocks, and the language model head.
- The inner workings of the transformer block, including attention, which calculates relevance scores, and the feedforward layer, which incorporates stored information learned in training.
- How cached calculations make transformers faster.
- Some of the most recent ideas in the latest models, such as Mixture-of-Experts (MoE), which uses multiple sub-models and a router on each layer to improve the quality of LLMs.

By the end of this course, you'll have a deep understanding of how LLMs actually process text and be able to read papers describing the latest models and understand the details. Gaining this intuition will improve your approach to building LLM applications.

Please sign up here:

Andrew Ng

252,150 views • 1 year ago
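The How Transformer LLMs Work description says attention "calculates relevance scores." A minimal single-head sketch of causal scaled dot-product attention (the decoder-style masking used by GPT models such as the GPT-2 instance in Transformer Explainer) shows what that means. Each token's query is compared with the keys of itself and earlier tokens, the scores are scaled by √d_k and normalized with softmax, and the resulting weights mix the value vectors. Assumptions: Q, K and V are passed in directly (no learned projection matrices, multiple heads, or batching), and `causal_self_attention` is a hypothetical helper, not code from the course or any library.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with a causal mask (hypothetical helper)."""
    d_k = len(K[0])
    d_v = len(V[0])
    out = []
    for i, q in enumerate(Q):
        # Relevance score of token i to each token j <= i (later tokens are masked out).
        scores = [sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d_k) for j in range(i + 1)]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the visible value vectors.
        out.append([sum(w * V[j][c] for j, w in enumerate(weights)) for c in range(d_v)])
    return out


Q = K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = causal_self_attention(Q, K, V)
print(out[0])  # [1.0, 2.0]
print(out[1])  # a softmax-weighted blend of V[0] and V[1]
```

The first token can only attend to itself, so its output equals its own value vector; later tokens receive a weighted blend. Production implementations compute all scores as a single masked matrix product rather than with Python loops.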