This is how large language models turn objects into vector representations. In this video, we explore how large language models (LLMs) convert objects into internal representations, especially when translating between languages like English and Hindi. Using real-world examples, we highlight the challenges of gender inference, grammatical structure, and why...

26,776 views • 1 year ago • via X (Twitter)
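As a rough illustration of what "turning objects into vector representations" means, here is a minimal sketch of an embedding lookup followed by cosine similarity. The three-word vocabulary and its 3-dimensional vectors are hypothetical toy values picked by hand; real LLMs learn their embedding tables during training and use far more dimensions.

```python
import math

# Hypothetical toy embedding table: hand-picked 3-dimensional vectors,
# NOT values taken from any real model.
EMBEDDINGS = {
    "doctor": [0.9, 0.1, 0.3],
    "nurse": [0.8, 0.2, 0.4],
    "table": [0.1, 0.9, 0.0],
}


def embed(tokens):
    """Map each token to its vector by simple table lookup."""
    return [EMBEDDINGS[token] for token in tokens]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


doctor, nurse, table = embed(["doctor", "nurse", "table"])
print(round(cosine_similarity(doctor, nurse), 3))  # related words: high similarity
print(round(cosine_similarity(doctor, table), 3))  # unrelated words: lower similarity
```

A static lookup like this gives every occurrence of a word the same vector; contextual models then refine these vectors using the surrounding words in the input.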

2 Comments

Berojgar Engineer • 1 year ago

The way you describe just wow,😍😍 wanted to buy Low Level Design course from interviewReady, will it be worth it or you are giving me the course for free of cost 😇🤣

AssemblyAI • 1 year ago

Announcing: Our most advanced speech-to-text model goes beyond accuracy to capture the real-world complexity of human conversation and deliver reliable, source-of-truth audio data. Explore Universal-2 updates 👇

Related Videos

Announcing How Transformer LLMs Work, created with Jay Alammar and Maarten Grootendorst, co-authors of the beautifully illustrated book “Hands-On Large Language Models.” This course offers a deep dive into the inner workings of the transformer architecture that powers large language models (LLMs). The transformer architecture revolutionized generative AI; in fact, the "GPT" in ChatGPT stands for "Generative Pre-Trained Transformer." Originally introduced in the Google Brain team's groundbreaking 2017 paper "Attention Is All You Need," by Vaswani and others, transformers were a highly scalable model for machine translation tasks. Variants of this architecture now power today’s LLMs, such as those from OpenAI, Google, Meta, Cohere, Anthropic, and DeepSeek. In this course, you’ll learn in detail how LLMs process text. You'll also work through code examples that illustrate the transformer's individual components. In detail, you’ll learn:

- How the representation of language has evolved, from Bag-of-Words to Word2Vec embeddings to the transformer architecture, which captures a word's meaning by taking into account the context of the other words in the input.
- How inputs are broken down into tokens before they are sent to the language model.
- The details of a transformer's main stages: tokenization and embedding, the stack of transformer blocks, and the language model head.
- The inner workings of the transformer block, including attention, which calculates relevance scores, and the feedforward layer, which incorporates stored information learned in training.
- How cached calculations make transformers faster.
- Some of the most recent ideas in the latest models, such as Mixture-of-Experts (MoE), which uses multiple sub-models and a router on each layer to improve the quality of LLMs.

By the end of this course, you’ll have a deep understanding of how LLMs actually process text and be able to read through papers describing the latest models and understand the details. Gaining this intuition will improve your approach to building LLM applications. Please sign up here:

Andrew Ng

250,001 views • 1 year ago
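The course description above says attention "calculates relevance scores." A minimal single-head sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V as defined in "Attention Is All You Need," is shown below in plain Python. It deliberately omits the learned query/key/value projections, masking, and multiple heads that a real transformer block uses, and the input vectors are arbitrary example values.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (one row per token).

    For each query: score it against every key, normalize the scores with
    softmax, and return the score-weighted average of the value vectors.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Relevance score of this query against every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Self-attention over three tokens with 2-dimensional vectors (Q = K = V = x).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention(x, x, x))
```

In a decoder-only LLM, a causal mask would additionally set the scores for future positions to negative infinity before the softmax, so each token can only attend to earlier ones.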

Today, we're joined by Julie Kallini ✨, PhD student at Stanford NLP Group, to discuss her recent papers, “MrT5: Dynamic Token Merging for Efficient Byte-level Language Models” and “Mission: Impossible Language Models.” For the MrT5 paper, we explore the importance and failings of tokenization in large language models, including inefficient compression rates for under-resourced languages, and dig into byte-level modeling as an alternative. We discuss the architecture of MrT5, its ability to learn language-specific compression rates, its performance on multilingual benchmarks and character-level manipulation tasks, and its inference efficiency. For the “Mission: Impossible Language Models” paper, we review the core idea behind the research, the definition and creation of impossible languages, and the creation of impossible-language training datasets, and we explore the bias of language model architectures toward natural language.

🎧 / 🎥 Listen to or watch the full episode on our page:

📖 CHAPTERS
===============================
00:00 - Introduction
4:28 - Issues of tokenization for LLMs
11:26 - Sub-word tokenization versus byte-level tokenization
16:28 - Inefficiencies of ByT5
17:08 - MrT5 architecture
22:05 - Language-specific compression rate
24:10 - Benchmarks
27:15 - Inference efficiency
28:50 - Applying MrT5 to other decoder models
31:15 - Future directions of MrT5
33:51 - Mission: Impossible Language Models paper
39:59 - Languages tested
45:13 - Language architectures biased toward natural languages vs. impossible languages
48:19 - Future directions for Mission: Impossible

The TWIML AI Podcast

11,758 views • 1 year ago
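The episode above mentions inefficient compression rates for under-resourced languages and byte-level modeling as an alternative. One contributing factor is easy to see with Python's built-in UTF-8 encoding: a byte-level model such as ByT5 uses roughly one input position per UTF-8 byte (ignoring special tokens), and Devanagari characters (used for Hindi) take 3 bytes each, while ASCII English letters take 1. This sketch only counts bytes; it does not implement MrT5's token merging.

```python
def byte_stats(text):
    """Return (code points, UTF-8 bytes, bytes per code point) for a string.

    For a byte-level model, the UTF-8 byte count approximates the sequence
    length the model must process (special tokens are ignored here).
    """
    if not text:
        return 0, 0, 0.0
    n_chars = len(text)  # counts Unicode code points, not visual glyphs
    n_bytes = len(text.encode("utf-8"))
    return n_chars, n_bytes, n_bytes / n_chars


english = "hello"
# "नमस्ते" (namaste) written as explicit code points to avoid encoding ambiguity.
hindi = "\u0928\u092e\u0938\u094d\u0924\u0947"

print(byte_stats(english))  # (5, 5, 1.0)
print(byte_stats(hindi))    # (6, 18, 3.0)
```

The Hindi greeting costs 18 byte positions for 6 code points, three times the per-character cost of the English word; learning to shorten such byte sequences with language-specific compression rates is the problem the MrT5 discussion addresses.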

3D-LLM: Injecting the 3D World into Large Language Models paper page:

Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves richer concepts such as spatial relationships, affordances, physics, layout, and so on. In this work, we propose to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs. Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. Using three types of prompting mechanisms that we design, we are able to collect over 300k 3D-language data covering these tasks. To efficiently train 3D-LLMs, we first utilize a 3D feature extractor that obtains 3D features from rendered multi-view images. Then, we use 2D VLMs as our backbones to train our 3D-LLMs. By introducing a 3D localization mechanism, 3D-LLMs can better capture 3D spatial information. Experiments on ScanQA show that our model outperforms state-of-the-art baselines by a large margin (e.g., the BLEU-1 score surpasses state-of-the-art score by 9%). Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs. Qualitative examples also show that our model could perform more tasks beyond the scope of existing LLMs and VLMs.

AK

249,259 views • 2 years ago
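The abstract above reports its ScanQA gain as a BLEU-1 score. As a hedged illustration of what that metric measures, here is a minimal sentence-level BLEU-1 sketch against a single reference: clipped unigram precision multiplied by the standard brevity penalty. Whitespace tokenization and the single-reference setup are simplifying assumptions; the paper's actual evaluation (tokenization, corpus-level aggregation, multiple references) may differ.

```python
import math
from collections import Counter


def bleu1(candidate, reference):
    """Sentence-level BLEU-1 against one reference, using whitespace tokenization."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    cand_counts = Counter(cand)
    ref_counts = Counter(ref)
    # Clipped matches: a candidate word counts at most as often as it appears in the reference.
    clipped = sum(min(count, ref_counts[word]) for word, count in cand_counts.items())
    precision = clipped / len(cand)
    # Brevity penalty: candidates not longer than the reference get exp(1 - r/c) <= 1.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision


print(bleu1("the cat sat on the mat", "the cat is on the mat"))  # 0.8333333333333334
```

Here 5 of the 6 candidate words match the reference and the lengths are equal, so the brevity penalty is 1 and the score is 5/6.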

A 4-year-old child has seen 50x more information than the biggest LLMs. Yann LeCun is the Chief AI Scientist at Meta. He recently spoke on “The Expanding Universe of Generative Models” panel at the World Economic Forum in Davos. Yann highlighted the idea that a 4-year-old child is way smarter than current cutting-edge large language models (LLMs). “Think about what a child sees through vision. Put a number on how much information a 4-year-old child has seen during their life. It’s 20 MB per second going through the optical nerve for 16,000 wake hours in the first 4 years of life. At 3,600 seconds per hour, that’s 10^15 bytes. This is 50x more information than the biggest LLMs we have. A 4-year-old child is way smarter than these models, having acquired an enormous amount of knowledge about how the world works.” The real constraint right now is the ability of LLMs to think. Today, LLMs are only capable of System 1 thinking. System 1 vs. System 2 thinking was popularised in the book 'Thinking, Fast and Slow' by Daniel Kahneman. System 1 tasks involve quick, instinctive, automatic responses. LLMs struggle with discontinuous tasks that require a creative leap, because they imitate human responses. It's hard to surpass human-level accuracy if LLMs are trained only on human-generated data. Models are building the track in front of them with each word being generated. What could it mean to give language models System 2 thinking? This remains a future development I'm excited about.

Alex Banks

22,958 views • 2 years ago
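The back-of-the-envelope estimate in the quote above can be checked directly. This sketch assumes the "20" figure means megabytes per second (20 × 10^6 bytes/s); read as megabits per second, the total would be 8 times smaller and would not reach 10^15 bytes. The post does not state the LLM training-data size, so the sketch only derives the amount implied by the "50x" claim.

```python
BYTES_PER_SECOND = 20 * 10**6  # quoted optic-nerve rate, read as 20 megabytes per second
WAKE_HOURS = 16_000            # waking hours in the first 4 years of life
SECONDS_PER_HOUR = 3_600

child_bytes = BYTES_PER_SECOND * WAKE_HOURS * SECONDS_PER_HOUR
print(f"{child_bytes:.3e} bytes")  # 1.152e+15 bytes, i.e. about 10^15

# Same calculation if the rate were 20 megabits per second (8 bits per byte).
child_bytes_if_megabits = child_bytes // 8
print(f"{child_bytes_if_megabits:.3e} bytes")  # 1.440e+14 bytes

# Training-data size implied by "50x more information than the biggest LLMs".
implied_llm_bytes = child_bytes / 50
print(f"{implied_llm_bytes:.3e} bytes")  # 2.304e+13 bytes
```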