Ever wondered how training dynamics differ between LLMs 🖋️ and Vision 👁️ models? We explore this and close the gap between vision models and LLMs in our #NeurIPS2024 paper "TrAct: Making First-layer Pre-Activations Trainable". Paper📜 Video🎥

Felix Petersen

2,298 subscribers

20,931 views • 1 year ago • via X (Twitter)

Science & Technology Education #NeurIPS2024

Related Videos

We spoke with Laura Ruis from Cohere For AI and UCL about her paper "Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models" where she demonstrated an interesting gap between retrieval and reasoning queries in LLMs indicating the presence of synthesised procedural knowledge generation.

Machine Learning Street Talk

10,997 views • 1 year ago

This is how large language models turn objects to vector representations. In this video, we explore how large language models (LLMs) convert objects into internal representations, especially when translating between languages like English and Hindi. Using real-world examples, we highlight the challenges of gender inference, grammatical structure, and why direct word-to-word translations often fail. If you're curious about how LLMs deal with multilingual contexts and what it takes to improve translation quality across languages, this video is for you. #LLMs #Vectors #LCM

Gaurav Sen

27,368 views • 1 year ago

🌍Are LLMs aware of cultural and legal safety in today’s geo-diverse world? 🚀Introducing SafeWorld, our #NeurIPS2024 paper and benchmark assessing LLMs’ understanding of geo-diverse safety, based on cultural norms and policies across 50 countries and 493 regions/races. ⚖️We also propose a multi-dimensional framework for evaluating contextual appropriateness, accuracy, and comprehensiveness, revealing major gaps in current LLMs. 🧨To address this, we train SafeWorldLM using DPO, achieving SOTA performance and a 20% higher global human evaluator rating in helpfulness and harmfulness over competing models, including GPT-4o. 🔗Paper: 💻 GitHub: 🫶🏻This is a joint leading effort with Da Yin. Also many thanks to the amazing team Kung-Hsiang Steeve Huang, Kai-Wei Chang, and Violet Peng for their hard work. Check out more details and results from our paper in the thread below. 🧵

Haoyi Qiu

16,344 views • 1 year ago

How can we use small LLMs to shift more AI workloads onto our laptops and phones? In our paper and open-source code, we pair on-device LLMs (ollama) with frontier LLMs in the cloud (OpenAI, Together), to solve token-intensive workloads on your 💻 at 17.5% of the cloud cost while maintaining 97.9% of the accuracy. See Gru and the Minions in action below, 🔉on please (h/t )!

Dan Biderman

193,319 views • 1 year ago

Photoshop for text. In our #CHI2025 paper “Textoshop”, we explore how interactions inspired by drawing software can help edit text. We consider words as pixels, sentences as regions, and tones as colours. #HCI #NLProc #LLMs #AI Thread 🧵 (1/10)

Damien Masson

88,261 views • 1 year ago

Why do we get a memory wipe in between reincarnations? And how does the human incarnation cycle mimic AI and LLMs…? The answer might surprise you

Jordan Crowder

10,730 views • 1 month ago

Rick Rule talks about the 100 to 1 imbalance in the silver market between the paper and physical markets. And how close we came to seeing that break during the #SilverSqueeze back in 2021.

Chris Marcus

42,302 views • 1 year ago

Ever wondered how market-making works? 🤔 In today's video, we explore who market makers are, what they do, and how everything works. Watch the full video:

CoinGecko

20,244 views • 1 year ago

Speech-native models like Moshi sound great and answer fast, but aren’t as smart as text LLMs. In our new paper, MoshiRAG, we show how Moshi can ask for advice from a text LLM or a knowledge base. The tricky part is how to do this in real time without adding latency. 🧵

kyutai

52,598 views • 2 months ago

Robot AI brains, aka Vision-Language-Action models, cannot adapt to new tasks as easily as LLMs like Gemini, ChatGPT, or Grok. LLMs can adapt quickly with their in-context learning (ICL) capabilities. But can we inject ICL abilities into a pre-trained VLA like pi0? Yes! Introducing RICL (Retraining for In-Context Learning), our Conference on Robot Learning (CoRL) 2025 paper. Our RICL-pi0 model can adapt to unseen objects, novel motions, and new scenes with just ICL and RAG (retrieval-augmented generation). RICL-pi0 also boosts performance on the long-tail of tasks. A quick 1 minute video summary:

Kaustubh Sridhar

52,158 views • 11 months ago

🚀We are excited to introduce the Tool Decathlon (Toolathlon), a benchmark for language agents on diverse, complex, and realistic tool use. ⭐️32 applications and 600+ tools based on real-world software environments ⭐️Execution-based, reliable evaluation ⭐️Realistic, covering daily and professional scenarios Toolathlon reveals significant shortcomings of SOTA LLMs in realistic tool-use tasks, where Claude Sonnet 4.5 achieves 38.6% success rate. It also indicates a clear gap between open-source and leading proprietary models. Check our blog: Github: Paper: 🧵⬇️

Junxian He

43,743 views • 8 months ago

Ever wondered what a footballer does all day on a pre-season training camp? We documented the first 24 hours in Budapest with our Under-21s and Under-18s. 💪

Birmingham City Academy

24,628 views • 1 year ago

3D-LLM: Injecting the 3D World into Large Language Models paper page: Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves richer concepts such as spatial relationships, affordances, physics, layout, and so on. In this work, we propose to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs. Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. Using three types of prompting mechanisms that we design, we are able to collect over 300k 3D-language data covering these tasks. To efficiently train 3D-LLMs, we first utilize a 3D feature extractor that obtains 3D features from rendered multi-view images. Then, we use 2D VLMs as our backbones to train our 3D-LLMs. By introducing a 3D localization mechanism, 3D-LLMs can better capture 3D spatial information. Experiments on ScanQA show that our model outperforms state-of-the-art baselines by a large margin (e.g., the BLEU-1 score surpasses state-of-the-art score by 9%). Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs. Qualitative examples also show that our model could perform more tasks beyond the scope of existing LLMs and VLMs.

AK

249,708 views • 3 years ago

New paper: You can make ChatGPT 2x as creative with one sentence. Ever notice how LLMs all sound the same? They know 100+ jokes but only ever tell one. Every blog intro: "In today's digital landscape..." We figured out why – and how to unlock the rest 🔓 Copy-paste prompt: 🧵

Weiyan Shi

329,823 views • 9 months ago

Dario Amodei says pre-training sits somewhere between learning and evolution. Humans inherit priors shaped over millions of years. LLMs start as random weights and distill trillions of tokens into those priors. We describe them using human learning metaphors. But the analogy only goes so far.

vitrupo

45,540 views • 5 months ago

Video might be the next intelligence substrate. Strikingly, video models are beginning to exhibit the same emergent reasoning behaviors first observed in LLMs—multi-path search, self-correction, and layer specialization. We demystify video reasoning and show it doesn’t happen frame-by-frame, but along diffusion steps. 🔗 📄 So, what's next? ;)

Zhongang Cai

70,749 views • 4 months ago

Introducing Digital Red Queen (DRQ): Adversarial Program Evolution in Core War with LLMs Blog: Core War is a programming game where self-replicating assembly programs, called warriors, compete for control of a virtual machine. In this dynamic environment, where there is no distinction between code and data, warriors must crash opponents while defending themselves to survive. In this work, we explore how LLMs can drive open-ended adversarial evolution of these programs within Core War. Our approach is inspired by the Red Queen Hypothesis from evolutionary biology: the principle that species must continually adapt and evolve simply to survive against ever-changing competitors. We found that running our DRQ algorithm for longer durations produces warriors that become more generally robust. Most notably, we observed an emergent pressure towards convergent evolution. Independent runs, starting from completely different initial conditions, evolved toward similar general-purpose behaviors—mirroring how distinct species in nature often evolve similar traits to solve the same problems. Simulating these adversarial dynamics in an isolated sandbox offers a glimpse into the future, where deployed LLM systems might eventually compete against one another for computational or physical resources in the real world. This project is a collaboration between MIT and Sakana AI led by Akarsh Kumar Full Paper (Website): Full Paper (arxiv): Code:

Sakana AI

143,831 views • 6 months ago

Announcing How Transformer LLMs Work, created with Jay Alammar and Maarten Grootendorst, co-authors of the beautifully illustrated book, “Hands-On Large Language Models.” This course offers a deep dive into the inner workings of the transformer architecture that powers large language models (LLMs). The transformer architecture revolutionized generative AI; in fact, the "GPT" in ChatGPT stands for "Generative Pre-Trained Transformer." Originally introduced in the Google Brain team's groundbreaking 2017 paper "Attention Is All You Need," by Vaswani and others, transformers were a highly scalable model for machine translation tasks. Variants of this architecture now power today’s LLMs such as those from OpenAI, Google, Meta, Cohere, Anthropic and DeepSeek. In this course, you’ll learn in detail how LLMs process text. You'll also work through code examples that illustrate the transformer's individual components. In detail, you’ll learn:
- How the representation of language has evolved, from Bag-of-Words to Word2Vec embeddings to the transformer architecture that captures a word's meanings taking into account the context of other words in the input.
- How inputs are broken down into tokens before they are sent to the language model.
- The details of a transformer's main stages: Tokenization and embedding, the stack of transformer blocks, and the language model head.
- The inner workings of the transformer block, including attention, which calculates relevance scores, and the feedforward layer, which incorporates stored information learned in training.
- How cached calculations make transformers faster.
- Some of the most recent ideas in the latest models such as Mixture-of-Experts (MoE), which uses multiple sub-models and a router on each layer to improve the quality of LLMs.
By the end of this course, you’ll have a deep understanding of how LLMs actually process text and be able to read through papers describing the latest models and understand the details. Gaining this intuition will improve your approach to building LLM applications. Please sign up here:

Andrew Ng

259,421 views • 1 year ago

Our paper “Learning Situated Awareness in the Real World” has been accepted to #ICML2026 as a Spotlight (top 2.2%)! Congrats to Chuhan Li ✈️ICML and the team! We introduce SAW-Bench, a real-world egocentric benchmark for evaluating situated (first-person) spatial reasoning in multimodal models, uncovering a large gap between humans and current systems. All videos recorded via Meta Ray-Ban 2 glasses!

Xin Eric Wang

11,529 views • 2 months ago

💳 EstateX Pay will close the gap between liquidity and real estate. Earn with EstateX. Spend with EstateX. 🎥 Watch the video from KBW 👇 Join our community 👇

EstateX

121,735 views • 8 months ago