Loading video...

Video Failed to Load

There was a problem loading this video. This could be due to a temporary network issue or the video might be unavailable.

Ever wondered how training dynamics differ between LLMs 🖋️ and Vision 👁️ models? We explore this and close the gap between VMs and LLMs in our #NeurIPS2024 paper "TrAct: Making First-layer Pre-Activations Trainable". Paper📜 Video🎥

Felix Petersen

2,299 subscribers

20,875 views • 1 year ago •via X (Twitter)

Science & Technology Education #NeurIPS2024

Anya Rossi• Live Now

Private livecam show

0 Comments

No comments available

Comments from the original post will appear here

Related Videos

We spoke with Laura Ruis from Cohere For AI and UCL about her paper "Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models" where she demonstrated an interesting gap between retrieval and reasoning queries in LLMs indicating the presence of synthesised procedural knowledge generation.

We spoke with Laura Ruis from Cohere For AI and UCL about her paper "Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models" where she demonstrated an interesting gap between retrieval and reasoning queries in LLMs indicating the presence of synthesised procedural knowledge generation.

Machine Learning Street Talk

10,997 views • 1 year ago

This is how large language models turn objects to vector representations. In this video, we explore how large language models (LLMs) convert objects into internal representations, especially when translating between languages like English and Hindi. Using real-world examples, we highlight the challenges of gender inference, grammatical structure, and why direct word-to-word translations often fail. If you're curious about how LLMs deal with multilingual contexts and what it takes to improve translation quality across languages, this video is for you. #LLMs #Vectors #LCM

This is how large language models turn objects to vector representations. In this video, we explore how large language models (LLMs) convert objects into internal representations, especially when translating between languages like English and Hindi. Using real-world examples, we highlight the challenges of gender inference, grammatical structure, and why direct word-to-word translations often fail. If you're curious about how LLMs deal with multilingual contexts and what it takes to improve translation quality across languages, this video is for you. #LLMs #Vectors #LCM

Gaurav Sen

27,368 views • 1 year ago

NeurIPS 2025 Paper: LLMs are Reinforcement Learners 🤯! Surprisingly, we show that LLMs can solve RL tasks without any external component! We introduce Prompted Policy Search (ProPS), an RL method based only LLMs and in-context learning. [Paper]

NeurIPS 2025 Paper: LLMs are Reinforcement Learners 🤯! Surprisingly, we show that LLMs can solve RL tasks without any external component! We introduce Prompted Policy Search (ProPS), an RL method based only LLMs and in-context learning. [Paper]

Heni Ben Amor

51,172 views • 6 months ago

🌍Are LLMs aware of cultural and legal safety in today’s geo-diverse world? 🚀Introducing SafeWorld, our #NeurIPS2024 paper and benchmark assessing LLMs’ understanding of geo-diverse safety, based on cultural norms and policies across 50 countries and 493 regions/races. ⚖️We also propose a multi-dimensional framework for evaluating contextual appropriateness, accuracy, and comprehensiveness, revealing major gaps in current LLMs. 🧨To address this, we train SafeWorldLM using DPO, achieving SOTA performance and a 20% higher global human evaluator rating in helpfulness and harmfulness over competing models, including GPT-4o. 🔗Paper: 💻 GitHub: 🫶🏻This is a joint leading effort with Da Yin. Also many thanks to the amazing team Kung-Hsiang Steeve Huang Kai-Wei Chang, and Violet Peng for their hard work. Check out more details and results we conclude from our paper in the thread below. 🧵

🌍Are LLMs aware of cultural and legal safety in today’s geo-diverse world? 🚀Introducing SafeWorld, our #NeurIPS2024 paper and benchmark assessing LLMs’ understanding of geo-diverse safety, based on cultural norms and policies across 50 countries and 493 regions/races. ⚖️We also propose a multi-dimensional framework for evaluating contextual appropriateness, accuracy, and comprehensiveness, revealing major gaps in current LLMs. 🧨To address this, we train SafeWorldLM using DPO, achieving SOTA performance and a 20% higher global human evaluator rating in helpfulness and harmfulness over competing models, including GPT-4o. 🔗Paper: 💻 GitHub: 🫶🏻This is a joint leading effort with Da Yin. Also many thanks to the amazing team Kung-Hsiang Steeve Huang Kai-Wei Chang, and Violet Peng for their hard work. Check out more details and results we conclude from our paper in the thread below. 🧵

Haoyi Qiu

16,344 views • 1 year ago

How can we use small LLMs to shift more AI workloads onto our laptops and phones? In our paper and open-source code, we pair on-device LLMs (ollama) with frontier LLMs in the cloud (OpenAI, Together), to solve token-intensive workloads on your 💻 at 17.5% of the cloud cost while maintaining 97.9% of the accuracy. See Gru and the Minions in action below, 🔉on please (h/t )!

How can we use small LLMs to shift more AI workloads onto our laptops and phones? In our paper and open-source code, we pair on-device LLMs (ollama) with frontier LLMs in the cloud (OpenAI, Together), to solve token-intensive workloads on your 💻 at 17.5% of the cloud cost while maintaining 97.9% of the accuracy. See Gru and the Minions in action below, 🔉on please (h/t )!

Dan Biderman

192,910 views • 1 year ago

Photoshop for text. In our #CHI2025 paper “Textoshop”, we explore how interactions inspired by drawing software can help edit text. We consider words as pixels, sentences as regions, and tones as colours. #HCI #NLProc #LLMs #AI Thread 🧵 (1/10)

Photoshop for text. In our #CHI2025 paper “Textoshop”, we explore how interactions inspired by drawing software can help edit text. We consider words as pixels, sentences as regions, and tones as colours. #HCI #NLProc #LLMs #AI Thread 🧵 (1/10)

Damien Masson

88,119 views • 1 year ago

How do LLMs actually work? 🧠 Watch our video below and understand LLMs in 1 minute 👇

How do LLMs actually work? 🧠 Watch our video below and understand LLMs in 1 minute 👇

ChainGPT

47,719 views • 7 months ago

The gap between training and victory? She knows how to close it. 😤 📹: @femke_bol #Olympics | #Athletics

The gap between training and victory? She knows how to close it. 😤 📹: @femke_bol #Olympics | #Athletics

The Olympic Games

25,260 views • 10 months ago

Rick Rule talks about the 100 to 1 imbalance in the silver market between the paper and physical markets. And how close we came to seeing that break during the #SilverSqueeze back in 2021.

Rick Rule talks about the 100 to 1 imbalance in the silver market between the paper and physical markets. And how close we came to seeing that break during the #SilverSqueeze back in 2021.

Chris Marcus

42,302 views • 11 months ago

Ever wondered how market-making works? 🤔 In today's video, we explore who market makers are, what they do, and how everything works. Watch the full video:

Ever wondered how market-making works? 🤔 In today's video, we explore who market makers are, what they do, and how everything works. Watch the full video:

CoinGecko

20,216 views • 1 year ago

🧵1/n LLMs significantly improve Evolutionary Algorithms for molecular discovery! For 18 different molecular optimization tasks, we demonstrate how to achieve SOTA performance by incorporating different LLMs! Learn more in our new paper! Website: Code)

🧵1/n LLMs significantly improve Evolutionary Algorithms for molecular discovery! For 18 different molecular optimization tasks, we demonstrate how to achieve SOTA performance by incorporating different LLMs! Learn more in our new paper! Website: Code)

Yuanqi Du

17,892 views • 1 year ago

Speech-native models like Moshi sound great and answer fast, but aren’t as smart as text LLMs. In our new paper, MoshiRAG, we show how Moshi can ask for advice from a text LLM or a knowledge base. The tricky part is how to do this in real time without adding latency. 🧵

Speech-native models like Moshi sound great and answer fast, but aren’t as smart as text LLMs. In our new paper, MoshiRAG, we show how Moshi can ask for advice from a text LLM or a knowledge base. The tricky part is how to do this in real time without adding latency. 🧵

kyutai

52,215 views • 1 month ago

Introducing GRID: the General Robot Intelligence Development platform, designed for prototyping smart and safe robots rapidly using foundation models, LLMs, and simulation. Paper: Try now: GitHub: 🧵👇(1/N)

Introducing GRID: the General Robot Intelligence Development platform, designed for prototyping smart and safe robots rapidly using foundation models, LLMs, and simulation. Paper: Try now: GitHub: 🧵👇(1/N)

Sai Vemprala

277,281 views • 2 years ago

Robot AI brains, aka Vision-Language-Action models, cannot adapt to new tasks as easily as LLMs like Gemini, ChatGPT, or Grok. LLMs can adapt quickly with their in-context learning (ICL) capabilities. But can we inject ICL abilities into a pre-trained VLA like pi0? Yes! Introducing RICL (Retraining for In-Context Learning), our Conference on Robot Learning (CoRL) 2025 paper. Our RICL-pi0 model can adapt to unseen objects, novel motions, and new scenes with just ICL and RAG (retrieval-augmented generation). RICL-pi0 also boosts performance on the long-tail of tasks. A quick 1 minute video summary:

Robot AI brains, aka Vision-Language-Action models, cannot adapt to new tasks as easily as LLMs like Gemini, ChatGPT, or Grok. LLMs can adapt quickly with their in-context learning (ICL) capabilities. But can we inject ICL abilities into a pre-trained VLA like pi0? Yes! Introducing RICL (Retraining for In-Context Learning), our Conference on Robot Learning (CoRL) 2025 paper. Our RICL-pi0 model can adapt to unseen objects, novel motions, and new scenes with just ICL and RAG (retrieval-augmented generation). RICL-pi0 also boosts performance on the long-tail of tasks. A quick 1 minute video summary:

Kaustubh Sridhar

52,158 views • 9 months ago

navigating the latent space of colors 🌈✨ do LLMs “see” colors differently from humans? we perceive colors through wavelengths while LLMs rely on semantic relationships between words. to explore this, i mapped two color spaces:

navigating the latent space of colors 🌈✨ do LLMs “see” colors differently from humans? we perceive colors through wavelengths while LLMs rely on semantic relationships between words. to explore this, i mapped two color spaces:

Kat ⊷ the Poet Engineer

146,089 views • 1 year ago

It's going to be insane when we can prompt our own virtual worlds like this and explore them. Imagine adding other characters you can talk to with LLMs + voice models!

It's going to be insane when we can prompt our own virtual worlds like this and explore them. Imagine adding other characters you can talk to with LLMs + voice models!

Justine Moore

2,187,509 views • 11 months ago

🚀We are excited to introduce the Tool Decathlon (Toolathlon), a benchmark for language agents on diverse, complex, and realistic tool use. ⭐️32 applications and 600+ tools based on real-world software environments ⭐️Execution-based, reliable evaluation ⭐️Realistic, covering daily and professional scenarios Toolathlon reveals significant shortcomings of SOTA LLMs in realistic tool-use tasks, where Claude Sonnet 4.5 achieves 38.6% success rate. It also indicates a clear gap between open-source and leading proprietary models. Check our blog: Github: Paper: 🧵⬇️

🚀We are excited to introduce the Tool Decathlon (Toolathlon), a benchmark for language agents on diverse, complex, and realistic tool use. ⭐️32 applications and 600+ tools based on real-world software environments ⭐️Execution-based, reliable evaluation ⭐️Realistic, covering daily and professional scenarios Toolathlon reveals significant shortcomings of SOTA LLMs in realistic tool-use tasks, where Claude Sonnet 4.5 achieves 38.6% success rate. It also indicates a clear gap between open-source and leading proprietary models. Check our blog: Github: Paper: 🧵⬇️

Junxian He

43,599 views • 7 months ago

Ever wondered what a footballer does all day on a pre-season training camp? We documented the first 24 hours in Budapest with our Under-21s and Under-18s. 💪

Ever wondered what a footballer does all day on a pre-season training camp? We documented the first 24 hours in Budapest with our Under-21s and Under-18s. 💪

Birmingham City Academy

24,628 views • 11 months ago

3D-LLM: Injecting the 3D World into Large Language Models paper page: Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves richer concepts such as spatial relationships, affordances, physics, layout, and so on. In this work, we propose to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs. Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. Using three types of prompting mechanisms that we design, we are able to collect over 300k 3D-language data covering these tasks. To efficiently train 3D-LLMs, we first utilize a 3D feature extractor that obtains 3D features from rendered multi- view images. Then, we use 2D VLMs as our backbones to train our 3D-LLMs. By introducing a 3D localization mechanism, 3D-LLMs can better capture 3D spatial information. Experiments on ScanQA show that our model outperforms state-of-the-art baselines by a large margin (e.g., the BLEU-1 score surpasses state-of-the-art score by 9%). Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs. Qualitative examples also show that our model could perform more tasks beyond the scope of existing LLMs and VLMs.

3D-LLM: Injecting the 3D World into Large Language Models paper page: Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves richer concepts such as spatial relationships, affordances, physics, layout, and so on. In this work, we propose to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs. Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. Using three types of prompting mechanisms that we design, we are able to collect over 300k 3D-language data covering these tasks. To efficiently train 3D-LLMs, we first utilize a 3D feature extractor that obtains 3D features from rendered multi- view images. Then, we use 2D VLMs as our backbones to train our 3D-LLMs. By introducing a 3D localization mechanism, 3D-LLMs can better capture 3D spatial information. Experiments on ScanQA show that our model outperforms state-of-the-art baselines by a large margin (e.g., the BLEU-1 score surpasses state-of-the-art score by 9%). Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs. Qualitative examples also show that our model could perform more tasks beyond the scope of existing LLMs and VLMs.

AK

249,494 views • 2 years ago

$The next frontier for AI shouldn’t just be generally helpful. It should be helpful for you! Our new paper shows how to personalize LLMs — efficiently, scalably, and without retraining. Meet PReF ( 1\n$

The next frontier for AI shouldn’t just be generally helpful. It should be helpful for you! Our new paper shows how to personalize LLMs — efficiently, scalably, and without retraining. Meet PReF ( 1\n

idan shenfeld

10,252 views • 1 year ago