Nathan Lambert

@natolambert • 94,944 subscribers

Open model research @ something new. Prev. co-led Olmo at Ai2. Contact via email. Writes @interconnectsai Wrote The RLHF Book, 🏔️🏃‍♂️

Shorts

A new contender enters the arena... RLHF Book coming soon to print.

70,749 views

Videos

New podcast with Florian Brand on all things open models. More on Kimi K3, Qwen 3.8, GLM-5.2, Xi's WAIC speech, distillation, the open-closed gap, and what's next. Chapters: 00:00 Welcome & context 04:38 Living with / using Kimi K3 08:53 GLM 5.2’s continued role 12:47 How are the Chinese models this good? 17:41 Data, environments, and a tour of the Chinese labs 19:47 Roundup of Chinese providers: Qwen, DeepSeek, MiniMax… 24:08 The US open-model ecosystem 30:25 Frontier vs. near-frontier, and the cybersecurity case against bans 34:58 Distillation and the Ben Thompson debate 44:12 Predictions and a frontier tier list 48:36 Wrap-up. Hoping to keep doing a few more of these on Interconnects. Crucial times in AI, and we're working hard to share our expertise.

16,517 views • 1 day ago

New lecture! This one is a recap of the history of preferences, the nature of rewards, and how RLHF is formulated, which were once seen as central problems in the field. How much has changed. Still... super interesting to understand our optimization tools today. Books coming soon :D 00:00 Intro & context 07:34 A short history of preferences (from Aristotle to the VNM Utility Theorem) 20:17 A brief overview of preference data (from the last two years of my practice) 31:11 Open questions in RLHF data. Lecture 8, covering Chapters 10 & 11 of my book.

27,707 views • 3 days ago

New lecture for the book! Nominally about synthetic data, but mostly a walkthrough of the distillation literature from the Hinton 2015 paper to the multi-teacher on-policy distillation of today! At 7.4 hours of video in my post-training brain dump and counting :) It was fun to stare at the math long enough and talk through the 3-4 core changes that needed to be made to the original formulation to have on-policy distillation be ready for the mainstream like it is today (and in RL frameworks). Otherwise, I include a bit of a history lesson on how synthetic data slowly took over all post-training data research (it wasn't always the case)! Then I do some 101 review on constitutional AI, rubrics, and other popular methods. 00:00 The emergence of synthetic data 10:50 Background on teacher-student knowledge-distillation 24:47 On-policy distillation (OPD, MOPD, and OPSD) 37:11 Constitutional AI & AI Feedback 45:50 Rubrics as rewards & conclusions. Ofc, watch on YouTube etc.

91,890 views • 1 month ago

Another quick lecture -- I've been asked many times for prereqs to my book and what you should know, so I built a little lecture (with GLM 5.2) to cover some more basics. Topics include: 00:00 Introduction & Course Prerequisites 01:37 Language Models Overview 02:47 The LM Head 04:29 Softmax & Log-Probabilities 06:13 Anatomy of an LM Training Example 06:37 Computing LLM Probabilities (+Phoebe the Dog) 09:52 Three Common Masks in Post-Training 11:03 A Small Decoding Review 12:14 Training an LM: Cross-Entropy 13:23 Optimization & Fine-Tuning 13:55 Pretraining to Midtraining to SFT Pipeline 15:25 Probability Essentials: KL Divergence & Entropy 19:36 Sigmoid & Pairwise Likelihood 20:29 Reinforcement Learning Framing (MDP) 22:28 Transitioning Tools into Post-Training 23:12 Recommended Resources & Wrap-Up. Happy learning, and I'm still taking questions from the course for Q&A videos.

50,424 views • 29 days ago

New podcast with finbarr! We survey the latest post-training recipes, from GLM 5.1, Kimi K2.6, DeepSeek V4, Xiaomi MiMo V2.5, Nemotron Ultra, etc. and discuss: - Why the industry slowly shifted to multi-teacher on-policy distillation (MOPD). - Where an Olmo-style recipe would need improvements. - How post-training works / suits larger organizational efforts. - Career advice in the foothills of the singularity. - And other topics. I heard y'all wanted me to start doing this, so I'm making some time while I'm in funemployment! Chapters: 00:00 Introduction & Olmo reflections 06:28 Post-train recipes review (history) 23:00 2026’s model recipes (MiMo Flash, DeepSeek V4, GLM 5, Kimi K2.6, etc.) 39:05 Open-ended post-training discussions 48:22 Career advice in the LLM race. Links below, please follow Interconnects and like and subscribe and buy my book?

42,732 views • 1 month ago

Okay okay, spent my weekend gooning around learning GRPO math. Here's some takes. Essentially, this is me yapping through a recap of smaller details on how GRPO is implemented, what Dr. GRPO changes, why, DAPO, connections to PPO, aggregating batches... Reading list below.

123,113 views • 1 year ago

Here's a recent talk I gave recapping the last 6-12 months of AI progress, why getting perfect models is hard, how labs are likely approaching the next phase of training (for agents), and other interesting tidbits across the reasoning landscape. Topics: 00:00 Introduction & the state of reasoning 05:50 Hillclimbing imperfect evals 09:18 Technical bottlenecks 13:02 Sycophancy 18:08 The Goldilocks Zone 19:28 What comes next? (hint, planning) 26:40 Q&A YouTube etc in replies. Thanks Kyle Corbitt and OpenPipe for hosting me.

89,519 views • 1 year ago

Jensen on Nemotron 3 Ultra and its role in the open ecosystem. It's great to have Nvidia leading on this, with so much turnover in open model providers in recent years. Super excited to see how the model turns out.

26,650 views • 4 months ago

DPO Debate: Is RL needed for RLHF? All things as we cannot settle if DPO or RL is better. At least it is a good exercise. 1. Derivations in the DPO paper (hint: the authors are good at math). 2. cDPO, IPO, and related equations. 3. Speculation on potential oddities of DPO vs RL. 4. Reminders on the state of open RLHF. tl;dr: we have more limitations with data, tooling, and evaluation than optimizer choice. Slides: Recent blog post of mine on DPO (more next Wed.): DPO Paper: On YouTube:

100,027 views • 2 years ago

Here's my conversation with Lucas Atkins and the team at Arcee.ai on their path to training and releasing Trinity Large today. Going from all in on open models built end to end in the US 6 months ago to having the model in hand is no easy feat. I loved this conversation on how to design a startup around open models and take a bold step to scale it up. I'm openly an Arcee fan, watching them take a risk and pull it off. We discuss: - The state (and future) of open vs. closed models, - The business of selling open models for on-prem deployments, - The story of Arcee AI & going “all-in” on this training run, - The ATOM project, - Building frontier model training teams in 6 months, - and other great topics. I really loved this one, and think you will too. Chapters: 00:00:00 Intro: Arcee AI, Trinity Models & Trinity Large 00:08:26 Transitioning a Company to Pre-training 00:13:00 Technical Decisions: Muon and MoE 00:18:41 Scaling and MoE Training Pain 00:23:14 Post-training and RL Strategies 00:28:09 Team Structure and Data Scaling 00:31:31 The Trinity Manifesto: US Open Weights 00:42:31 Specialized Models and Distillation 00:47:12 Infrastructure and Hosting 400B 00:50:53 Open Source as a Business Moat 00:56:31 Predictions: Best Model in 2026 01:02:29 Lightning Round & Conclusions. More great open model builder podcasts coming soon!

26,872 views • 5 months ago

I re-recorded the post-training part of our NeurIPS tutorial on language models, added some more slides, and wrote up a mini state of the union on Interconnects. Enjoy! Links in QT. 00:00 Introduction 10:00 Prompts & Skill Selection 14:19 Instruction Finetuning 21:45 Preference Finetuning 36:17 Reinforcement Finetuning 45:28 Open Questions 52:02 Wrap Up

49,310 views • 1 year ago

Here’s to my crazy AI parties and models in 2026. Lots more to do with the AI Substack gang. We knew AI researchers deep down are fun. H/t to Outshift by Cisco, Decibel, and Lambda for making it happen at NeurIPS.

20,725 views • 7 months ago

Another Friday afternoon talk in an attempt at open science. This is a re-record of a talk I gave at two Chatham House Rule workshops this fall: The History and Risks of Reinforcement Learning and Human Feedback (yes, the "and" is clever and intentional). This tries to broaden the discussion around RLHF to include more stakeholders and diverse ideas about how to integrate or measure values (can you at all?!?). I'm trying to re-record every talk that I give that isn't to a public audience 🫡😅, good practice at least. Slides: YouTube: Paper:

54,036 views • 2 years ago

Playing with Claude Computer Use is very worthwhile. It's obvious that it's something that'll be used in the future, much like when you first try ChatGPT or amazing tech like AirPods. BUT, it's clear its integration will take some serious time. Here's an example web task, interrupted by: * Rate limits * Claude's propensity to refuse * Way more lag than we want. We'll see this flywheel move very fast. I'm hoping open models, iterating more quickly and having less difficulty being deep in the operating system, make this space move very fast. Like reasoning models following o1, this will be one of the main stories of 2025.

12,654 views • 1 year ago
