
Nathan Lambert
@natolambert • 86,343 subscribers
Research @allen_ai, reasoning, open models, RL(VR/HF)... Contact via email. Writes @interconnectsai, @readsail. Wrote The RLHF Book. 🏔️🏃‍♂️

Okay okay, spent my weekend gooning around learning GRPO math. Here's some takes. Essentially, this is me yapping through a recap of smaller details on how GRPO is implemented, what Dr. GRPO changes, why, DAPO, connections to PPO, aggregating batches... Reading list below.
Nathan Lambert • 123,030 views • 1 year ago

Here's a recent talk I gave recapping the last 6-12 months of AI progress, why getting perfect models is hard, how labs are likely approaching the next phase of training (for agents), and other interesting tidbits across the reasoning landscape.
Topics:
00:00 Introduction & the state of reasoning
05:50 Hillclimbing imperfect evals
09:18 Technical bottlenecks
13:02 Sycophancy
18:08 The Goldilocks Zone
19:28 What comes next? (hint, planning)
26:40 Q&A
YouTube etc in replies. Thanks Kyle Corbitt and OpenPipe for hosting me.
Nathan Lambert • 89,342 views • 11 months ago

Here's my conversation with Lucas Atkins and the team at Arcee.ai on their path to training and releasing Trinity Large today. Going from all in on open models built end to end in the US 6 months ago to having the model in hand is no easy feat. I loved this conversation on how to design a startup around open models and take a bold step to scale it up. I'm openly an Arcee fan, watching them take risks and pull it off.
We discuss:
- The state (and future) of open vs. closed models,
- The business of selling open models for on-prem deployments,
- The story of Arcee AI & going “all-in” on this training run,
- The ATOM project,
- Building frontier model training teams in 6 months,
- and other great topics.
I really loved this one, and think you will too.
Chapters:
00:00:00 Intro: Arcee AI, Trinity Models & Trinity Large
00:08:26 Transitioning a Company to Pre-training
00:13:00 Technical Decisions: Muon and MoE
00:18:41 Scaling and MoE Training Pain
00:23:14 Post-training and RL Strategies
00:28:09 Team Structure and Data Scaling
00:31:31 The Trinity Manifesto: US Open Weights
00:42:31 Specialized Models and Distillation
00:47:12 Infrastructure and Hosting 400B
00:50:53 Open Source as a Business Moat
00:56:31 Predictions: Best Model in 2026
01:02:29 Lightning Round & Conclusions
More great open model builder podcasts coming soon!
Nathan Lambert • 26,872 views • 4 months ago

DPO Debate: Is RL needed for RLHF? We cannot yet settle whether DPO or RL is better, but at least it is a good exercise.
1. Derivations in the DPO paper. Hint, the authors are good at math
2. cDPO, IPO, and related equations
3. Speculation on potential oddities of DPO vs RL
4. Reminders on the state of open RLHF
tl;dr: we are limited more by data, tooling, and evaluation than by optimizer choice
Slides:
Recent blog post of mine on DPO (more next Wed.):
DPO Paper:
On YouTube:
Nathan Lambert • 100,016 views • 2 years ago

I re-recorded the post-training part of our NeurIPS tutorial on language models, added some more slides, and wrote up a mini state of the union on Interconnects. Enjoy! Links in QT.
00:00 Introduction
10:00 Prompts & Skill Selection
14:19 Instruction Finetuning
21:45 Preference Finetuning
36:17 Reinforcement Finetuning
45:28 Open Questions
52:02 Wrap Up
Nathan Lambert • 49,310 views • 1 year ago

Playing with Claude Computer Use is very worthwhile. It's obvious that it's something that'll be used in the future, much like when you first try ChatGPT or amazing tech like AirPods. BUT, it's clear its integration will take some serious time. Here's an example web task, interrupted by:
* Rate limits
* Claude's propensity to refuse
* Way more lag than we want
We'll see this flywheel move very fast. I'm hoping open models iterate more quickly and have less difficulty being deep in the operating system, which would make this space move very fast. Like reasoning models following o1, this will be one of the main stories of 2025.
Nathan Lambert • 12,654 views • 1 year ago