OpenAI recently released its first open-weights model since GPT-2, entering a field led by DeepSeek and Alibaba's Qwen. Ankit breaks down these top OSS models, including what sets them apart under the hood: mixture-of-experts, long-context training, and post-training techniques that shape reasoning and alignment, and how different design choices...

208,680 views • 10 months ago • via X (Twitter)

Related videos

*Major* open source AI drop today. Can America win the Open AI race? My conversation with Nathan Lambert and Luca Soldaini 🎀 of Ai2 about the launch of Olmo 3

00:00 – Cold Open
00:39 – Welcome & today’s big announcement
01:18 – Introducing the Olmo 3 model family
02:07 – What “base models” really are (and why they matter)
05:51 – Dolma 3: the data behind Olmo 3
08:06 – Performance vs Qwen, Gemma, DeepSeek
10:28 – What true open source means (and why it’s rare)
12:51 – Intermediate checkpoints, transparency, and why AI2 publishes everything
16:37 – Why Qwen is everywhere (including U.S. startups)
18:31 – Why Chinese labs go open source (and why U.S. labs don’t)
20:28 – Inside ATOM: the U.S. response to China’s model surge
22:13 – The rise of “thinking models” and inference-time scaling
35:58 – The full Olmo pipeline, explained simply
46:52 – Pre-training: data, scale, and avoiding catastrophic spikes
50:27 – Mid-training (tail patching) and avoiding test leakage
52:06 – Why long-context training matters
55:28 – SFT: building the foundation for reasoning
1:04:53 – Preference tuning & why DPO still works
1:10:51 – The hard part: RLVR, long reasoning chains, and infrastructure pain
1:13:59 – Why RL is so technically brutal
1:18:17 – Complexity tax vs AGI hype
1:21:58 – How everyone can contribute to the future of AI
1:27:26 – Closing thoughts

Matt Turck

37,482 views • 7 months ago

Thanksgiving-week treat: an epic conversation on Frontier AI with Lukasz Kaiser, co-author of “Attention Is All You Need” (Transformers) and leading research scientist at OpenAI working on GPT-5.1-era reasoning models.

00:00 – Cold open and intro
01:29 – “AI slowdown” vs a wild week of new frontier models
08:03 – Low-hanging fruit, infra, RL training and better data
11:39 – What is a reasoning model, in plain language
17:02 – Chain-of-thought and training the thinking process with RL
21:39 – Łukasz’s path: from logic and France to Google and Kurzweil
24:20 – Inside the Transformer story and what “attention” really means
28:42 – From Google Brain to OpenAI: culture, scale and GPUs
32:49 – What’s next for pre-training, GPUs and distillation
37:29 – Can we still understand these models? Circuits, sparsity and black boxes
39:42 – GPT-4 → GPT-5 → GPT-5.1: what actually changed
42:40 – Post-training, safety and teaching GPT-5.1 different tones
46:16 – How long should GPT-5.1 think? Reasoning tokens and jagged abilities
47:43 – The five-year-old’s dot puzzle that still breaks frontier models
52:22 – Generalization, child-like learning and whether reasoning is enough
53:48 – Beyond Transformers: ARC, LeCun’s ideas and multimodal bottlenecks
56:10 – GPT-5.1 Codex Max, long-running agents and compaction
1:00:06 – Will foundation models eat most apps? The translation analogy and trust
1:02:34 – What still needs to be solved, and where AI might go next

Matt Turck

167,926 views • 7 months ago