A 7-million-parameter model outperforming models a thousand times its size on tasks like those in the ARC Prize. That's what recursive reasoning unlocks. In this episode of Decoded, YC's Ankit Gupta and Francois Chaubard break down two recent papers on recursive AI models, Hierarchical Reasoning Models (HRMs) and Tiny Recursive Models (TRMs), which achieve state-of-the-art results with a fraction of the parameters of today's largest models. They explain why standard LLMs hit a fundamental ceiling on certain reasoning tasks, how recursion at inference time gives small models the compute depth to break through it, and what happens when you combine these ideas with the power of large-scale foundation models.
00:35 - Model Foundations
01:15 - RNN Limits and LLM Contrast
02:36 - Reasoning Limits and Sorting Analogy
04:22 - HRM Paper Introduction
05:25 - HRM Architecture and Intuition
07:36 - HRM Results and Outer Loop
09:46 - TRM Paper Overview
11:20 - TRM Training and Fixed Point
13:30 - Detailed HRM Summary
20:46 - Comparing HRM and TRM
34:45 - Future Outlook

Y Combinator

1,603,087 subscribers

127,011 views • 1 month ago • via X (Twitter)

Education, News & Politics, Science & Technology
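The episode description's core idea can be shown with a toy loop: reuse one small function many times at inference so that effective depth grows without adding parameters. This is a minimal sketch under stated assumptions, not the HRM or TRM architecture. The names `recursive_refine`, `step_latent`, and `step_answer` are hypothetical. The learned networks are replaced by a Newton-style square-root update so the result is easy to check. The nested structure loosely echoes the inner and outer loops discussed in the episode: a latent update repeated `n_inner` times, then one answer update, all repeated `n_outer` times.

```python
from typing import Callable, Tuple

# Hypothetical aliases: scalars stand in for the latent and answer tensors.
LatentStep = Callable[[float, float, float], float]  # (x, y, z) -> new z
AnswerStep = Callable[[float, float], float]         # (y, z) -> new y


def recursive_refine(
    x: float,
    y: float,
    z: float,
    step_latent: LatentStep,
    step_answer: AnswerStep,
    n_inner: int,
    n_outer: int,
) -> Tuple[float, float, int]:
    """Apply the same two small update functions repeatedly.

    Returns the final answer y, the final latent z, and the number of
    function applications, i.e. the effective depth of the computation.
    """
    depth = 0
    for _ in range(n_outer):
        # Inner loop: refine the latent "scratchpad" from the input and current answer.
        for _ in range(n_inner):
            z = step_latent(x, y, z)
            depth += 1
        # Outer step: update the answer from the refined latent.
        y = step_answer(y, z)
        depth += 1
    return y, z, depth


# Toy stand-ins (assumptions, not learned networks): z holds a Newton
# correction toward sqrt(x), and the answer step applies it.
def newton_latent(x: float, y: float, z: float) -> float:
    return 0.5 * (x / y - y)


def apply_correction(y: float, z: float) -> float:
    return y + z


if __name__ == "__main__":
    answer, _, depth = recursive_refine(
        2.0, 1.0, 0.0, newton_latent, apply_correction, n_inner=1, n_outer=6
    )
    print(answer, depth)
```

The two stand-in functions never grow, yet raising `n_inner` or `n_outer` raises the depth count. With `n_outer=6` the answer matches √2 to within about 1e-12. In the actual models the updates are small trained networks, and none of the papers' training details are reproduced here.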

Related videos

Apple's AI Reasoning Model Paper Explained
Apple’s disruptive new AI reasoning research paper, “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models,” explained, along with what it means for AGI.

Harper Carroll

37,956 views • 1 year ago

Introducing HRM-Text. An ultra-lean 1B-parameter reasoning language model designed to deliver strong general performance with a fraction of the data, compute, and infrastructure. Trained on just 40B structured tokens, HRM-Text achieves competitive performance while using ~1/1000 of the training data of comparable models. The kicker? The full model trains in roughly one day on a $1,000 budget. This opens the door to a new generation of AI that is powerful, accessible, and radically easier to adapt. Theories and research concepts once deemed too expensive to test are officially back in the game. Sapient Intelligence invites you to help us shape a new paradigm for general intelligence.

Sapient Intelligence

509,990 views • 1 month ago

Diffusion is the foundational ML framework behind state-of-the-art AI image and video generation, including Sora, Midjourney and Google Veo. In this episode of Decoded, Ankit Gupta sits down with Francois Chaubard to discuss how diffusion works, walk through a code sample, and explain why everyone training models should understand it.
00:00 Intro
00:33 What is diffusion?
02:50 What are applications of diffusion today?
04:06 Key innovations
07:01 Code examples
19:25 The "squint test"
22:27 Other areas diffusion is widely accessible
24:49 Outro

Y Combinator

74,604 views • 5 months ago
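This sketch accompanies the diffusion description above. It shows the forward (noising) process used in DDPM-style diffusion. A clean sample x0 is mixed with Gaussian noise according to a cumulative schedule, x_t = sqrt(ᾱ_t)·x0 + sqrt(1 − ᾱ_t)·ε, and a model is then trained to predict ε. The linear β schedule (1e-4 to 0.02 over 1,000 steps) follows the original DDPM paper. The function names are illustrative, and this is not the code sample walked through in the episode.

```python
import math
import random
from typing import List, Tuple


def linear_beta_schedule(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Noise variances beta_t, increasing linearly over T steps."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s): the fraction of clean signal left at step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(
    x0: List[float], t: int, alpha_bar: List[float], rng: random.Random
) -> Tuple[List[float], List[float]]:
    """Jump directly to noise level t: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a = alpha_bar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * xi + math.sqrt(1.0 - a) * ei for xi, ei in zip(x0, eps)]
    return xt, eps


def eps_mse(predicted: List[float], target: List[float]) -> float:
    """Simplified training loss: mean squared error between predicted and true noise."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)
```

Near the last step ᾱ_t is close to zero, so x_t is almost pure noise. Generation runs the process in reverse, using the trained noise predictor to denoise step by step.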

ARC-AGI is redefining how to measure progress on the path to AGI, focusing on reasoning, generalization, and adaptability instead of memorization or scale. At NeurIPS 2025, YC's Diana sat down with ARC Prize President Greg Kamradt to find out why most AI benchmarks fail, how ARC-AGI reveals the limits of today’s models, and why measuring intelligence may be harder than building it.
00:11 — What ARC Prize is and why it exists
00:38 — François Chollet’s definition of AGI
01:48 — What ARC-AGI Actually Tests
02:25 — When LLMs Failed the ARC Benchmark
03:38 — ARC-AGI Becomes the Standard
04:49 — False Positives in AI Progress
06:06 — The Evolution of ARC-AGI
08:55 — Measuring Intelligence beyond just accuracy
10:25 — What happens if a model solves ARC-AGI?

Y Combinator

98,369 views • 6 months ago

In this benchmark deep-dive, Sapient’s founders William and Guan are joined by research team members Changling and Yasin to unpack HRM-Text’s performance across MATH, DROP, ARC-Challenge, and MMLU. 📊 Beyond the scores, they discuss what each benchmark measures, how HRM-Text compares with larger models, and why efficiency matters. Watch the full discussion to learn more about HRM-Text and Sapient’s leaner path toward general intelligence.

Sapient Intelligence

240,557 views • 1 month ago

OpenAI recently released its first open-weights model since GPT-2, entering a field led by DeepSeek and Alibaba's Qwen. Ankit breaks down these top open-source models, including what sets them apart under the hood (mixture-of-experts, long-context training, and the post-training techniques that shape reasoning and alignment) and how different design choices lead to surprisingly similar performance.
00:00 – OpenAI OSS Launch
01:00 – Comparing Open Source LLM Architectures
01:46 – GPT OSS Overview
02:37 – Under The Hood of GPT OSS
03:25 – Qwen-3 Architecture
04:17 – Qwen-3 Training
05:12 – Qwen-3 Post-Training
06:08 – Qwen-3 Reasoning & RL Innovations
06:52 – DeepSeek V3 Overview
07:40 – DeepSeek V3.1 Updates
08:39 – Attention Mechanism (MLA)
09:39 – Comparing Model Sizes
10:35 – Long Context Strategies
11:25 – Reflections on Methods
12:00 – Takeaways

Y Combinator

208,680 views • 9 months ago
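A toy top-k router illustrates the mixture-of-experts design mentioned above. A gating function scores every expert, only the k highest-scoring experts run, and their outputs are combined with softmax weights renormalized over the selected experts. This is a minimal scalar sketch: `moe_forward` and the toy experts are hypothetical. GPT OSS, Qwen-3, and DeepSeek V3 each make their own choices about k, expert counts, shared experts, and gate normalization, and none of those choices are reproduced here. Renormalizing over the chosen experts is one common variant.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: float,
    router_weights: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> Tuple[float, List[int]]:
    """Route a scalar input to the top-k experts and mix their outputs.

    Returns the mixed output and the indices of the experts that ran.
    """
    # Router: one logit per expert (here, a linear score of the input).
    logits = [w * x for w in router_weights]
    # Keep only the k highest-scoring experts; the others are never evaluated.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in chosen])
    output = sum(g * experts[i](x) for g, i in zip(gates, chosen))
    return output, chosen
```

Only k experts run per input. That is how MoE models keep per-token compute well below what their total parameter count would suggest.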

Small Language Models (SLMs) are the future of AI: "Small" (SLM) instead of "Large" (LLM). These are highly specialized models with superhuman abilities on specific tasks. Here are two techniques for building them:
• Spectrum
• Model Merging
I give a short introduction in the attached video, but here is a quick summary:
Spectrum helps us identify the layers most relevant to one specific task. We can ignore everything else and focus on fine-tuning those layers. Using Spectrum, we can fine-tune models in a heartbeat.
Model Merging combines multiple models into a single model that is much better than any of the individual input models. You can also combine models specialized in different tasks and get a model with multiple abilities.
This is the state of the art of productizing models, and it's what Arcee.ai's platform does behind the scenes. Arcee collaborated with me on this post and is sponsoring it.
There are three main steps to produce a model for your particular use case:
1. You create a dataset by uploading your data.
2. You train a model. At this step, Arcee uses Spectrum and Model Merging to produce a highly specialized model for your task.
3. You deploy that model to any environment you want.
Three important notes:
• The training process is 2x faster and 2x cheaper than regular fine-tuning.
• The resulting models are smaller and have higher accuracy.
• They create these specialized models from open-source models.
Check this site so you can fully appreciate how this works:
If you want to fine-tune an open-source model, consider Arcee's platform. This is the state of the art.

Santiago

164,162 views • 1 year ago
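The simplest merge method illustrates the model-merging step described above: a weighted average of parameters from models that share an architecture. This is a sketch only. `linear_merge` is a hypothetical helper that works on plain Python lists, and the post does not say which merge method Arcee's platform uses. More elaborate merge methods exist.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def linear_merge(models: Sequence[Weights], coeffs: Sequence[float]) -> Weights:
    """Weighted average of same-shaped models, normalized by the coefficient sum."""
    if not models or len(models) != len(coeffs):
        raise ValueError("need one coefficient per model")
    total = sum(coeffs)
    if total == 0:
        raise ValueError("coefficients must not sum to zero")
    names = set(models[0])
    # Merging only makes sense when every model has identical parameter names and shapes.
    for m in models:
        if set(m) != names:
            raise ValueError("all models must have the same parameter names")
        for name in names:
            if len(m[name]) != len(models[0][name]):
                raise ValueError(f"shape mismatch for {name}")
    merged: Weights = {}
    for name in models[0]:
        size = len(models[0][name])
        merged[name] = [
            sum(c * m[name][i] for m, c in zip(models, coeffs)) / total
            for i in range(size)
        ]
    return merged
```

A plain average is typically applied to models fine-tuned from a common base checkpoint. It illustrates the idea, not the platform's pipeline.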

The Surprising Performance Drivers of HRM. A paper talk from Ndea AI researcher Konstantin Schürholt. We ran a series of ablation studies to determine what factors had the biggest impact on the Hierarchical Reasoning Model's performance on ARC-AGI.

Ndea

29,217 views • 10 months ago

Why AI Progress Suddenly Feels Real - my conversation with Yann Dubois, who co-leads the Post-Training Frontiers team at OpenAI
00:00 - Intro
01:30 - Why recent AI progress feels like a step function
04:13 - Model reliability & the emotional rollercoaster of shipping GPT-5.5
07:33 - How OpenAI structures vertical and horizontal teams
09:49 - Improving model efficiency and test-time compute
12:32 - Yann's journey from Switzerland to OpenAI
15:37 - Reasoning in 2026: Real-world utility vs verifiable rewards
18:34 - GPT-5.5 Thinking vs Pro: Scaling test-time compute
20:09 - How reasoning models become more efficient
23:23 - Pre-training scaling and overcoming the data wall
27:03 - Multimodal data, synthetic data, and embodied AI
31:05 - Demystifying mid-training and post-training
37:21 - Does RL create new capabilities in AI?
38:53 - The challenges and frontier of scaling RL
43:09 - Is building AI models a craft or a strict science
48:21 - How AI models generalize across different domains
54:18 - How reinforcement learning cures AI hallucinations
56:04 - Negative generalization and conflicting instructions
58:05 - Can RL scale to law, medicine, and the broader economy?
1:00:19 - The evaluation bottleneck and Model as a Judge
1:04:21 - Continuous AI progress & continual learning
1:08:49 - Will foundation models eat the agent harness
1:11:23 - Why startups should focus on the last mile of AI

Matt Turck

99,865 views • 1 month ago

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
paper page: github:
Recent advancements in text-to-image generation with diffusion models have yielded remarkable results synthesizing highly realistic and diverse images. However, these models still encounter difficulties when generating images from prompts that demand spatial or common sense reasoning. We propose to equip diffusion models with enhanced reasoning capabilities by using off-the-shelf pretrained large language models (LLMs) in a novel two-stage generation process. First, we adapt an LLM to be a text-guided layout generator through in-context learning. When provided with an image prompt, an LLM outputs a scene layout in the form of bounding boxes along with corresponding individual descriptions. Second, we steer a diffusion model with a novel controller to generate images conditioned on the layout. Both stages utilize frozen pretrained models without any LLM or diffusion model parameter optimization. We validate the superiority of our design by demonstrating its ability to outperform the base diffusion model in accurately generating images according to prompts that necessitate both language and spatial reasoning. Additionally, our method naturally allows dialog-based scene specification and is able to handle prompts in a language that is not well-supported by the underlying diffusion model.

AK

83,657 views • 2 years ago
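The abstract above describes a first stage in which an LLM emits a scene layout as bounding boxes with per-object descriptions. That implies a small data structure for the second stage to consume. The sketch assumes a hypothetical text format: a Python-style list of `(description, [x, y, w, h])` tuples. The abstract does not show the paper's actual prompt or output format. The code parses the layout, clips boxes to the canvas, and turns a box into a binary region mask, the kind of spatial signal a layout-conditioned controller could use.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    description: str
    x: int
    y: int
    w: int
    h: int


def parse_layout(text: str, width: int = 512, height: int = 512) -> List[Box]:
    """Parse "[('a cat', [x, y, w, h]), ...]" and clip each box to the canvas."""
    boxes = []
    # literal_eval accepts only Python literals, so model output cannot execute code.
    for description, (x, y, w, h) in ast.literal_eval(text):
        x0 = max(0, min(int(x), width))
        y0 = max(0, min(int(y), height))
        x1 = max(x0, min(int(x + w), width))
        y1 = max(y0, min(int(y + h), height))
        boxes.append(Box(str(description), x0, y0, x1 - x0, y1 - y0))
    return boxes


def box_mask(box: Box, width: int, height: int) -> List[List[int]]:
    """1 inside the box, 0 elsewhere, as a height x width grid."""
    return [
        [1 if box.x <= c < box.x + box.w and box.y <= r < box.y + box.h else 0 for c in range(width)]
        for r in range(height)
    ]
```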

Meta announces Movie Gen: A Cast of Media Foundation Models
We present Movie Gen, a cast of foundation models that generates high-quality 1080p HD videos with different aspect ratios and synchronized audio. We also show additional capabilities such as precise instruction-based video editing and generation of personalized videos based on a user’s image. Our models set a new state of the art on multiple tasks: text-to-video synthesis, video personalization, video editing, video-to-audio generation, and text-to-audio generation. Our largest video generation model is a 30B-parameter transformer trained with a maximum context length of 73K video tokens, corresponding to a generated video of 16 seconds at 16 frames per second. We show multiple technical innovations and simplifications on the architecture, latent spaces, training objectives and recipes, data curation, evaluation protocols, parallelization techniques, and inference optimizations that allow us to reap the benefits of scaling pre-training data, model size, and training compute for training large-scale media generation models. We hope this paper helps the research community to accelerate progress and innovation in media generation models.

AK

62,719 views • 1 year ago

Thanksgiving-week treat: an epic conversation on frontier AI with Lukasz Kaiser, co-author of “Attention Is All You Need” (Transformers) and a leading research scientist at OpenAI working on GPT-5.1-era reasoning models.
00:00 – Cold open and intro
01:29 – “AI slowdown” vs a wild week of new frontier models
08:03 – Low-hanging fruit, infra, RL training and better data
11:39 – What is a reasoning model, in plain language
17:02 – Chain-of-thought and training the thinking process with RL
21:39 – Łukasz’s path: from logic and France to Google and Kurzweil
24:20 – Inside the Transformer story and what “attention” really means
28:42 – From Google Brain to OpenAI: culture, scale and GPUs
32:49 – What’s next for pre-training, GPUs and distillation
37:29 – Can we still understand these models? Circuits, sparsity and black boxes
39:42 – GPT-4 → GPT-5 → GPT-5.1: what actually changed
42:40 – Post-training, safety and teaching GPT-5.1 different tones
46:16 – How long should GPT-5.1 think? Reasoning tokens and jagged abilities
47:43 – The five-year-old’s dot puzzle that still breaks frontier models
52:22 – Generalization, child-like learning and whether reasoning is enough
53:48 – Beyond Transformers: ARC, LeCun’s ideas and multimodal bottlenecks
56:10 – GPT-5.1 Codex Max, long-running agents and compaction
1:00:06 – Will foundation models eat most apps? The translation analogy and trust
1:02:34 – What still needs to be solved, and where AI might go next

Matt Turck

167,926 views • 6 months ago

A lot of people are calling Hermes Agent the end of OpenClaw. BRUH! It's not... Nous Research trains actual models, and they built an agent around that expertise. The local model routing is solid, but the part that matters for your business is that your conversations become fine-tuning data: you can train a model on how you actually work.
00:00 The Problem with Local AI Models
00:25 Introduction to Nous Research
01:04 Cross-Platform Agent Capabilities
01:44 Deep Local Model Integration
02:30 Routing Tasks to Different Models
03:06 Conversations as Training Data
03:50 Hermes Agent vs. OpenClaw
04:15 Future Plans and Series Overview

Ray Fernando

42,398 views • 2 months ago

Robots, Small Models, and RL with DeepSeek Alumnus Zihan Wang — Manifold #86
Great conversation with Zihan Wang - on RAGEN ✈️ CVPR 25 🙂
Zihan Wang is an AI researcher at Northwestern University, where he works on vision-language models, robotics, and reinforcement learning. Previously, he interned at DeepSeek, contributing to projects like DeepSeek-V2.
01:13 - Zihan's Background, CS and AI Research in China
11:09 - DeepSeek; Human capital flow from PRC to US
16:07 - DeepSeek, Open Source and AI Research
31:52 - Model Size and Performance Constraints
33:01 - Data Bottleneck in Pre-trained Models
34:12 - Transformer Architecture and Scaling Laws
36:30 - Efficiency in Model Training
47:44 - Chain of Experts Architecture
1:01:06 - Future of AI and Robotics

steve hsu

34,221 views • 1 year ago

A crypto project actually trained a 72B parameter AI model from scratch using decentralized GPU compute. Not fine-tuned, not a wrapper: trained from zero. The model benchmarks competitively against Meta's LLaMA 3 on reasoning tasks, and the entire training run cost a fraction of what centralized labs spend. If decentralized compute can produce frontier-class models, the moat around OpenAI and Anthropic is thinner than people think.

VirtualBacon

40,543 views • 2 months ago

Today, we're joined by Yejin Choi, professor and senior fellow at Stanford University in the Computer Science Department and at Stanford HAI. In this conversation, we explore Yejin’s recent work on making small language models reason more effectively. We discuss how high-quality, diverse data plays a central role in closing the intelligence gap between small and large models, and how combining synthetic data generation, imitation learning, and reinforcement learning can unlock stronger reasoning capabilities in smaller models. Yejin explains the risks of homogeneity in model outputs and the mode collapse highlighted in her “Artificial Hivemind” paper, and their impact on human creativity and knowledge. We also discuss her team's novel approaches, including reinforcement learning as a pre-training objective, where models are incentivized to “think” before predicting the next token, and "Prismatic Synthesis," a gradient-based method for generating diverse synthetic math data while filtering overrepresented examples. Additionally, we cover the societal implications of AI and the concept of pluralistic alignment: ensuring AI reflects the diverse norms and values of humanity. Finally, Yejin shares her mission to democratize AI beyond large organizations and offers her predictions for the coming year.
🗒️ For the full list of resources for this episode, visit the show notes page:
📖 CHAPTERS
===============================
00:00 - Introduction
04:44 - "Snowball effect" in AI investments
06:58 - Approaches to smaller models
08:58 - Importance of “better data”
14:07 - Imitation learning
18:24 - Artificial Hivemind paper
25:25 - AI risks
27:50 - Spectrum tuning
28:53 - Future of AI on humanity
33:08 - Reasoning in small models
34:58 - Prismatic Synthesis
48:20 - Reinforcement as a Pretraining Objective
55:04 - Pluralistic alignment
1:03:30 - Predictions

The TWIML AI Podcast

12,141 views • 4 months ago

Emad Mostaque says that by next year, AI reasoning models like OpenAI's o1 and DeepSeek's R1 will run on smartphones and perform PhD-level tasks using 20 watts of electricity, roughly the power consumption of the human brain.

Tsarathustra

157,090 views • 1 year ago

François Chollet on the ARC Prize and how we get to AGI, at AI Startup School in San Francisco.
00:00 - The Falling Cost of Compute
00:57 - Deep-Learning’s Scaling Era & Benchmarks
01:59 - The ARC Benchmark
03:02 - The 2024 Shift to Test-Time Adaptation
05:01 - What Is Intelligence?
07:12 - Why Benchmarks Matter (and Mislead)
08:57 - ARC 1 Exposes Scaling Limits
10:58 - ARC 2: Compositional Reasoning Arrives
12:55 - Humans vs. Models on ARC 2
14:58 - Previewing ARC 3 & Interactive Agency
17:00 - Kaleidoscopic Hypothesis and Abstractions
22:00 - Type 1 vs. Type 2 Abstractions
26:00 - Discrete Program Search & Inventive AI
29:00 - Fusing Intuition with Symbolic Reasoning
32:00 - Building AGI Through Meta-Learning Systems

Y Combinator

231,725 views • 11 months ago

🔥 Battle for the top reasoning LLM intensifies! The QwQ-32B-Preview is a very good reasoning LLM. Full video of my tests here:

Summary of my findings and thoughts:

It was able to solve a couple of hard math problems, so it looks very promising for math. It didn’t do so well on my coding task (generating a bash script), and judging by the results reported on LiveCodeBench, it has room for improvement. One thing that’s become very clear to me is that the reasoning capabilities of these LLMs are significantly closing the gap between open- and closed-source models. The competition is now going to be on a different level, focused on which model produces the most efficient, optimized, accurate, and fastest reasoning steps, not just accurate responses. That's what developers will care about. Traditional benchmarks are not going to be good enough for this.

On that note, it's getting harder to assess these models, especially the consistency, efficiency, and quality of their reasoning steps. After experimenting with this model, I realized that the reasoning paths are not fully optimized and there is a lot more optimization that needs to happen before these models are used in production settings. There might be a need to build some type of native and efficient self-assessment or self-reflection capability that prevents these reasoning LLMs from going in loops or producing unnecessarily lengthy sequences. I also noticed that this model, at least in the HF demo, doesn’t separate the reasoning from the response. I think that actually hurts the performance of the model; o1 and R1, on the other hand, do that really well. In addition, I believe the training on reasoning is hurting the performance of the LLM in other areas such as helpfulness (check the code example in the video). What's necessary at the moment is validating or evaluating the quality of the reasoning chains and figuring out a better strategy to optimize them. Current methods are probably not sufficient to solve this problem, but that's where innovation will come next.

I recognize that this is a first effort, so kudos to the Qwen team on this release. These issues highlight the importance of transparency with reasoning LLMs. We need to know how a model was trained and with what data or optimization strategy. Understanding that will enable researchers and developers to build better intuition and improve reasoning capabilities and components at a faster rate. There is an opportunity for someone or some company to build a truly open reasoning LLM. The race is on! I will continue to track the state of the art in reasoning LLMs and report my takes and observations here. Stay tuned for more.

elvis

14,740 views • 1 year ago