New Benchtalks with John Yang: on ProgramBench (frontier models scored 0% at launch) and the lineage and future of coding benchmarks, from SWE-bench and InterCode to now.

01:29 ProgramBench launch and reception
03:41 Why artifact-level evaluation, not code-level
06:03 Why models love Python
08:29 ProgramBench as a research tool
12:45 From SWE-bench & InterCode...

26,170 views • 22 days ago • via X (Twitter)

Related videos

François Chollet has spent years asking a different question than most of the AI world. Instead of scaling what already works, he’s trying to understand what intelligence actually is and how to build it from first principles. In this episode of the Lightcone Podcast, he traces that path from his early work on deep learning to the creation of the ARC Prize, and the launch of ARC V3, a new benchmark designed to measure something deeper than performance: the ability to learn, adapt, and reason efficiently in entirely new environments. He explains why today’s systems may be hitting limits, what recent breakthroughs really mean, and why reaching true general intelligence may require a fundamentally different approach.

00:00 - AGI by 2030?
00:31 - Introducing Ndea: A New Path Beyond Deep Learning
01:08 - A New ML Paradigm
01:30 - Replacing neural nets with compact symbolic programs
03:04 - Why Ndea Isn’t Competing With Coding Agents
05:20 - Why Everyone Might Be Wrong About Scaling LLMs
07:22 - Why Coding Agents Suddenly Work So Well
08:50 - The Limits of LLMs in Non-Verifiable Domains
10:48 - What AGI Actually Means (And Why Most Definitions Are Wrong)
13:30 - Why Deep Learning Hits a Wall
14:00 - ARC’s Origin Story
18:20 - ARC Benchmarks Explained: From V1 to V3
22:49 - The RL Loop Powering Coding Agents Today
27:03 - ARC-AGI V3: Measuring “Agentic Intelligence”
31:14 - Inside the ARC Game Studio
35:31 - Could AGI Fit in 10,000 Lines of Code?
44:01 - Building Ndea: From Idea to Compounding Research Stack
46:46 - The Future of ARC: Benchmarks That Evolve With AI
47:21 - Why There’s Still Huge Opportunity for New AI Paradigms
53:37 - How to Build a Breakout Open Source Project - Lessons From Keras
56:39 - Advice For How To Think About AI

Y Combinator

151,141 views • 3 months ago

Why AI Can Now Make Discoveries: my conversation with Dan Roberts, Lead of the Foundations of Reinforcement Learning team at OpenAI.

00:00 Intro: AI's wild week in mathematics
01:21 What OpenAI's Foundations of RL team does
03:08 Dan's journey: from black holes and quantum gravity to frontier AI
07:04 Are AI systems becoming useful for real science
08:21 The AI math moment: Erdős, OpenAI, DeepMind, and Anthropic
08:52 Why the OpenAI result was an act of exploration
10:25 OpenAI vs. DeepMind: informal reasoning vs. formal proof
12:13 RL 101: learning by doing, not just watching
15:10 Why reinforcement learning works
15:58 How RL breaks: sparse feedback and long-horizon tasks
17:03 RLHF: how human feedback shaped early language models
18:48 Move 37, self-play, and the search for novel strategies
22:16 Explore vs. exploit in scientific discovery
24:49 Why RL may now be "the cake," not the cherry on top
25:46 Why RL started working with large language models
27:29 Is RL "sucking supervision through a straw"?
28:47 Why language may be the grounding layer for intelligence
31:46 A contrarian take on the Bitter Lesson
32:41 What test-time compute actually is
34:50 How RL gives models the ability to think
35:40 Verifiable rewards, math, coding, and the messy real world
38:00 What physics can teach us about AI
42:08 Is there a thermodynamics of AI?
43:08 From Erdős problems to Einstein-level AI
45:16 Is AI already doing original science?
45:51 How far are we from AI automating AI research
47:41 Why Dan is excited about the future of science

Matt Turck

64,094 views • 21 days ago

Thanksgiving-week treat: an epic conversation on Frontier AI with Łukasz Kaiser, co-author of “Attention Is All You Need” (Transformers) and leading research scientist at OpenAI working on GPT-5.1-era reasoning models.

00:00 – Cold open and intro
01:29 – “AI slowdown” vs a wild week of new frontier models
08:03 – Low-hanging fruit, infra, RL training and better data
11:39 – What is a reasoning model, in plain language
17:02 – Chain-of-thought and training the thinking process with RL
21:39 – Łukasz’s path: from logic and France to Google and Kurzweil
24:20 – Inside the Transformer story and what “attention” really means
28:42 – From Google Brain to OpenAI: culture, scale and GPUs
32:49 – What’s next for pre-training, GPUs and distillation
37:29 – Can we still understand these models? Circuits, sparsity and black boxes
39:42 – GPT-4 → GPT-5 → GPT-5.1: what actually changed
42:40 – Post-training, safety and teaching GPT-5.1 different tones
46:16 – How long should GPT-5.1 think? Reasoning tokens and jagged abilities
47:43 – The five-year-old’s dot puzzle that still breaks frontier models
52:22 – Generalization, child-like learning and whether reasoning is enough
53:48 – Beyond Transformers: ARC, LeCun’s ideas and multimodal bottlenecks
56:10 – GPT-5.1 Codex Max, long-running agents and compaction
1:00:06 – Will foundation models eat most apps? The translation analogy and trust
1:02:34 – What still needs to be solved, and where AI might go next

Matt Turck

167,926 views • 7 months ago

I asked Dan Martell to walk me through every level of making money with AI. He gave me the simplest, most practical advice I've ever heard on this subject.

Level 1 - Making $0 - $100k
Level 2 - Making $1m - $10m
Level 3 - Building a $10m++ enterprise

0:00 Only 5% of the World Has Ever Paid for AI
0:46 The Easiest Thing to Sell With AI Right Now
1:56 The Marcus and Sophie Framework
4:24 Theory of Constraints (Right Problem to Solve)
5:33 What Is the Number One Business Constraint
7:13 How to Leave Your Job and Go All In
8:27 Business Is Simple Find a Problem and Solve It
9:08 Stop Getting Ready to Get Ready
9:33 The Sarah Story One Text and $10K
9:53 Pull Up Your Phone and Message Your Contacts
11:05 Dan's Son Gets His First Client at $800/Month
12:41 Best Employee vs. Best Employer
13:59 What Other Services Can You Sell With AI
14:44 Sales Is Not Talking It's Asking
17:01 What to Do When You Hate Your Business
18:40 Pain and Pleasure Are the Only Two Motivators
19:13 They Haven't Made It a Must Yet
20:29 Make It a Must Not a Nice to Have
21:06 The Jen Story and the Gasping Moment
22:17 How to Find Your First 10 to 15 Clients
28:38 The Personal Brand Play
33:06 Vision Is What AI Cannot Do
34:55 Hard for Computers Easy for Humans
36:13 Level 2 Making Your First Million With AI
37:18 The Replacement Ladder Framework
37:39 Admin First Then Delivery Then Marketing
39:09 Why Marketing Is the Biggest AI Category
39:32 Why You Should Keep Sales for Yourself
40:00 Level 5 Leadership and AI Agents
41:41 What a Fully AI Systems Business Looks Like
43:13 The Gym Owner With Three Locations
46:16 Shutting Down the Company for Two Days
46:37 Teaching the Whole Team to Code in Claude
49:28 Wayne the 62 Year Old Who Made $12K a Month
52:38 I Only Share What Actually Works
53:21 Whisper Flow and Talking to Your AI
56:41 Claude Chat Claude Coworker and Claude Code
57:57 The Claude Browser Extension
58:49 Claude Code Is Not Just for Developers
1:00:06 How to Migrate Your AI Memory Across Tools
1:01:08 Level 3 $1M to $10M and the Brand Play
1:02:05 Nobody Buys AI They Buy Trust
1:03:25 Brand Is Association and Association Is Trust
1:05:12 A Million Followers Is $10M in Activated Revenue
1:07:03 How to Keep AI From Becoming Slop
1:07:42 Human in the Loop
1:08:16 The 10 80 10 Rule and Why AI Is Now the 80
1:10:01 The Team FIRED Themselves
1:11:45 Dan's Free AI Curriculum for Your Team

Grant

114,770 views • 1 day ago