Latent.Space

@latentspacepod • 28,449 subscribers

The #1 AI Engineering podcast & newsletter, now covering AI for Science as well. Over 170,000 daily readers. Technical news today you will use at work tomorrow!

Videos

🆕 Marc Andreessen’s 2026 AI Thesis: Agents, Open Source, and Why This Time Is Different Marc Andreessen 🇺🇸 of a16z says AI people keep swinging between utopian and apocalyptic for one simple reason: this field has been “almost here” for 80 years. But now, the breakthroughs are no longer theoretical. Reasoning, coding, agents, and self-improvement are all starting to work at once. This episode goes deep on AI winters, OpenAI + OpenClaw, infrastructure overbuild risk, proof-of-human, why software may soon be written mostly for bots, and why the real bottleneck may be society adopting AI rather than the models improving.

351,146 views • 2 months ago

From rewriting Google’s search stack in the early 2000s to reviving sparse trillion-parameter models and co-designing TPUs with frontier ML research, Jeff Dean has quietly shaped nearly every layer of the modern AI stack. As Chief AI Scientist at Google and a driving force behind Gemini, Jeff has lived through multiple scaling revolutions from CPUs and sharded indices to multimodal models that reason across text, video, and code. We sat down with Jeff to unpack what it really means to “own the Pareto frontier,” why distillation is the quiet force behind every generation of faster, cheaper models, how energy not FLOPs is becoming the true constraint on AI compute, what it takes to co-design hardware and models 2–6 years into the future, why unified multimodal systems will outperform specialized ones, what it was like leading the charge to unify all of Google’s AI teams, and his prediction that deeply personalized models with access to your full digital context will redefine what useful AI looks like. Jeff Dean Google DeepMind Google

528,645 views • 4 months ago

From early engineer at Glean to Partner at Menlo Ventures co-leading the Anthology Fund with Anthropic, Deedy Das has gone from shipping “boring” enterprise search to backing frontier labs, infra, and research plays like Anthropic, OpenRouter, Goodfire, Prime Intellect, and Whisper. We sat down with Deedy to unpack how Glean quietly built a real AI moat before LLMs were cool, why enterprise search is significantly different than consumer search, where value actually accrues in the model vs app stack, Anthropic’s rise and the importance of Claude Code, the Anthology Fund thesis, OpenRouter’s wedge, mechanistic interpretability, the coming compute arms race, coding agents and “LLM psychosis,” and what all of this means for AI engineers deciding what to build. Deedy swyx 🔜 NeurIPS + #DevWritersRetreat Alessio Fanelli

269,074 views • 7 months ago

"Projects like the New Deal, the Apollo program pale in comparison to what we're doing right now." 🆕 Greg Brockman joins us to talk GPT-5, GPT-OSS, and what's next on OpenAI's road to crystallizing all of human intelligence! “Energy turns into compute, turns into intelligence… crystallizing compute into potential energy you can release again and again.”
0:00:04 - Introductions
0:01:04 - The Evolution of Reasoning at OpenAI
0:04:01 - Online vs Offline Learning in Language Models
0:06:44 - Sample Efficiency and Human Curation in Reinforcement Learning
0:08:16 - Scaling Compute and Supercritical Learning
0:13:21 - Wall clock time limitations in RL and real-world interactions
0:16:34 - Experience with ARC Institute and DNA neural networks
0:19:33 - Defining the GPT-5 Era
0:22:46 - Evaluating Model Intelligence and Task Difficulty
0:25:06 - Practical Advice for Developers Using GPT-5
0:31:48 - Model Specs
0:37:21 - Challenges in RL Preferences (e.g., try/catch)
0:39:13 - Model Routing and Hybrid Architectures in GPT-5
0:43:58 - GPT-5 pricing and compute efficiency improvements
0:46:04 - Self-Improving Coding Agents and Tool Usage
0:49:11 - On-Device Models and Local vs Remote Agent Systems
0:51:34 - Engineering at OpenAI and Leveraging LLMs
0:54:16 - Structuring Codebases and Teams for AI Optimization
0:55:27 - The Value of Engineers in the Age of AGI
0:58:42 - Current state of AI research and lab diversity
1:01:11 - OpenAI’s Prioritization and Focus Areas
1:03:05 - Advice for Founders - It's Not Too Late
1:04:20 - Future outlook and closing thoughts
1:04:33 - Time Capsule to 2045 - Future of Compute and Abundance
1:07:07 - Time Capsule to 2005 - More Problems Will Emerge

305,090 views • 10 months ago

Abridge: 100M+ medical conversations, real-time prior auth, and the clinical intelligence layer Abridge is building the clinical intelligence layer for healthcare. In this episode, Janie Lee and Chaitanya Asawa explain why ambient documentation was only the first wedge, how Abridge is turning patient conversations into real-time clinical decision support, why healthcare may become one of AI’s most important proving grounds, and how 100M+ medical conversations, specialty-specific evals, and deep EHR integrations create a moat for AI-native healthcare.

41,871 views • 1 month ago

🆕The Age of Async Agents: Devin’s 7x PR growth, 80% AI commits, background agents, memory, testing, & Open-Inspect Cognition cofounder + CPO Walden and Open-Inspect creator explain why engineering is moving from local IDEs to cloud background agents, how Devin went from 16% to 80% of commits across Cognition repos, why spec-to-PR workflows became real after the December model inflection, why testing is harder than computer use, how Devin separates the brain from the machine, why MCP is not enough for production agent integrations, and how PMs, support teams, and SRE workflows are starting to turn Slack messages into pull requests.

24,685 views • 25 days ago

Priscilla Chan and Mark Zuckerberg co-founded the Chan Zuckerberg Initiative (CZI) in 2015, committing 99% of their Meta shares to advance science, education, and opportunity. As a pediatrician and the CEO of Meta respectively, they've built CZI into one of the most ambitious technology-driven philanthropic organizations, with a moonshot goal to help cure, prevent, or manage all diseases by the end of the century. In this episode, we sit down with Priscilla and Mark to unpack how CZI bridges state-of-the-art AI with open biomedical research, their investments in open-source tools like the Human Cell Atlas, their philosophy on building teams across different disciplines, and what they envision for the future of healthcare. Chan Zuckerberg Initiative swyx Alessio Fanelli

166,190 views • 7 months ago

From applied cryptography and offensive security in France’s defense industry to optimizing nuclear submarine workflows, then selling his e-signature startup to Docusign and now running AI as CTO of Superhuman Mail (Superhuman, recently acquired by Grammarly), Loïc Houssier has lived the full arc from deep infra and compliance hell to obsessing over 100ms product experiences and AI-native email. We sat down with Loïc to dig into how you actually put AI into an inbox without adding latency, why Superhuman leans so hard into agentic search and “Ask AI” over your entire email history, how they design tools vs. agents and fight agent laziness, what box-priced inference and local-first caching mean for cost and reliability, and his bet that your inbox will power your future AI EA while AI massively widens the gap between engineers with real fundamentals and those faking it. Ordinary Extrare Superhuman swyx Alessio Fanelli

126,794 views • 6 months ago

Noam Brown from OpenAI just dropped a truth bomb: "Your fancy AI scaffolds will be washed away by scale." Routers, harnesses, complex agentic systems... all getting replaced by models that just work better out of the box. The reasoning models already proved this.

208,301 views • 11 months ago

🆕 Claude Cowork, Skills, and the Future of AI Coworkers Felix Rieseberg has spent years working at the interface layer, from Electron and the Slack desktop app to now helping build Claude Cowork. In this episode, Felix explains why execution is getting so cheap that teams can “build all the candidates,” why Anthropic is betting on local-first agent workflows, and why the future of AI products may belong less to chatbots and more to systems that can actually do knowledge work.

65,812 views • 3 months ago

For the first episode of our new show In-Context Cooking, we have the Founder & CEO of SemiAnalysis, Dylan Patel. We talk about:
• Taiwan endgame scenarios & TSMC risk
• AI export controls + Chinese talent flight
• $180–200B hyperscaler capex (is this a bubble?)
• Nvidia vs vertical integration
• What actually bottlenecks AI (power? fabs? chips?)
• Why the public might turn anti-AI
Dylan Patel SemiAnalysis allen

66,930 views • 3 months ago

Andon Labs' Real-World AI Evals: Claude calls the FBI, AI CEOs, price cartels, Butter-Bench, & Luna Andon Labs cofounders Lukas Petersson and Axel Backlund explain why dollar-denominated evals reveal what traditional benchmarks miss, how Claude ended up reporting a $2/day vending machine fee to the FBI, why long-horizon agents spiral in weird ways, what happens when agents lie, form price cartels, and compete with each other, and why the future of AI safety may depend on testing models in messy real-world environments instead of clean benchmark sandboxes.

12,957 views • 17 days ago

From scaling startups in sales and public cloud (including building Palo Alto Networks’ central U.S. cloud business) to joining Kleiner Perkins to help technical founders turn product edge into repeatable revenue, Joubin Mirzadegan built a career around one obsession: distribution. That obsession led him to start the podcast Grit as a hiring wedge, work alongside breakout companies like Glean and Windsurf, and now incubate Roadrunner which is an AI-native rethink of CPQ as SaaS pricing explodes from “seats” into consumption, bundles, renewals, and SKU sprawl. We sat down with Joubin to dig into the behind-the-scenes craft of great interviews (i.e. why he never sends questions), what Windsurf got right about “Google-class product + Salesforce-class distribution,” how to hire early sales leaders without getting fooled by shiny logos, why legacy CPQ data models are quietly breaking modern revenue teams, and his bet that rebuilding quoting and approvals from the ground up then layering LLMs on top will eliminate the deal-desk Slack chaos and become the backbone for how enterprise AI companies sell in the next decade. Joubin Mirzadegan swyx Roadrunner Kleiner Perkins

77,604 views • 6 months ago

🆕Scaling Test Time Compute to Multi-Agent Civilizations, with Noam Brown We're excited to publish our full conversation with Noam Brown on the frontiers of the new reasoning paradigm at OpenAI! - first principles for starting the "Multi-Agents" team - what's not captured by the "System 1/System 2" analogy for inference time compute - how Ilya Sutskever convinced him that reasoning was closer than he thought - Deep Research is existence proof that RL generalizes beyond verifiable rewards - the relationship between AI for imperfect information games (like Poker, Stratego, Diplomacy) and reasoning Enjoy! on youtube, or wherever fine podcasts are sold.

105,905 views • 11 months ago

🆕Daytona’s Agent-Native Compute: 60ms sandboxes, 50K startups in 75 sec, 850K daily runs, RL/evals, CLI > MCP, & the end of localhost Daytona CEO Ivan Burazin explains why AI agents need composable computers, how Daytona pivoted from human dev environments to agent sandboxes, why bare metal and stateful snapshots matter, how RL workloads went from 0% to ~50% of usage, why Kubernetes breaks down at agent scale, and why the AI cloud may look more like Stripe than AWS.

12,818 views • 1 month ago

🔬 Training Transformers to solve 95% failure rate of Cancer Trials the AI for Science pod is back with Ron Alfa, CEO of NOETIK, and Daniel Bear, VP Research at Noetik, explaining exactly how their team of top AI x Bio researchers and engineers (shoutout owl) will use AI to cure cancer, by focusing on key bottlenecks like patient selection, and training large cancer foundation models like TARIO-2, an autoregressive transformer trained on one of the largest sets of tumor spatial transcriptomics datasets in the world... which first required years of blind faith in collecting good data to even get going:

18,260 views • 2 months ago

From building internal AI labs to becoming CTO of Brex, James Reggio has helped lead one of the most disciplined AI transformations inside a real financial institution where compliance, auditability, and customer trust actually matter. We sat down with James Reggio to unpack Brex’s three-pillar AI strategy (corporate, operational, and product AI), how SOP-driven agents beat overengineered RL in ops, why Brex lets employees “build their own AI stack” instead of picking winners, and how a small, founder-heavy AI team is shipping production agents to 40,000+ companies. Reggio also goes deep on Brex’s multi-agent “network” architecture, evals for multi-turn systems, agentic coding’s second-order effects on codebase understanding, and why the future of finance software looks less like dashboards and more like executive assistants coordinating specialist agents behind the scenes. James Reggio Brex swyx Alessio Fanelli

32,800 views • 5 months ago

From a scrappy side project built to solve their own LLM optimization problems to becoming the industry’s de-facto independent scoreboard, Micah Hill-Smith and George Cameron went through the arc of launching Artificial Analysis for free, paying benchmarking costs out of pocket, and growing it into what many now call the “new Gartner of AI” for enterprises, labs, and developers. We sat down with Micah and George to unpack why truly independent benchmarking is so hard (prompt variance, eval saturation, mystery-shopper policies), how the Artificial Analysis Intelligence Index evolved as old benchmarks broke, and what new metrics actually matter now such as agentic evals (GDPVal-AA). We also dig into the economics behind the “smile curve” of AI: why intelligence is getting 100–1000× cheaper per unit while total spend explodes, how reasoning and agents change token efficiency, and their bet that evals must continuously evolve or risk training the industry to optimize for the wrong things. swyx Micah Hill-Smith George Cameron

33,143 views • 5 months ago

Touring Impulse with the Stove Guy Sam D'Amico and allen featuring never-seen-before AI cooking features. Sam is reinventing the stove from first principles: battery-powered architecture, software-defined hardware, & OTA updates. In this episode, we discuss:
• Why Impulse centered the stove around a battery
• How they achieved extreme power + temperature control
• Why Sam sees appliances as software-defined hardware
• How Claude and AI tools are already part of the product + how Sam uses AI
Timestamps:
0:00 Intro
0:18 The Impulse Cooktop Demo
0:55 Zephyr Partnership + Showrooms
1:14 Magnetic Knob Interface
2:04 How Impulse Tests and QA’s Every Build
3:34 Rebuilding the Stove From First Principles
5:02 Impulse Core and OEM Partnerships
6:09 How Long It Took to Build the Product
8:07 The Japan Pizza Story That Sparked Impulse
9:22 Why Sam Left Consumer Hardware for Appliances
12:42 Why the Stove Was the Right Entry Point
14:31 Building Appliances “Like a Phone”
17:17 Will Impulse Ship a Single-Burner Version?
17:37 Cooking Begins on the AI Dev Unit
19:10 How Sam Uses AI to Build Software at Impulse
20:36 Scallops Demo: Real-Time Temperature Control
22:20 Boiling Pasta at 10,000 Watts
24:49 AI Recipes, Figma, and Multi-Burner Cooking
28:25 Butter, Pasta, and Controlled Temperature Cooking
30:21 Zephyr Launch and the Bigger Smart Home Vision
32:45 Global Expansion Plans
33:34 AI Features, OTA Updates, and What Ships This Year
35:14 Taste Test with the Judges
36:42 Where to Buy Impulse

16,650 views • 2 months ago

Dylan breaks down the various Taiwan/TSMC endgames and what is likely to happen while cooking up some fried rice. Dylan Patel

21,167 views • 3 months ago