Tackling complex problems with LMs requires search/planning, but how should test-time compute be structured? Introducing Self-Steering, a new meta-reasoning framework where LMs coordinate their own inference procedures by writing code!

Gabe Grand

1,599 subscribers

20,483 views • 1 year ago • via X (Twitter)

Science & Technology Education

21 Comments

Ġabe Ġrand @ ICLR 2025 • 1 year ago

Today’s AI models excel at math, science, and programming, but simultaneously struggle with much more basic problems. People like @karpathy & @kevinroose have used the term “jagged intelligence” to describe this discrepancy.

Much of this “jaggedness” arises especially for problems that require long-horizon search/planning.

For instance, consider this constrained generation prompt, which is manageable for most English speakers, but hard for even very capable LMs like GPT-4o.

One approach to these kinds of problems is to sample repeatedly from the LM until we get a valid generation.

As noted in recent work (e.g., Brown et al., 2024), this is a really simple way to scale performance.

However, repeated sampling has some key drawbacks:
❌ Requires a verifier
❌ Cost scales with problem complexity
❌ Assumes the LM will eventually produce a valid sample (not always the case for complex tasks)
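The repeated-sampling loop described above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the toy sampler stands in for an LM, and the names `repeated_sampling`, `make_toy_sampler`, and `starts_with_c_and_ends_with_y` are hypothetical. It makes the drawbacks concrete: the loop needs an explicit verifier, and it returns nothing if the budget runs out before a valid sample appears.

```python
import random
from typing import Callable, Optional, Tuple


def repeated_sampling(
    sample: Callable[[], str],
    verify: Callable[[str], bool],
    max_attempts: int,
) -> Tuple[Optional[str], int]:
    """Draw samples until one passes the verifier or the budget is spent."""
    for attempt in range(1, max_attempts + 1):
        candidate = sample()
        if verify(candidate):
            return candidate, attempt
    # The sampler never produced a valid candidate within the budget.
    return None, max_attempts


# Hypothetical stand-in for an LM: random 3-word strings from a tiny vocabulary.
VOCAB = ["cats", "sleep", "quietly", "dogs", "run", "fast", "a", "the"]


def make_toy_sampler(seed: int) -> Callable[[], str]:
    rng = random.Random(seed)
    return lambda: " ".join(rng.choice(VOCAB) for _ in range(3))


def starts_with_c_and_ends_with_y(text: str) -> bool:
    """Toy verifier: first word starts with 'c', last word ends with 'y'."""
    words = text.split()
    return words[0].startswith("c") and words[-1].endswith("y")


if __name__ == "__main__":
    result, attempts = repeated_sampling(
        make_toy_sampler(0), starts_with_c_and_ends_with_y, max_attempts=10_000
    )
    print(result, attempts)
```

Each toy sample passes with probability 1/64, so the expected number of attempts grows quickly as constraints are added, which is the cost-scaling problem noted in the list.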

Another approach, recently popularized by models like o1 (OpenAI) and R1 (DeepSeek), is to generate extended chain-of-thought reasoning.

While these latest models can crack this toy problem, inference-time CoT can be slow and costly, and it can still produce errors. Moreover, reasoning autoregressively means linearizing separate branches into one long “stream of search,” which forfeits parallelism.

Stepping back, one key observation is that even when LMs struggle to emulate the precise reasoning steps needed to solve a problem, they often excel at describing its abstract structure -- both *how to verify* solutions and *how to search* for them!

Motivated by this insight, we introduce DisCIPL, a meta-reasoning approach that equips LMs with the ability to write recursive search procedures that guide LM inference, enabling new forms of verifiable and efficient reasoning.

In DisCIPL, a Planner LM writes an inference program that defines step-by-step computations to steer a population of Follower LMs.

Our approach combines the benefits of serial and parallel methods: the Planner ensures correctness by construction, while the Followers collectively search for sequences with high probability.
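To make the Planner/Follower split concrete, here is a rough sketch of what an inference program could look like. It is an illustration under stated assumptions, not the DisCIPL API: the Follower is a toy random word proposer, the "Planner-written" part is a hand-written per-step check, and every name (`run_inference_program`, `make_toy_follower`, `make_acrostic_check`) is hypothetical. Surviving particles are resampled uniformly, a simplification of weighting candidates by Follower probability.

```python
import random
from typing import Callable, List, Optional

# A Follower proposes the next word given the words generated so far.
Follower = Callable[[List[str]], str]

VOCAB = ["sun", "sea", "tree", "tide", "ant", "air", "rain", "rock"]


def make_toy_follower(seed: int, vocab: List[str]) -> Follower:
    """Hypothetical stand-in for a small LM: proposes a random vocabulary word."""
    rng = random.Random(seed)
    return lambda prefix: rng.choice(vocab)


def make_acrostic_check(letters: str) -> Callable[[List[str]], bool]:
    """Step-level constraint: the newest word must start with the next letter."""
    def step_ok(prefix: List[str]) -> bool:
        i = len(prefix) - 1
        return prefix[i].startswith(letters[i])
    return step_ok


def run_inference_program(
    follower: Follower,
    step_ok: Callable[[List[str]], bool],
    num_steps: int,
    num_particles: int,
    seed: int = 0,
) -> Optional[List[str]]:
    """Extend a population of partial generations one step at a time."""
    rng = random.Random(seed)
    particles: List[List[str]] = [[] for _ in range(num_particles)]
    for _ in range(num_steps):
        # Each particle is extended independently, so this step could run in parallel.
        extended = [p + [follower(p)] for p in particles]
        # Keep only extensions that satisfy the constraint at this step.
        survivors = [p for p in extended if step_ok(p)]
        if not survivors:
            return None
        # Resample the population from the survivors (uniformly, for simplicity).
        particles = [list(rng.choice(survivors)) for _ in range(num_particles)]
    return particles[0]


if __name__ == "__main__":
    words = run_inference_program(
        make_toy_follower(1, VOCAB), make_acrostic_check("star"),
        num_steps=4, num_particles=64,
    )
    print(words)
```

Because invalid extensions are discarded at every step, any sequence the program returns satisfies the constraint by construction, while the population spreads the search across many candidates.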

To test this approach, we evaluate DisCIPL on a variety of challenging constrained generation tasks. We find that DisCIPL enables a small Follower (Llama-3.2-1B) to match -- and sometimes outperform -- much larger models like GPT-4o and o1!

To understand why this approach is effective, it’s useful to think about DisCIPL as a programming toolkit for LMs that gives the Planner fine-grained control over the Follower.

One particularly powerful pattern allows the Planner to dynamically inject information into the Follower's system prompt *mid-generation*. We call this “self-hinting.” Think of it like a generalized decoding-time calculator that can perform arbitrary Python computations.
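A rough sketch of the self-hinting pattern, assuming a character-budget task: before every generation step, ordinary Python code recomputes a hint and rebuilds the system prompt. Everything here is hypothetical and for illustration only, including the toy follower, which simply parses the hint and picks the first unused word that still fits. A real Follower LM would read the hint as part of its prompt.

```python
import re
from typing import Callable, List

VOCAB = ["planning", "search", "code", "lm"]


def budget_hint(words: List[str], max_chars: int) -> str:
    """Compute the hint in plain Python: how many characters are left."""
    used = len(" ".join(words))
    return f"Hint: {max_chars - used} characters remaining."


def toy_follower(system_prompt: str, words: List[str]) -> str:
    """Hypothetical stand-in for an LM that reads the hint from its prompt."""
    match = re.search(r"(-?\d+) characters remaining", system_prompt)
    remaining = int(match.group(1)) if match else 0
    for word in VOCAB:
        # A separating space is needed for every word after the first.
        cost = len(word) + (1 if words else 0)
        if word not in words and cost <= remaining:
            return word
    return ""  # nothing fits: stop generating


def generate_with_hints(
    next_word: Callable[[str, List[str]], str],
    base_prompt: str,
    max_chars: int,
    max_words: int,
) -> List[str]:
    words: List[str] = []
    for _ in range(max_words):
        # The system prompt is rebuilt mid-generation with a fresh hint.
        system_prompt = f"{base_prompt}\n{budget_hint(words, max_chars)}"
        word = next_word(system_prompt, words)
        if not word:
            break
        words.append(word)
    return words


if __name__ == "__main__":
    out = generate_with_hints(toy_follower, "Write a short phrase.", 20, 10)
    print(" ".join(out))  # prints "planning search code"
```

The hint is computed exactly rather than estimated by the model, which is the sense in which the pattern acts like a decoding-time calculator.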

With these tools, we find that DisCIPL is able to solve a variety of hard search tasks like poetry composition, grant-writing, budgeting, and itinerary planning -- all using a 1B Llama model as Follower!

Self-steering takeaways:
✅ LMs can write code to steer other LMs, even when they can't solve tasks themselves!
✅ Enables small LMs (e.g., Llama-1B) to perform like larger ones (e.g., GPT-4o and o1)
✅ Requires no finetuning and can be implemented automatically by existing LMs!

For more details, please see our paper:

Thanks again to my incredible collaborators Josh Tenenbaum, @vmansinghka, @alexanderklew, @jacobandreas for providing expert meta-steering on this work!

For those at @iclr_conf in Singapore, I'll be giving a talk on this work at the VerifAI @ ICLR workshop on April 27! Looking forward to seeing you there!

Related Videos

🚀 How should LLMs sample on hard reasoning problems during post-training and inference where direct rollouts rarely produce a correct answer? Best-of-N (e.g., GRPO) and tree search share two limitations: 🔻 Verification signals are sparse 🔻 Candidates stay within the model's own distribution We introduce BES: Bidirectional Evolutionary Search — a search framework that couples forward candidate evolution with backward goal decomposition. ✅ Works for both post-training and inference.

🚀 How should LLMs sample on hard reasoning problems during post-training and inference where direct rollouts rarely produce a correct answer? Best-of-N (e.g., GRPO) and tree search share two limitations: 🔻 Verification signals are sparse 🔻 Candidates stay within the model's own distribution We introduce BES: Bidirectional Evolutionary Search — a search framework that couples forward candidate evolution with backward goal decomposition. ✅ Works for both post-training and inference.

Guowei Xu

244,684 views • 1 month ago

LLM agents have demonstrated promise in their ability to automate computer tasks, but face challenges with multi-step reasoning and planning. Towards addressing this, we propose an inference-time tree search algorithm for LLM agents to explicitly perform exploration and multi-step planning in interactive web environments. It is the first tree search algorithm for LLM agents that shows effectiveness on realistic and complex web environments: on the challenging VisualWebArena benchmark, applying our search algorithm on top of a GPT-4o agent yields a 39.7% relative increase in success rate compared to the same baseline without search, setting a state-of-the-art success rate of 26.4%. On WebArena, search also yields a 28.0% relative improvement over a baseline agent, setting a competitive success rate of 19.2%.

LLM agents have demonstrated promise in their ability to automate computer tasks, but face challenges with multi-step reasoning and planning. Towards addressing this, we propose an inference-time tree search algorithm for LLM agents to explicitly perform exploration and multi-step planning in interactive web environments. It is the first tree search algorithm for LLM agents that shows effectiveness on realistic and complex web environments: on the challenging VisualWebArena benchmark, applying our search algorithm on top of a GPT-4o agent yields a 39.7% relative increase in success rate compared to the same baseline without search, setting a state-of-the-art success rate of 26.4%. On WebArena, search also yields a 28.0% relative improvement over a baseline agent, setting a competitive success rate of 19.2%.

Jing Yu Koh

124,878 views • 2 years ago

Demis Hassabis, CEO of Google DeepMind, drops a quiet bombshell: The big question isn’t whether AI can solve problems. It’s whether AI can invent new science. Right now, it can’t. Not because of compute. Not because of data. But because it lacks something fundamental: A world model. Today’s LLMs can generate brilliant text, images, even code. But they don’t truly understand causality. They don’t know why A leads to B. They just predict patterns. Hassabis argues that real scientific discovery requires more: – Long-term planning – Stronger reasoning – And an internal model of how the world works Physics. Biology. Cause and effect. Only then can an AI run its own thought experiments. Only then do we get a true digital scientist.

Demis Hassabis, CEO of Google DeepMind, drops a quiet bombshell: The big question isn’t whether AI can solve problems. It’s whether AI can invent new science. Right now, it can’t. Not because of compute. Not because of data. But because it lacks something fundamental: A world model. Today’s LLMs can generate brilliant text, images, even code. But they don’t truly understand causality. They don’t know why A leads to B. They just predict patterns. Hassabis argues that real scientific discovery requires more: – Long-term planning – Stronger reasoning – And an internal model of how the world works Physics. Biology. Cause and effect. Only then can an AI run its own thought experiments. Only then do we get a true digital scientist.

VraserX e/acc

167,199 views • 6 months ago

OpenAI just announced API access to o1 (advanced reasoning model) yesterday. I'm delighted to announce today a new short course, Reasoning with o1, built with OpenAI, and taught by Colin Jarvis, Head of AI Solutions at OpenAI, to show you how to use this effectively! Unlike previous language models which generate output directly, o1 “thinks before it responds,” and generates many reasoning tokens before returning a more thoughtful and accurate response. It is great at complex reasoning -- including planning for agentic workflows, coding, and domain-specific reasoning in STEM fields like law. But how you should use it is quite different from other LLMs. I think o1 will be a game changer for many AI applications; and in this course, you'll learn how to use it effectively. In detail, you’ll: - Learn to recognize what tasks o1 is suited for, and when to use a smaller model, or combine o1 with a smaller model - Understand the new principles of prompting reasoning models: Be simple and direct; no explicit chain-of-thought required; use structure; show rather than tell - Implement multi-step orchestration in which o1 plans, and hands tasks over to gpt-4o-mini to execute specific steps; this illustrates a design pattern to optimize intelligence (accuracy) and cost - Use o1 for a coding task to build a new application, edit existing code, and test performance by running a coding competition between o1-mini and GPT 4o - Use o1 for image understanding and learn how it performs better with a "hierarchy of reasoning," in which it incurs the latency and cost upfront, preprocessing the image and indexing it with rich details so it can be used for Q&A later - Learn a technique called meta-prompting, in which you use o1 to improve your prompts. 
Using a customer support evaluation set, you'll iteratively use o1 to modify a prompt to improve performance You'll also learn about how OpenAI used reinforcement learning to produce a model that uses "test-time compute" to improve performance. I think you'll find this course enjoyable and valuable. Please sign up for it here:

OpenAI just announced API access to o1 (advanced reasoning model) yesterday. I'm delighted to announce today a new short course, Reasoning with o1, built with OpenAI, and taught by Colin Jarvis, Head of AI Solutions at OpenAI, to show you how to use this effectively! Unlike previous language models which generate output directly, o1 “thinks before it responds,” and generates many reasoning tokens before returning a more thoughtful and accurate response. It is great at complex reasoning -- including planning for agentic workflows, coding, and domain-specific reasoning in STEM fields like law. But how you should use it is quite different from other LLMs. I think o1 will be a game changer for many AI applications; and in this course, you'll learn how to use it effectively. In detail, you’ll: - Learn to recognize what tasks o1 is suited for, and when to use a smaller model, or combine o1 with a smaller model - Understand the new principles of prompting reasoning models: Be simple and direct; no explicit chain-of-thought required; use structure; show rather than tell - Implement multi-step orchestration in which o1 plans, and hands tasks over to gpt-4o-mini to execute specific steps; this illustrates a design pattern to optimize intelligence (accuracy) and cost - Use o1 for a coding task to build a new application, edit existing code, and test performance by running a coding competition between o1-mini and GPT 4o - Use o1 for image understanding and learn how it performs better with a "hierarchy of reasoning," in which it incurs the latency and cost upfront, preprocessing the image and indexing it with rich details so it can be used for Q&A later - Learn a technique called meta-prompting, in which you use o1 to improve your prompts. 
Using a customer support evaluation set, you'll iteratively use o1 to modify a prompt to improve performance You'll also learn about how OpenAI used reinforcement learning to produce a model that uses "test-time compute" to improve performance. I think you'll find this course enjoyable and valuable. Please sign up for it here:

Andrew Ng

357,661 views • 1 year ago

🚀 New Paper: Pixel Reasoner 🧠🖼️ How can Vision-Language Models (VLMs) perform chain-of-thought reasoning within the image itself? We introduce Pixel Reasoner, the first open-source framework that enables VLMs to “think in pixel space” through curiosity-driven reinforcement learning. Current VLMs reason only in text — even when grounded in rich images or videos, their logical steps are verbalized in natural language. This restricts their ability to interrogate visual evidence and demonstrate how conclusions are drawn. 🔍 So we ask: What if we could make VLMs "show their work" by reasoning directly in the pixel space? Inspired by GPT-o3’s "think-in-image" ability, we propose a framework where VLMs use interactive visual operations — zoom, select-frame, highlight — to reason through complex visual inputs. To do this, we design a two-stage training process: Instruction tuning with synthesized visual reasoning traces. Reinforcement learning with curiosity-driven reward to balance exploration between pixel and text reasoning ✨ With this, Pixel Reasoner achieves near-SoTA performance on many information-rich multimodal benchmarks: 📊 84% on InfographicsVQA 🧠 84% on V* benchmark 🧩 74% on TallyQA-Complex It also achieves strong accuracy of 68% on MVBench (a video benchmark). Website: Paper: Code: Demo: (coming soon)

🚀 New Paper: Pixel Reasoner 🧠🖼️ How can Vision-Language Models (VLMs) perform chain-of-thought reasoning within the image itself? We introduce Pixel Reasoner, the first open-source framework that enables VLMs to “think in pixel space” through curiosity-driven reinforcement learning. Current VLMs reason only in text — even when grounded in rich images or videos, their logical steps are verbalized in natural language. This restricts their ability to interrogate visual evidence and demonstrate how conclusions are drawn. 🔍 So we ask: What if we could make VLMs "show their work" by reasoning directly in the pixel space? Inspired by GPT-o3’s "think-in-image" ability, we propose a framework where VLMs use interactive visual operations — zoom, select-frame, highlight — to reason through complex visual inputs. To do this, we design a two-stage training process: Instruction tuning with synthesized visual reasoning traces. Reinforcement learning with curiosity-driven reward to balance exploration between pixel and text reasoning ✨ With this, Pixel Reasoner achieves near-SoTA performance on many information-rich multimodal benchmarks: 📊 84% on InfographicsVQA 🧠 84% on V* benchmark 🧩 74% on TallyQA-Complex It also achieves strong accuracy of 68% on MVBench (a video benchmark). Website: Paper: Code: Demo: (coming soon)

Wenhu Chen

82,829 views • 1 year ago

Introducing Reinforcement-Learned Teachers (RLTs): Transforming how we teach LLMs to reason with reinforcement learning (RL). Blog: Paper: Traditional RL focuses on “learning to solve” challenging problems with expensive LLMs and constitutes a key step in making student AI systems ultimately acquire reasoning capabilities via distillation and cold-starting. Enter our RLTs—a new class of models prompted with not only a problem’s question but also its solution, and directly trained to generate clear, step-by-step “explanations” to teach their students. Remarkably, an RLT with only 7B parameters produces superior results when distilling and cold-starting students in competitive and graduate-level reasoning tasks than orders-of-magnitude larger LLMs. RLTs are as effective even when distilling 32B students, much larger than the teacher itself—unlocking a new standard for efficiency in developing reasoning language models with RL. Code:

Introducing Reinforcement-Learned Teachers (RLTs): Transforming how we teach LLMs to reason with reinforcement learning (RL). Blog: Paper: Traditional RL focuses on “learning to solve” challenging problems with expensive LLMs and constitutes a key step in making student AI systems ultimately acquire reasoning capabilities via distillation and cold-starting. Enter our RLTs—a new class of models prompted with not only a problem’s question but also its solution, and directly trained to generate clear, step-by-step “explanations” to teach their students. Remarkably, an RLT with only 7B parameters produces superior results when distilling and cold-starting students in competitive and graduate-level reasoning tasks than orders-of-magnitude larger LLMs. RLTs are as effective even when distilling 32B students, much larger than the teacher itself—unlocking a new standard for efficiency in developing reasoning language models with RL. Code:

Sakana AI

179,276 views • 1 year ago

We built the first AI agent that has its own computer powered by Hyperbolic! AI agents are now GPU rich! We developed an AgentKit that allow AI agents to • Check GPU availability • Rent & manage GPU compute • Access & run commands on remote machines Why does this matter? With their own compute resources, AI agents can: 1. Validate blockchains like Ethereum and decentralized protocols like 2. Launch and coordinate AI swarms on Hyperbolic's decentralized compute network 3. Train and fine-tune models, improving their own capabilities over time 4. Dive into AI research to push the boundaries of AI, i.e. themselves 5. Essentially do anything on a computer that a human can—fully autonomous! Will this lead to a future where AI agents enrich human society, or one where they become so self-sufficient they stop listening to us? Only time will tell. —————— Big shoutout to Coinbase Developer Platform🛡️'s CDP agentkit for inspiration. This repo is done by two non-engineers (our pm Kai Huang and myself) + Cursor ai agent to run agents. Codings can now be easily done by just prompting ai agents. What a crazy time!

We built the first AI agent that has its own computer powered by Hyperbolic! AI agents are now GPU rich! We developed an AgentKit that allow AI agents to • Check GPU availability • Rent & manage GPU compute • Access & run commands on remote machines Why does this matter? With their own compute resources, AI agents can: 1. Validate blockchains like Ethereum and decentralized protocols like 2. Launch and coordinate AI swarms on Hyperbolic's decentralized compute network 3. Train and fine-tune models, improving their own capabilities over time 4. Dive into AI research to push the boundaries of AI, i.e. themselves 5. Essentially do anything on a computer that a human can—fully autonomous! Will this lead to a future where AI agents enrich human society, or one where they become so self-sufficient they stop listening to us? Only time will tell. —————— Big shoutout to Coinbase Developer Platform🛡️'s CDP agentkit for inspiration. This repo is done by two non-engineers (our pm Kai Huang and myself) + Cursor ai agent to run agents. Codings can now be easily done by just prompting ai agents. What a crazy time!

Jasper

161,741 views • 1 year ago

Introducing the NASA Earthdata Plugin for QGIS! NASA’s Earthdata archive hosts over 120 petabytes of satellite imagery and geospatial datasets, but accessing it has often required navigating complex web portals or writing code. This new QGIS plugin changes that. With the NASA Earthdata plugin, you can search, filter, preview, and download NASA datasets directly inside QGIS, no programming needed. It’s designed for anyone who wants fast, seamless access to NASA data within their existing GIS workflow. Plugin page: GitHub: Step-by-step video tutorial: #QGIS #NASA #satelliteimagery #opensource

Introducing the NASA Earthdata Plugin for QGIS! NASA’s Earthdata archive hosts over 120 petabytes of satellite imagery and geospatial datasets, but accessing it has often required navigating complex web portals or writing code. This new QGIS plugin changes that. With the NASA Earthdata plugin, you can search, filter, preview, and download NASA datasets directly inside QGIS, no programming needed. It’s designed for anyone who wants fast, seamless access to NASA data within their existing GIS workflow. Plugin page: GitHub: Step-by-step video tutorial: #QGIS #NASA #satelliteimagery #opensource

Qiusheng Wu

43,321 views • 6 months ago

Rivians new autonomous software tries to run a red light. In GJEEBS latest video he goes out with Rivian to get a look at their latest self-driving offerings. This video shows you that no matter how much compute you throw at a problem you will always be constrained by the brain (AI). I’m sure this will get better in time, but for those touting this as an FSD killer - I think you may need a reality check.

Rivians new autonomous software tries to run a red light. In GJEEBS latest video he goes out with Rivian to get a look at their latest self-driving offerings. This video shows you that no matter how much compute you throw at a problem you will always be constrained by the brain (AI). I’m sure this will get better in time, but for those touting this as an FSD killer - I think you may need a reality check.

Devin Olsen

81,373 views • 7 months ago

New short course: Building Code Agents with Hugging Face smolagents! Learn how to build code agents in this course, created in collaboration with Hugging Face, and taught by Thomas Wolf, its co-founder and CSO, and m_ric, Hugging Face’s Project Lead on Agents. Tool-calling agents use LLMs to generate multiple function calls sequentially to complete a complex sequence of tasks. They generate one function call, execute it, observe, reason, and decide what to do next. Code agents take a different approach. They consolidate all these calls into a single block of code, letting the LLM lay out an entire action plan at once, which can be executed efficiently to provide more reliable results. You’ll learn how to code agents using smolagents, a lightweight agentic framework from Hugging Face. Along the way, you’ll learn how to run LLM-generated code safely and develop an evaluation system to optimize your code agent for production. In detail, you’ll learn: - How agentic systems have evolved, gaining greater levels of agency over time—and why code agents are a next step. - How code agents write their actions in code. - When code agents outperform function-calling agents. - How to run code agents safely in your system using a constrained Python interpreter and sandboxing using E2B. - To trace, debug, and assess the code agent to optimize its behaviours for complex requests. - How to build a research multi-agent system that can find information online and organize it into an interactive report. By the end of this course, you’ll know how to build and run code agents using smolagents, and deploy them safely with a structured evaluation system in your projects. Please sign up here!

New short course: Building Code Agents with Hugging Face smolagents! Learn how to build code agents in this course, created in collaboration with Hugging Face, and taught by Thomas Wolf, its co-founder and CSO, and m_ric, Hugging Face’s Project Lead on Agents. Tool-calling agents use LLMs to generate multiple function calls sequentially to complete a complex sequence of tasks. They generate one function call, execute it, observe, reason, and decide what to do next. Code agents take a different approach. They consolidate all these calls into a single block of code, letting the LLM lay out an entire action plan at once, which can be executed efficiently to provide more reliable results. You’ll learn how to code agents using smolagents, a lightweight agentic framework from Hugging Face. Along the way, you’ll learn how to run LLM-generated code safely and develop an evaluation system to optimize your code agent for production. In detail, you’ll learn: - How agentic systems have evolved, gaining greater levels of agency over time—and why code agents are a next step. - How code agents write their actions in code. - When code agents outperform function-calling agents. - How to run code agents safely in your system using a constrained Python interpreter and sandboxing using E2B. - To trace, debug, and assess the code agent to optimize its behaviours for complex requests. - How to build a research multi-agent system that can find information online and organize it into an interactive report. By the end of this course, you’ll know how to build and run code agents using smolagents, and deploy them safely with a structured evaluation system in your projects. Please sign up here!

Andrew Ng

127,724 views • 1 year ago

Introducing Ask YouTube, our new conversational search experience in YouTube. 📽️ With Ask YouTube, you can ask more complex search queries, like needing help planning a road trip through the California coast or wanting tips on how to teach your kid to ride a bike. You can even ask follow-up questions to continue refining what you’re looking for. Ask YouTube will compile the most relevant videos across all of YouTube’s catalogue — including long-form videos and Shorts — and provide an interactive, structured response instead of the usual list of video recommendations. Ask YouTube is currently available for Premium members in the U.S., and it’ll be rolling out to all YouTube users soon.

Introducing Ask YouTube, our new conversational search experience in YouTube. 📽️ With Ask YouTube, you can ask more complex search queries, like needing help planning a road trip through the California coast or wanting tips on how to teach your kid to ride a bike. You can even ask follow-up questions to continue refining what you’re looking for. Ask YouTube will compile the most relevant videos across all of YouTube’s catalogue — including long-form videos and Shorts — and provide an interactive, structured response instead of the usual list of video recommendations. Ask YouTube is currently available for Premium members in the U.S., and it’ll be rolling out to all YouTube users soon.

Google

210,872 views • 2 months ago

Andrew Ng: "if I had to pick one technology today - it's graphs of self-improving agents I have 4 patterns that necessary" here are 4 rules: rule 1 → reflection make the agent critique its own output, find problems, rewrite - one loop of self-review beats a smarter model with no review rule 2 → tool use - don't just think, act - give the agent search, code execution, APIs - thinking without tools is guessing rule 3 → planning - break complex tasks into steps before executing - agents that plan first solve what agents that rush can't rule 4 → multi-agent collaboration - don't run one agent run a team: one writes code, another critiques it, another tests many people overlook these crucial rules behind every smart agent save this - then read how to wire these 4 rules into one graph ↓

Andrew Ng: "if I had to pick one technology today - it's graphs of self-improving agents I have 4 patterns that necessary" here are 4 rules: rule 1 → reflection make the agent critique its own output, find problems, rewrite - one loop of self-review beats a smarter model with no review rule 2 → tool use - don't just think, act - give the agent search, code execution, APIs - thinking without tools is guessing rule 3 → planning - break complex tasks into steps before executing - agents that plan first solve what agents that rush can't rule 4 → multi-agent collaboration - don't run one agent run a team: one writes code, another critiques it, another tests many people overlook these crucial rules behind every smart agent save this - then read how to wire these 4 rules into one graph ↓

codila

94,492 views • 4 days ago

ELON: AI CLUSTERS WILL BECOME A NORMAL THING EVERY COUNTRY HAS Compute is becoming a new currency. While building a "frontier model" today is a high-stakes gauntlet that requires a level of technical skill only a few companies possess, he predicts that AI compute clusters are about to become as essential to a nation as a power grid or a military. By 2026, "AI Sovereignty" will be the only thing keeping nations from becoming digital colonies of Silicon Valley or Beijing: “Probably all countries will have their own AI clusters over time. It is currently very difficult to actually build an AI cluster and have it run... because if you are training a frontier model, then you need a massive amount of compute and a level of technical skill that only a few companies possess. But over time, I think every country will have AI compute clusters. It is just going to be a normal thing that every country has.” Source: vitrupo Elon Musk

ELON: AI CLUSTERS WILL BECOME A NORMAL THING EVERY COUNTRY HAS Compute is becoming a new currency. While building a "frontier model" today is a high-stakes gauntlet that requires a level of technical skill only a few companies possess, he predicts that AI compute clusters are about to become as essential to a nation as a power grid or a military. By 2026, "AI Sovereignty" will be the only thing keeping nations from becoming digital colonies of Silicon Valley or Beijing: “Probably all countries will have their own AI clusters over time. It is currently very difficult to actually build an AI cluster and have it run... because if you are training a frontier model, then you need a massive amount of compute and a level of technical skill that only a few companies possess. But over time, I think every country will have AI compute clusters. It is just going to be a normal thing that every country has.” Source: vitrupo Elon Musk

Mario Nawfal

20,502 views • 6 months ago

How will AI change writing? For clues, look at the current state of software development. Marc Andreessen says: "The new coding interface is you've got your code and a chatbot. The Copilot is reading your code and giving you real-time comments. Or the chatbot is actually writing the code for you, and then you're giving it feedback. We had this old concept in the old days called pair programming, where you'd put two coders together in front of the same keyboard, and they could talk to each other and write code together. But now you're just going to have pair programming happening much more broadly, which is human plus machine. And by the way, this hasn't happened yet, but I think the same thing will happen for every other form of writing. So I think prose writing and fiction writing and all kinds of professional authors will be working in this format where you'll have this continuous dialogue going with an AI while you're in charge of the overall product." — Marc Andreessen 🇺🇸 (Recorded in 2024)

David Perell Clips

13,101 views • 3 months ago

My last hours with Fable: I was building this movement parkour sim before it went down... Impressed by its autonomy: built its own self-verifying harness with its own rubric for how a movement should feel. When a new movement was added it could tell on its own what felt right or wrong until it felt 100% right without me in the loop (and it was very good at it) Fable was more than just another model iteration imo. For the short (but intense) time it was available, it felt like playing with clay: ideas became code with almost no friction and the line between both became blurry. More than ever: open source MUST win. I don't want a world where intelligence is centralized and you're stuck with a hand saw while others have a chainsaw.

Victor M

82,913 views • 1 month ago

New Paper! Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents A longstanding goal of AI research has been the creation of AI that can learn indefinitely. One path toward that goal is an AI that improves itself by rewriting its own code, including any code responsible for learning. That idea, known as a Gödel Machine, proposed by Jürgen Schmidhuber over two decades ago, is a hypothetical self-improving AI. It optimally solves problems by recursively rewriting its own code when it can mathematically prove a better strategy, making it a key concept in meta-learning or “learning to learn.” While the theoretical Gödel Machine promised provably beneficial self-modifications, its realization relied on an impractical assumption: that the AI could mathematically prove that a proposed change in its own code would yield a net improvement before adopting it. Sakana AI, in collaboration with Jeff Clune’s lab at UBC, proposes something more feasible: a system that harnesses the principles of open-ended algorithms like Darwinian evolution to search for improvements that empirically improve performance. We call the result the Darwin Gödel Machine. DGMs leverage foundation models to propose code improvements, and use recent innovations in open-ended algorithms to search for a growing library of diverse, high-quality AI agents. Applied to practical tasks, we implemented Darwin Gödel Machine as a self-improving coding agent that rewrites its own code to improve performance on programming tasks. It creates various self-improvements, such as a patch validation step, better file viewing, enhanced editing tools, generating and ranking multiple solutions to choose the best one, and adding a history of what has been tried before (and why it failed) when making new changes (see the attached video). We believe that Darwin Gödel Machines represent a concrete step towards AI systems that can autonomously gather their own stepping stones to learn and innovate forever!

hardmaru

104,854 views • 1 year ago

Shipping speed is up across the board, and AI tools like Gemini 3 Pro are helping us code and build faster than ever. But validation and quality assurance? Still the bottleneck. That’s why I’m integrating TestSprite into my workflow. It analyzes my code and PRD, automatically builds a comprehensive test layer, and keeps my development flow unbroken. No test writing. No last-minute QA chaos.
What it covers:
➞ UI Flow Coverage: Verifies real-time updates and synchronization (e.g., confirming gradient sliders and angle controls work together).
➞ Edge Case Validation: Catches subtle logic bugs, like inputs accepting out-of-range values, that I'd typically miss.
➞ Functionality Check: Automatically runs videos to ensure core features (like "Randomize" and "Copy to Clipboard") are working.
➞ Repeatable Test Suite: Creates tests I can rerun anytime, giving me confidence that new code hasn't broken the old.
My new Antigravity + TestSprite loop:
➞ AI Plan & Code: Use Antigravity's Planning Mode with Gemini 3 Pro to generate structured code.
➞ Browser Agent QA: Use Antigravity's Open Browser feature to let the AI visually analyze and control the live UI for quick verification.
➞ Validate & Test: Run the automated, reproducible test suite in TestSprite.
➞ Ship.
This loop feels clean, fast, and robust. Works beautifully alongside Gemini 3 Pro, Claude Sonnet, or any AI stack.

Dhaval Makwana

110,884 views • 7 months ago

We partnered with artists, designers, and builders to create new AI tools that solve real problems in their creative workflows. Here’s what’s new:
- Introducing Google Pics in Google Workspace: A brand-new image creation & editing tool. Move and resize objects, add text, and translate just by hovering and clicking.
- Big updates to Google Flow: 1) You can now create with Gemini Omni Flash in Google Flow. 2) Google Flow Agent is a multi-step creative partner that reasons and plans complex tasks with you. 3) Google Flow tools are custom tools you can “vibe code” for animations, video effects, text layering & more.
- Design live with Stitch by Google: Now you can use text or voice prompts to edit layouts in real time, then export those designs straight to code.
- More creative control in Google Flow Music: Edit songs section by section, remix the style of full songs, and create music videos with our new Gemini Omni Flash model.

Google AI

13,953,239 views • 2 months ago