Tackling complex problems with LMs requires search/planning, but how should test-time compute be structured? Introducing Self-Steering, a new meta-reasoning framework where LMs coordinate their own inference procedures by writing code!

Gabe Grand

20,315 views • 1 year ago • via X (Twitter)

Gabe Grand @ ICLR 2025 • 1 year ago

Today’s AI models excel at math, science, and programming, but simultaneously struggle with much more basic problems. People like @karpathy & @kevinroose have used the term “jagged intelligence” to describe this discrepancy.

Much of this “jaggedness” arises in problems that require long-horizon search or planning.

For instance, consider this constrained generation prompt, which most English speakers can manage but which is hard even for very capable LMs like GPT-4o.

One approach to these kinds of problems is to sample repeatedly from the LM until we get a valid generation.
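
A minimal sketch of this repeated-sampling loop. The LM is replaced by a hypothetical stand-in (`toy_lm_sample`, which strings together random words) and the verifier checks a made-up constraint (exactly five words, one of which is "ocean"); neither comes from the thread.

```python
import random
from typing import Callable, Optional

# Hypothetical stand-in for an LM: emits a random "sentence" from a tiny vocabulary.
VOCAB = ["the", "ocean", "is", "calm", "blue", "and", "deep", "today"]

def toy_lm_sample(rng: random.Random) -> str:
    length = rng.randint(3, 8)
    return " ".join(rng.choice(VOCAB) for _ in range(length))

# Hypothetical verifier for a made-up constraint:
# exactly five words, and "ocean" must be one of them.
def verify(text: str) -> bool:
    words = text.split()
    return len(words) == 5 and "ocean" in words

def repeated_sampling(
    sample: Callable[[random.Random], str],
    is_valid: Callable[[str], bool],
    max_samples: int,
    seed: int = 0,
) -> tuple[Optional[str], int]:
    """Draw samples until one passes the verifier or the budget runs out.

    Returns (first valid sample or None, number of samples drawn)."""
    rng = random.Random(seed)
    for n in range(1, max_samples + 1):
        candidate = sample(rng)
        if is_valid(candidate):
            return candidate, n
    return None, max_samples

result, used = repeated_sampling(toy_lm_sample, verify, max_samples=1000)
print(result, used)
```

Passing a verifier that never accepts (e.g., `lambda s: False`) exhausts the budget and returns `None`, which mirrors the case where the LM never produces a valid sample.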

As noted in recent work (e.g., Brown et al., 2024), this is a really simple way to scale performance.
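
A back-of-the-envelope model of why this scales, assuming (our idealization, not the estimator used by Brown et al.) that samples are independent and each is valid with the same probability p: the chance that at least one of k samples is valid is 1 - (1 - p)^k.

```python
def coverage(p: float, k: int) -> float:
    """Probability that at least one of k i.i.d. samples is valid,
    assuming each sample is valid with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p) ** k

# Coverage grows quickly with k when p > 0, but stays at 0 when p == 0.
for k in (1, 10, 100):
    print(k, round(coverage(0.05, k), 3))
```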

However, repeated sampling has some key drawbacks:
❌ Requires a verifier
❌ Cost scales with problem complexity
❌ Assumes the LM will eventually produce a valid sample (not always the case for complex tasks)
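
A rough illustration of the cost point, under the simplifying assumption (ours) that a task has m independent constraints, each satisfied by an unconstrained sample with probability q. A sample is then valid with probability p = q^m, and the expected number of draws until the first valid one (a geometric distribution) is 1/p: it grows exponentially in m and is infinite when p = 0.

```python
import math

def expected_samples(q: float, m: int) -> float:
    """Expected draws until the first valid sample, assuming m independent
    constraints each satisfied with probability q (geometric mean 1/p)."""
    p = q ** m
    return math.inf if p == 0.0 else 1.0 / p

# Each extra constraint multiplies the expected cost by 1/q.
for m in (1, 5, 10):
    print(m, expected_samples(0.5, m))
```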

Another approach, recently popularized by models like o1 (OpenAI) and R1 (DeepSeek), is to generate extended chain-of-thought reasoning.

While these latest models can crack this toy problem, inference-time CoT can be slow and costly, and can still produce errors. Moreover, reasoning autoregressively means linearizing separate branches into one long “stream of search,” which forfeits parallelism.

Stepping back, one key observation is that even when LMs struggle to emulate the precise reasoning steps needed to solve a problem, they often excel at describing its abstract structure: both *how to verify* solutions and *how to search* for them!

Motivated by this insight, we introduce DisCIPL, a meta-reasoning approach that equips LMs with the ability to write recursive search procedures that guide LM inference, enabling new forms of verifiable and efficient reasoning.

In DisCIPL, a Planner LM writes an inference program that defines step-by-step computations to steer a population of Follower LMs.

Our approach combines the benefits of serial and parallel methods: the Planner ensures correctness by construction, while the Followers collectively search for sequences with high probability.
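
A minimal sketch of that division of labor, under loud assumptions: the "Follower" is a hypothetical bigram table rather than a real LM, the Planner's program is reduced to a hand-written per-step check (`allowed`, which forbids the letter "e"), each particle plays the role of one Follower, and the population is pruned beam-style by log-probability. This is a simplified stand-in, not DisCIPL's actual inference algorithm or API.

```python
import math

# Hypothetical toy "Follower": next-word probabilities given the previous word.
BIGRAMS: dict[str, dict[str, float]] = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"red": 0.5, "cat": 0.3, "dog": 0.2},
    "a": {"cat": 0.5, "dog": 0.5},
    "red": {"cat": 0.7, "dog": 0.3},
    "cat": {"sleeps": 0.6, "naps": 0.4},
    "dog": {"sleeps": 0.5, "runs": 0.5},
    "sleeps": {"</s>": 1.0},
    "naps": {"</s>": 1.0},
    "runs": {"</s>": 1.0},
}

# A particle is (words generated so far, total log-probability).
Particle = tuple[list[str], float]

def allowed(word: str) -> bool:
    """Stand-in for a Planner-written check: no word may contain the letter "e"."""
    return word == "</s>" or "e" not in word

def population_search(n_particles: int, max_steps: int) -> list[Particle]:
    population: list[Particle] = [([], 0.0)]
    finished: list[Particle] = []
    for _ in range(max_steps):
        extensions: list[Particle] = []
        for words, logp in population:
            prev = words[-1] if words else "<s>"
            for nxt, prob in BIGRAMS.get(prev, {}).items():
                if not allowed(nxt):
                    continue  # constraint enforced at every step, not checked at the end
                if nxt == "</s>":
                    finished.append((words, logp + math.log(prob)))
                else:
                    extensions.append((words + [nxt], logp + math.log(prob)))
        # Keep only the most probable valid prefixes.
        extensions.sort(key=lambda p: p[1], reverse=True)
        population = extensions[:n_particles]
        if not population:
            break
    finished.sort(key=lambda p: p[1], reverse=True)
    return finished

results = population_search(n_particles=4, max_steps=5)
for words, logp in results:
    print(" ".join(words), round(math.exp(logp), 3))
```

Left alone, this toy Follower most likely starts with "the", which the check rules out. Because invalid prefixes are dropped as soon as they appear, every finished sequence satisfies the constraint by construction, and the population then ranks the survivors by the Follower's probability ("a dog runs" first, at 0.1).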

To test this approach, we evaluate DisCIPL on a variety of challenging constrained generation tasks. We find that DisCIPL enables a small Follower (Llama-3.2-1B) to match, and sometimes outperform, much larger models like GPT-4o and o1!

To understand why this approach is effective, it’s useful to think about DisCIPL as a programming toolkit for LMs that gives the Planner fine-grained control over the Follower.

One particularly powerful pattern allows the Planner to dynamically inject information into the Follower's system prompt *mid-generation*. We call this “self-hinting.” Think of it like a generalized decoding-time calculator that can perform arbitrary Python computations.
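
A minimal sketch of the self-hinting pattern, assuming a hypothetical budgeting task in which the Follower writes line items such as `Flights: $420`. The helper names (`remaining_budget`, `build_prompt`) are illustrative, not DisCIPL's API; the point is that a Python computation over the partial output produces a hint that is spliced into the prompt before the Follower continues.

```python
import re

BUDGET = 1000  # hypothetical total budget in dollars

def remaining_budget(text_so_far: str, budget: int) -> int:
    """The 'calculator' step: sum the dollar amounts written so far."""
    spent = sum(int(x) for x in re.findall(r"\$(\d+)", text_so_far))
    return budget - spent

def build_prompt(system_prompt: str, text_so_far: str, budget: int) -> str:
    """Recompute the hint from the partial output and inject it mid-generation."""
    hint = f"Hint: ${remaining_budget(text_so_far, budget)} of the ${budget} budget remains."
    return f"{system_prompt}\n{hint}\n\n{text_so_far}"

draft = "Flights: $420\nHotel: $310\n"
print(build_prompt("Write a trip budget that totals at most $1000.", draft, BUDGET))
```

Calling `build_prompt` again after each new line item keeps the hint current, so the Follower never has to do the arithmetic itself.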

With these tools, we find that DisCIPL is able to solve a variety of hard search tasks like poetry composition, grant-writing, budgeting, and itinerary planning, all using a 1B Llama model as the Follower!

Self-steering takeaways:
✅ LMs can write code to steer other LMs, even when they can't solve tasks themselves!
✅ Enables small LMs (e.g., Llama-1B) to perform like larger ones (e.g., GPT-4o and o1)
✅ Requires no finetuning and can be implemented automatically by existing LMs!

For more details, please see our paper:

Thanks again to my incredible collaborators Josh Tenenbaum, @vmansinghka, @alexanderklew, and @jacobandreas for providing expert meta-steering on this work!

For those at @iclr_conf in Singapore, I'll be giving a talk on this work at the VerifAI @ ICLR workshop on April 27! Looking forward to seeing you there!
