Tackling complex problems with LMs requires search/planning, but how should test-time compute be structured? Introducing Self-Steering, a new meta-reasoning framework where LMs coordinate their own inference procedures by writing code!

20,315 views • 1 year ago • via X (Twitter)

21 comments

Gabe Grand @ ICLR 2025 • 1 year ago

Today’s AI models excel at math, science, and programming, but simultaneously struggle with much more basic problems. People like @karpathy & @kevinroose have used the term “jagged intelligence” to describe this discrepancy.

Gabe Grand @ ICLR 2025 • 1 year ago

Much of this “jaggedness” arises in problems that require long-horizon search or planning.

Gabe Grand @ ICLR 2025 • 1 year ago

For instance, consider this constrained generation prompt, which is manageable for most English speakers but hard for even very capable LMs like GPT-4o.

Gabe Grand @ ICLR 2025 • 1 year ago

One approach to these kinds of problems is to sample repeatedly from the LM until we get a valid generation.

Gabe Grand @ ICLR 2025 • 1 year ago

As noted in recent work (e.g., Brown et al., 2024), this is a really simple way to scale performance.

Gabe Grand @ ICLR 2025 • 1 year ago

However, repeated sampling has some key drawbacks:
❌ Requires a verifier
❌ Cost scales with problem complexity
❌ Assumes the LM will eventually produce a valid sample (not always the case for complex tasks)
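
A minimal sketch of the repeated-sampling loop described above, to make the drawbacks concrete. Everything here is an illustrative assumption: `toy_sampler` is a hypothetical stand-in for an LM call, and `toy_verifier` is a hand-written checker for a made-up constraint, not the prompt from the thread.

```python
import random

def sample_until_valid(sample, is_valid, max_attempts, rng):
    """Draw candidates until one passes the verifier or the budget runs out."""
    for attempt in range(1, max_attempts + 1):
        candidate = sample(rng)
        if is_valid(candidate):
            return candidate, attempt
    # The sampler never produced a valid candidate within the budget.
    return None, max_attempts

# Hypothetical stand-in for an LM: emits five random words.
WORDS = ["the", "cat", "sat", "on", "a", "warm", "mat", "today"]

def toy_sampler(rng):
    return " ".join(rng.choice(WORDS) for _ in range(5))

# Hypothetical verifier: third word must be "sat", last word must be "mat".
def toy_verifier(text):
    words = text.split()
    return len(words) == 5 and words[2] == "sat" and words[-1] == "mat"

rng = random.Random(0)
result, attempts = sample_until_valid(toy_sampler, toy_verifier, 1000, rng)
print(result, attempts)
```

The loop cannot run without `is_valid`, its expected cost grows as the constraint gets harder to hit by chance, and it returns `None` when the budget is exhausted, which mirrors the three drawbacks listed above.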

Gabe Grand @ ICLR 2025 • 1 year ago

Another approach, recently popularized by models like o1 (OpenAI) and R1 (DeepSeek), is to generate extended chain-of-thought reasoning.

Gabe Grand @ ICLR 2025 • 1 year ago

While these latest models can crack this toy problem, inference-time CoT can be slow and costly, and it can still produce errors. Moreover, reasoning autoregressively means linearizing separate branches into one long “stream of search,” which forfeits parallelism.

Gabe Grand @ ICLR 2025 • 1 year ago

Stepping back, one key observation is that even when LMs struggle to emulate the precise reasoning steps needed to solve a problem, they often excel at describing its abstract structure -- both *how to verify* solutions and *how to search* for them!

Gabe Grand @ ICLR 2025 • 1 year ago

Motivated by this insight, we introduce DisCIPL, a meta-reasoning approach that equips LMs with the ability to write recursive search procedures that guide LM inference, enabling new forms of verifiable and efficient reasoning.

Gabe Grand @ ICLR 2025 • 1 year ago

In DisCIPL, a Planner LM writes an inference program that defines step-by-step computations to steer a population of Follower LMs.

Gabe Grand @ ICLR 2025 • 1 year ago

Our approach combines the benefits of serial and parallel methods: the Planner ensures correctness by construction, while the Followers collectively search for sequences with high probability.
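
To illustrate the Planner/Follower split described above, here is a toy sketch of the kind of program a Planner might write: a population of partial generations ("particles") is extended one step at a time, candidates that violate the constraint are dropped, and the survivors are resampled. This is not the DisCIPL API. `follower_propose` is a hypothetical stand-in for a Follower LM, the acrostic-style constraint is invented for the example, and survivors are resampled uniformly rather than by model probability.

```python
import random

VOCAB = ["apple", "bright", "calm", "brave", "cloud", "amber", "bold", "crisp", "azure"]

def follower_propose(prefix, rng):
    """Hypothetical Follower: proposes a next word given the words so far."""
    return rng.choice(VOCAB)

def run_inference_program(letters, num_particles, rng):
    """Build word sequences whose i-th word starts with letters[i]."""
    particles = [[] for _ in range(num_particles)]
    for letter in letters:
        # Extend every particle by one Follower proposal; keep only the
        # extensions that satisfy this step's constraint.
        survivors = []
        for prefix in particles:
            word = follower_propose(prefix, rng)
            if word.startswith(letter):
                survivors.append(prefix + [word])
        if not survivors:
            return None  # the whole population died out at this step
        # Resample so the population returns to full size.
        particles = [list(rng.choice(survivors)) for _ in range(num_particles)]
    return particles

particles = run_inference_program("abc", 32, random.Random(0))
print(particles[0] if particles else "no valid sequence found")
```

Because the constraint check lives in the program rather than in the Follower, every returned sequence is valid by construction, and the per-particle work inside each step is independent and could run in parallel.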

Gabe Grand @ ICLR 2025 • 1 year ago

To test this approach, we evaluate DisCIPL on a variety of challenging constrained generation tasks. We find that DisCIPL enables a small Follower (Llama-3.2-1B) to match -- and sometimes outperform -- much larger models like GPT-4o and o1!

Gabe Grand @ ICLR 2025 • 1 year ago

To understand why this approach is effective, it’s useful to think about DisCIPL as a programming toolkit for LMs that gives the Planner fine-grained control over the Follower.

Gabe Grand @ ICLR 2025 • 1 year ago

One particularly powerful pattern allows the Planner to dynamically inject information into the Follower's system prompt *mid-generation*. We call this “self-hinting.” Think of it like a generalized decoding-time calculator that can perform arbitrary Python computations.
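
A rough sketch of the self-hinting idea, under stated assumptions: before each generation step, ordinary Python computes a fact (here, the remaining budget) and writes it into the prompt the Follower sees. The price table, prompt format, and `toy_follower` are all hypothetical. Since the toy Follower picks at random and ignores the hint, the sketch also filters to affordable options so that it stays valid; a real Follower would be expected to condition on the hint text.

```python
import random

# Hypothetical price table for a made-up budgeting task.
PRICES = {"tent": 120, "stove": 45, "lantern": 25, "map": 10, "rope": 15}

def build_prompt(task, chosen, budget):
    """Recompute the hint from the current state and embed it in the prompt."""
    remaining = budget - sum(PRICES[item] for item in chosen)
    hint = f"[hint] Remaining budget: ${remaining}."
    so_far = ", ".join(chosen) or "nothing"
    return f"{task}\n{hint}\nChosen so far: {so_far}", remaining

def toy_follower(prompt, options, rng):
    """Hypothetical stand-in for a Follower LM: picks one option at random."""
    return rng.choice(options)

def plan_purchases(task, budget, max_items, rng):
    chosen, prompts = [], []
    for _ in range(max_items):
        # The hint is refreshed mid-generation, once per step.
        prompt, remaining = build_prompt(task, chosen, budget)
        prompts.append(prompt)
        affordable = [i for i in PRICES if i not in chosen and PRICES[i] <= remaining]
        if not affordable:
            break
        chosen.append(toy_follower(prompt, affordable, rng))
    return chosen, prompts

chosen, prompts = plan_purchases("Pack for a camping trip.", 150, 4, random.Random(0))
print(chosen)
print(prompts[-1])
```

The arithmetic that small models tend to get wrong (tracking a running total) is done exactly in code, and the result reaches the Follower as plain text in its prompt.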

Gabe Grand @ ICLR 2025 • 1 year ago

With these tools, we find that DisCIPL is able to solve a variety of hard search tasks like poetry composition, grant-writing, budgeting, and itinerary planning -- all using a 1B Llama model as Follower!

Gabe Grand @ ICLR 2025 • 1 year ago

Self-steering takeaways:
✅ LMs can write code to steer other LMs, even when they can't solve tasks themselves!
✅ Enables small LMs (e.g., Llama-1B) to perform like larger ones (e.g., GPT-4o and o1)
✅ Requires no finetuning and can be implemented automatically by existing LMs!

Gabe Grand @ ICLR 2025 • 1 year ago

For more details, please see our paper:

Gabe Grand @ ICLR 2025 • 1 year ago

Thanks again to my incredible collaborators Josh Tenenbaum, @vmansinghka, @alexanderklew, @jacobandreas for providing expert meta-steering on this work!

Gabe Grand @ ICLR 2025 • 1 year ago

For those @iclr_conf in Singapore, I'll be giving a talk on this work at the VerifAI @ ICLR workshop on April 27! Looking forward to seeing you there!
