Tackling complex problems with LMs requires search/planning, but how should test-time compute be structured? Introducing Self-Steering, a new meta-reasoning framework where LMs coordinate their own inference procedures by writing code!

20,315 views • 1 year ago • via X (Twitter)

21 Comments

Gabe Grand @ ICLR 2025 · 1 year ago

Today’s AI models excel at math, science, and programming, but simultaneously struggle with much more basic problems. People like @karpathy & @kevinroose have used the term “jagged intelligence” to describe this discrepancy.

Much of this “jaggedness” arises especially for problems that require long-horizon search/planning.

For instance, consider this constrained generation prompt, which is manageable for most English speakers, but hard for even very capable LMs like GPT-4o.

One approach to these kinds of problems is to sample repeatedly from the LM until we get a valid generation.

As noted in recent work (e.g., Brown et al., 2024), this is a really simple way to scale performance.

However, repeated sampling has some key drawbacks:
❌ Requires a verifier
❌ Cost scales with problem complexity
❌ Assumes the LM will eventually produce a valid sample (not always the case for complex tasks)
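
The repeated-sampling loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy sampler stands in for an LM, and the five-word constraint, `VOCAB`, and all function names are hypothetical, not taken from the thread or the paper. The second call shows the third drawback: a sampler that never produces a valid output simply uses up the whole budget.

```python
import random
from typing import Callable, Optional, Tuple


def sample_until_valid(
    sample: Callable[[], str],
    verify: Callable[[str], bool],
    max_attempts: int,
) -> Tuple[Optional[str], int]:
    """Draw candidates until one passes the verifier or the budget runs out."""
    for attempt in range(1, max_attempts + 1):
        candidate = sample()
        if verify(candidate):
            return candidate, attempt
    return None, max_attempts


# Hypothetical toy constraint: exactly five words, the third being "search".
def verify_sentence(text: str) -> bool:
    words = text.split()
    return len(words) == 5 and words[2] == "search"


# Hypothetical stand-in for an LM: five words drawn uniformly from a tiny vocabulary.
VOCAB = ["models", "search", "plans", "can", "guide"]


def make_toy_sampler(seed: int) -> Callable[[], str]:
    rng = random.Random(seed)
    return lambda: " ".join(rng.choice(VOCAB) for _ in range(5))


result, attempts = sample_until_valid(make_toy_sampler(0), verify_sentence, max_attempts=200)

# A sampler that can never satisfy the constraint exhausts the whole budget.
stuck, spent = sample_until_valid(lambda: "no valid output here", verify_sentence, max_attempts=50)
```

Note that the loop only works because `verify_sentence` exists, and that the expected number of attempts grows as the constraint gets harder to hit by chance.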

Another approach, recently popularized by models like o1 (OpenAI) and R1 (DeepSeek), is to generate extended chain-of-thought reasoning.

While these latest models can crack this toy problem, inference-time CoT can be slow and costly, and it can still produce errors. Moreover, reasoning autoregressively means linearizing separate branches into one long “stream of search,” which forfeits parallelism.

Stepping back, one key observation is that even when LMs struggle to emulate the precise reasoning steps needed to solve a problem, they often excel at describing its abstract structure -- both *how to verify* solutions and *how to search* for them!

Motivated by this insight, we introduce DisCIPL, a meta-reasoning approach that equips LMs with the ability to write recursive search procedures that guide LM inference, enabling new forms of verifiable and efficient reasoning.

In DisCIPL, a Planner LM writes an inference program that defines step-by-step computations to steer a population of Follower LMs.

Our approach combines the benefits of serial and parallel methods: the Planner ensures correctness by construction, while the Followers collectively search for sequences with high probability.
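
To make the Planner/Follower split concrete, here is a rough sketch of what a step-by-step inference program over a population of candidates might look like. It is not the DisCIPL implementation or its API: the Follower is a hypothetical hand-written word distribution, the constraint (each word must start with a given letter) is invented for illustration, and the weighting and resampling scheme is a generic particle-style search chosen for the example. The program restricts each step to allowed words, so every surviving sequence is valid by construction, while resampling favors sequences the Follower finds more probable.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical stand-in for a Follower LM: base word scores, with words
# already in the prefix made ten times less likely.
BASE_SCORES: Dict[str, float] = {
    "small": 3.0, "search": 2.0, "steer": 1.0,
    "models": 2.0, "many": 1.0, "plans": 1.0,
}


def follower_distribution(prefix: List[str]) -> Dict[str, float]:
    scores = {w: (s * 0.1 if w in prefix else s) for w, s in BASE_SCORES.items()}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


def constrained_step(
    prefix: List[str], initial: str, rng: random.Random
) -> Tuple[List[str], float]:
    """Extend one particle with a word that starts with `initial`.

    Returns the extended particle and the probability mass the Follower
    assigned to the allowed words, which serves as the particle's weight.
    """
    dist = follower_distribution(prefix)
    allowed = {w: p for w, p in dist.items() if w.startswith(initial)}
    mass = sum(allowed.values())
    if mass == 0.0:
        return prefix, 0.0
    word = rng.choices(list(allowed), weights=list(allowed.values()), k=1)[0]
    return prefix + [word], mass


def run_inference_program(initials: str, num_particles: int, seed: int) -> List[List[str]]:
    """Run one constrained step per required initial over all particles."""
    rng = random.Random(seed)
    particles: List[List[str]] = [[] for _ in range(num_particles)]
    for initial in initials:
        stepped = [constrained_step(p, initial, rng) for p in particles]
        weights = [w for _, w in stepped]
        if sum(weights) == 0.0:
            return []  # no particle can satisfy the constraint
        # Resample: particles the Follower finds more plausible get duplicated.
        chosen = rng.choices([p for p, _ in stepped], weights=weights, k=num_particles)
        particles = [list(p) for p in chosen]
    return particles


particles = run_inference_program("smsp", num_particles=8, seed=0)
```

In this sketch the per-particle steps are independent of each other, which is where parallelism across Followers would come from in a real system.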

To test this approach, we evaluate DisCIPL on a variety of challenging constrained generation tasks. We find that DisCIPL enables a small Follower (Llama-3.2-1B) to match -- and sometimes outperform -- much larger models like GPT-4o and o1!

To understand why this approach is effective, it’s useful to think about DisCIPL as a programming toolkit for LMs that gives the Planner fine-grained control over the Follower.

One particularly powerful pattern allows the Planner to dynamically inject information into the Follower's system prompt *mid-generation*. We call this “self-hinting.” Think of it like a generalized decoding-time calculator that can perform arbitrary Python computations.
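
As a loose illustration of the self-hinting idea, the sketch below recomputes a hint in plain Python before every generation step and rebuilds the prompt around it. Everything here is an assumption made for the example: `make_hint`, `generate_with_hints`, the prompt layout, and the rule-based `toy_follower` are hypothetical stand-ins, not the mechanism used in DisCIPL.

```python
from typing import Callable, List, Tuple


def make_hint(words_so_far: List[str], target_words: int) -> str:
    """Compute a hint from the partial output (here: the remaining word budget)."""
    remaining = target_words - len(words_so_far)
    return f"Hint: write exactly {remaining} more word(s)."


def generate_with_hints(
    follower: Callable[[str], str],
    task: str,
    target_words: int,
) -> Tuple[List[str], List[str]]:
    """Generate word by word, refreshing the hint in the prompt at each step."""
    words: List[str] = []
    prompts_seen: List[str] = []
    while len(words) < target_words:
        prompt = f"{task}\n{make_hint(words, target_words)}\nSo far: {' '.join(words)}"
        prompts_seen.append(prompt)
        words.append(follower(prompt))
    return words, prompts_seen


# Hypothetical stand-in for a Follower LM: it wraps up only when the hint
# says a single word is left.
def toy_follower(prompt: str) -> str:
    return "done." if "exactly 1 more" in prompt else "and"


words, prompts = generate_with_hints(toy_follower, "Write a sentence.", target_words=3)
```

The hint here is simple arithmetic, but the same slot could hold the result of any computation over the partial output, which is the sense in which the pattern behaves like a decoding-time calculator.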

With these tools, we find that DisCIPL is able to solve a variety of hard search tasks like poetry composition, grant-writing, budgeting, and itinerary planning -- all using a 1B Llama model as Follower!

Self-steering takeaways:
✅ LMs can write code to steer other LMs, even when they can't solve tasks themselves!
✅ Enables small LMs (e.g., Llama-1B) to perform like larger ones (e.g., GPT-4o and o1)
✅ Requires no finetuning and can be implemented automatically by existing LMs!

For more details, please see our paper:

Thanks again to my incredible collaborators Josh Tenenbaum, @vmansinghka, @alexanderklew, and @jacobandreas for providing expert meta-steering on this work!

For those @iclr_conf in Singapore, I'll be giving a talk on this work at the VerifAI @ ICLR workshop on April 27! Look forward to seeing you there!
