Transformer-based neural networks achieve impressive performance on coding, math & reasoning tasks that require keeping track of variables and their values. But how can they do that without explicit memory? 📄 Our new ICML paper investigates this in a synthetic setting! 🧵 1/13

First things first – make sure to check out our companion website 🔬 Variable Scope to get a full explanation of the project and follow along with the experiments through many interactive visualizations. Now, on to the thread! 2/13

Variable binding – the ability to associate abstract variables with values – is fundamental to computation & cognition. Classical architectures implement this through addressable memory, but neural nets like Transformers lack such explicit mechanisms. Can they learn it? 3/13

We trained a Transformer from scratch on a variable dereferencing task. Given symbolic programs containing chains of assignments (a=5, b=a, etc) plus irrelevant distractors, the model must trace the correct chain (up to 4 assignments deep) to find a queried variable's value. 4/13
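To make the task concrete, here is a minimal sketch of what one example could look like. The line format, the variable alphabet, and the helper names `make_program` and `dereference` are assumptions for illustration, not the paper's actual data pipeline.

```python
import random

def make_program(depth=4, n_distractors=3, seed=0):
    """Build a toy program: one assignment chain plus distractor lines.

    Hypothetical format: each line is `name=constant` or `name=other_name`.
    """
    rng = random.Random(seed)
    names = rng.sample(list("abcdefghijklmnopqrstuvwxyz"), depth + n_distractors)
    chain, distractors = names[:depth], names[depth:]
    value = rng.randint(0, 9)

    # The chain: first variable gets a constant, each later one copies the previous.
    lines = [f"{chain[0]}={value}"]
    lines += [f"{chain[i]}={chain[i - 1]}" for i in range(1, depth)]

    # Distractors are unrelated constant assignments inserted at random positions.
    # Inserting never reorders the chain lines, so every reference stays defined.
    for name in distractors:
        lines.insert(rng.randint(0, len(lines)), f"{name}={rng.randint(0, 9)}")

    return lines, chain[-1], value

def dereference(lines, query):
    """Reference solver: follow assignments top to bottom with an explicit memory."""
    env = {}
    for line in lines:
        lhs, rhs = line.split("=")
        env[lhs] = int(rhs) if rhs.isdigit() else env[rhs]
    return env[query]

lines, query, value = make_program(depth=4, n_distractors=3, seed=1)
print("\n".join(lines))
print(f"query: {query} -> {dereference(lines, query)}")
```

The reference solver uses a dictionary as addressable memory, which is exactly the kind of explicit mechanism a Transformer does not have built in.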

We observe three distinct phases in the model's learning trajectory, with sharp phase transitions characteristic of a "grokking" dynamic: 1️⃣ Random numerical prediction (≈12% test set accuracy) 2️⃣ Shallow heuristics (≈56%) 3️⃣ General solution that solves the task (>99.9%) 5/13

In phase 1️⃣, the model only learns to predict random numbers. In phase 2️⃣, it learns to predict values from the first few lines of programs, which works surprisingly well for longer chains, but fails otherwise. In phase 3️⃣, it learns a systematic mechanism that generalizes. 6/13

How does the general mechanism learned in the final phase actually work? To find out, we used a causal intervention method called activation patching with counterfactual inputs to trace information flow across layers and identify causally responsible components. 7/13
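For readers new to activation patching, the idea can be sketched on a toy two-layer function rather than a real Transformer. Everything here (`layer1`, `layer2`, `forward`, the inputs) is a made-up stand-in, chosen so that only one hidden component affects the output.

```python
def layer1(x):
    # Toy hidden layer with two components.
    return [2 * x[0], x[1] + 1]

def layer2(h):
    # Toy readout: by construction, only hidden component 0 matters.
    return 3 * h[0]

def forward(x, patch=None):
    """Run the toy model; `patch=(index, value)` overwrites one hidden component."""
    h = layer1(x)
    if patch is not None:
        idx, val = patch
        h = list(h)
        h[idx] = val
    return layer2(h), h

clean_x, corrupt_x = [1, 5], [4, 9]      # original and counterfactual inputs
clean_out, clean_h = forward(clean_x)
corrupt_out, _ = forward(corrupt_x)

# Patch each hidden component from the clean run into the counterfactual run
# and measure how much of the clean output is restored (1.0 = fully, 0.0 = none).
effects = {}
for idx in range(len(clean_h)):
    patched_out, _ = forward(corrupt_x, patch=(idx, clean_h[idx]))
    effects[idx] = (patched_out - corrupt_out) / (clean_out - corrupt_out)

print(effects)
```

A component whose patch restores the clean output is causally responsible for it; a component whose patch changes nothing is not. The same logic applies when the patched activations are residual stream vectors or attention head outputs.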

Patching the residual stream (the main information pathway between layers) shows that information about the correct value is dynamically routed across layers at token positions corresponding to each step of the query variable's assignment chain. 8/13

Patching individual attention heads reveals how they specialize and coordinate to route information: early heads handle the first hop in the assignment chain, mid-layer heads propagate subsequent hops, and late heads aggregate the answer at query position. 9/13

How is that possible? The residual stream acts as a kind of addressable memory. We find that the model learns to dedicate separate subspaces of the residual stream to encode variable names and numerical constants. Causal interventions confirm their functional role. 10/13
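A subspace intervention of this kind can be sketched as follows. This is a toy with hand-picked, axis-aligned subspaces in a 4-dimensional vector; in a trained model the subspaces would have to be found from data, and the names `project`, `swap_subspace`, and `value_basis` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(v, basis):
    """Project v onto the span of an orthonormal list of basis vectors."""
    out = [0.0] * len(v)
    for b in basis:
        c = dot(v, b)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out

def swap_subspace(target, source, basis):
    """Replace target's component inside the subspace with source's component."""
    pt, ps = project(target, basis), project(source, basis)
    return [t - a + s for t, a, s in zip(target, pt, ps)]

# Assumed layout: dims 0-1 encode the variable name, dims 2-3 encode the value.
value_basis = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

h_a = [1.0, 0.0, 0.5, 0.2]   # stand-in for "variable a holds some value"
h_b = [0.0, 1.0, 0.9, 0.1]   # stand-in for "variable b holds another value"

# Swap only the value subspace: the name part of h_a stays untouched.
patched = swap_subspace(h_a, h_b, value_basis)
print(patched)
```

If downstream predictions change to follow the swapped value while the variable identity is preserved, that is evidence the subspace plays the functional role attributed to it.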

So, the model learns a circuit that encodes variables & values in distinct subspaces. How does it learn? Interestingly, the circuit does *not* replace earlier heuristics – it's built on top! Heuristics are still used when they work & the circuit activates when they fail. 11/13

To sum up: 1. Transformers can learn variable binding via emergent mechanisms, w/o explicit symbolic machinery 2. Learning is cumulative, with a general mechanism learned on top of heuristics. This challenges traditional narratives about grokking 12/13

See full paper for more! I'm thrilled to get this one out. It's a passion project that's been a long time in the making with my wonderful former student Yiwei Wu and the fantastic Atticus Geiger. Thanks to @cosmos_inst for funding hosting costs for 13/13



