Nick Jiang

@nickhjiang · 1,182 subscribers

probing machines @stanford

Shorts

New work: The Value Axis 🎯 How do LLMs choose which path to take mid-task? We find they internally track the chance of reaching their goal along a linear axis, akin to a value function in RL. We show it modulates confidence in math & coding and can be reshaped with DPO and SFT.

24,882 views