Arip's banner
Arip's profile picture

Arip

@machinestein1,374 subscribers

Shorts

ICML 2026: Latent Reasoning in TRMs is Secretly a Policy Improvement Operator Why does recursive reasoning, especially latent reasoning, actually work? The theory is still young, and even mechanistic explanations are limited. We close part of this gap by showing that latent reasoning is secretly doing policy improvement. Each recursion pushes the model steadily toward the target. Based on this view, we propose an algorithm that boosts learning and inference efficiency by up to 18x.

ICML 2026: Latent Reasoning in TRMs is Secretly a Policy Improvement Operator Why does recursive reasoning, especially latent reasoning, actually work? The theory is still young, and even mechanistic explanations are limited. We close part of this gap by showing that latent reasoning is secretly doing policy improvement. Each recursion pushes the model steadily toward the target. Based on this view, we propose an algorithm that boosts learning and inference efficiency by up to 18x.

23,733 görüntüleme