Vai Viswanathan

@vai_viswanathan · 1,677 subscribers

building with world models | 32,000+ hours of open data @ https://t.co/OGSVxn1iWG | exploring @southpkcommons | computer vision @ https://t.co/QJrkahHeL6 | founder @canaryaero

Shorts

For years, Yann LeCun has argued that generative video models can't truly learn physics. DeepMind's Physics-IQ benchmark proved him right, reporting "a striking lack of physical understanding in current generative video models". The best model scored just 29.5%.

This CVPR 2026 paper finds the fix in LeCun's own playbook. "Inference-time Physics Alignment" doesn't retrain the generator. It steers a video model's denoising at inference using a reward from VJEPA-2, LeCun's Joint Embedding Predictive Architecture.

- Won first place in the ICCV 2025 PhysicsIQ Challenge with a 62.64% score, beating the previous state of the art by 7.42%.
- Repurposes VJEPA-2's "surprise" score as a reward, then searches and ranks multiple candidate denoising trajectories at test time.
- Why it matters: It's a neat vindication of LeCun's thesis. The pixel-prediction generator alone doesn't get physics, but the JEPA world model approach he champions supplies the physics the video model lacks.
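The "search and rank" idea can be illustrated with a toy best-of-N selection. This is a minimal sketch, not the paper's method: it only ranks finished candidates and does not steer denoising. The names `toy_predictor`, `surprise`, `reward`, and `rank_candidates` are hypothetical. The constant-velocity predictor is a stand-in for a learned world model such as VJEPA-2, and each candidate is a short list of numbers standing in for a video's latent states.

```python
from typing import Callable, List, Sequence

# Toy stand-in for the latent states of one candidate video (assumption).
Trajectory = Sequence[float]


def toy_predictor(prev2: float, prev1: float) -> float:
    """Hypothetical world model: predicts the next state assuming constant velocity."""
    return prev1 + (prev1 - prev2)


def surprise(traj: Trajectory) -> float:
    """Mean squared gap between the predicted and the observed next state."""
    errors = [
        (toy_predictor(traj[i - 2], traj[i - 1]) - traj[i]) ** 2
        for i in range(2, len(traj))
    ]
    return sum(errors) / len(errors)


def reward(traj: Trajectory) -> float:
    """Lower surprise means higher reward."""
    return -surprise(traj)


def rank_candidates(
    candidates: List[Trajectory],
    reward_fn: Callable[[Trajectory], float],
) -> List[Trajectory]:
    """Sort candidates from highest to lowest reward (best-of-N selection)."""
    return sorted(candidates, key=reward_fn, reverse=True)


candidates = [
    [0.0, 1.0, 2.0, 3.0, 4.0],  # steady motion: matches the predictor exactly
    [0.0, 1.0, 2.0, 5.0, 4.0],  # sudden jump: physically implausible
    [0.0, 1.0, 0.0, 1.0, 0.0],  # oscillation: repeatedly violates the prediction
]

ranked = rank_candidates(candidates, reward)
best = ranked[0]
print("best candidate:", best)
```

In this toy setup the steady-motion candidate has zero surprise and ranks first. A real system would swap the hand-written predictor for a learned one and score video embeddings rather than scalars.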

44,077 views
