Haian Jin

@Haian_Jin • 1,195 subscribers

CS Ph.D. Student at @Cornell University; Interested in Vision and ML

Videos

Spatial reconstruction is a long-context problem: real scenes come with hundreds of images, but O(N²) transformer-based models don’t scale efficiently. Introducing 🤐ZipMap (CVPR ’26): Linear-Time, Stateful 3D Reconstruction via Test-Time Training (TTT). ZipMap “zips” a large image collection into an implicit TTT scene state in a single linear-time operation. The state is then decoded into spatial outputs and can be queried efficiently for novel-view geometry and appearance (~100 FPS). ZipMap is not only much faster (>20× faster than VGGT), but also matches or surpasses the accuracy of all SOTA models.

78,899 views • 4 months ago

Novel view synthesis has long been a core challenge in 3D vision. But how much 3D inductive bias is truly needed? Surprisingly, very little! Introducing "LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias", a fully transformer-based approach that enables scalable, generalizable, and fully data-driven novel view synthesis from sparse posed inputs. 🧵(1/6) Project Page:

114,792 views • 1 year ago
