
Xingang Pan
@XingangP • 3,057 subscribers
Assistant Professor at Nanyang Technological University @NTUsg @MMLabNTU - Computer Vision, Deep Learning, Computer Graphics

Introducing StoryMem, a memory-augmented framework for multi-shot long video storytelling. StoryMem carefully injects compact memory into the generation process with minimal overhead, enabling:
• Cross-shot consistency
• Smooth transitions
• Narrative coherence across minutes-long videos
Awesome work by Kaiwen Zhang (Kevin).
Project: arXiv: Code:
Xingang Pan • 15,427 views • 4 months ago

Can video generative models exhibit visuospatial intelligence? 🤔
Introducing Video4Spatial, a video-only framework that tackles spatial tasks. With just video context, our model can:
🔍 Ground objects by planning geometry-consistent paths
📸 Follow camera-pose instructions for scene navigation
🌐 Generalize to long contexts & unseen outdoor scenes
A step toward video models as visual-spatial reasoners.
Project: arXiv:
Xingang Pan • 15,794 views • 5 months ago

Worlds synthesized with video diffusion models are often inconsistent: moving the camera back and forth leads to different scenes. We propose 🌐 WorldMem, a memory-based approach that ensures consistent world simulation without relying on explicit 3D reconstruction.
Xingang Pan • 19,413 views • 1 year ago