
Xingang Pan
@XingangP • 3,057 subscribers
Assistant Professor at Nanyang Technological University @NTUsg @MMLabNTU - Computer Vision, Deep Learning, Computer Graphics

Introducing StoryMem, a memory-augmented framework for multi-shot long video storytelling. StoryMem injects compact memory into the generation process with minimal overhead, enabling:
• Cross-shot consistency
• Smooth transitions
• Narrative coherence across minutes-long videos
Awesome work by Kaiwen Zhang (Kevin).
Project: arXiv: Code:
Xingang Pan • 15,427 views • 4 months ago

Can video generative models exhibit visuospatial intelligence? 🤔 Introducing Video4Spatial, a video-only framework that tackles spatial tasks. With just video context, our model can:
🔍 Ground objects by planning geometry-consistent paths
📸 Follow camera-pose instructions for scene navigation
🌐 Generalize to long contexts and unseen outdoor scenes
A step toward video models as visual-spatial reasoners.
Project: arXiv:
Xingang Pan • 15,794 views • 5 months ago