Xingang Pan

@XingangP • 3,057 subscribers

Assistant Professor at Nanyang Technological University @NTUsg @MMLabNTU - Computer Vision, Deep Learning, Computer Graphics

Shorts

Diffusion models are sensitive to small changes in the input noise. We introduce Alias-Free Latent Diffusion Models (𝗔𝗙-𝗟𝗗𝗠) at #CVPR2025. It achieves shift-equivariance and generates consistent outputs. Project: arXiv:

42,538 views

Introducing 📦𝗔𝗿𝘁𝗶𝗟𝗮𝘁𝗲𝗻𝘁🔧 (SIGGRAPH Asia 2025) — a high-quality 3D diffusion model that explicitly models object articulation, paving the way for richer, more realistic assets in embodied AI and simulation: – Generates fully articulated 3D objects – Physically plausible joints & motion – High-fidelity 3D Gaussian appearance – Supports generation from a single real image arXiv: Project: Code (coming soon):

11,517 views

Videos

Introducing 𝐆𝐚𝐮𝐬𝐬𝐢𝐚𝐧 𝐀𝐧𝐲𝐭𝐡𝐢𝐧𝐠, a new 3D generative model with two key properties: - A structured point-cloud latent space that enables flexible editing - Support for multi-modal conditions, e.g., point cloud, text, and single/multi-view images arXiv:

55,847 views • 1 year ago

Introducing StoryMem, a memory-augmented framework for multi-shot long video storytelling. StoryMem injects compact memory into the generation process with minimal overhead, enabling: • Cross-shot consistency • Smooth transitions • Narrative coherence across minutes-long videos Awesome work by Kaiwen Zhang (Kevin) Project: arXiv: Code:

16,474 views • 6 months ago

Can video generative models exhibit visuospatial intelligence? 🤔 Introducing Video4Spatial — a video-only framework that tackles spatial tasks. With just video context, our model can: 🔍 Ground objects by planning geometry-consistent paths 📸 Follow camera-pose instructions for scene navigation 🌐 Generalize to long contexts & unseen outdoor scenes A step toward video models as visual-spatial reasoners. Project: arXiv:

15,931 views • 7 months ago

Introducing 𝐒𝐀𝐑𝟑𝐃, which tokenizes 3D objects into multiscale tokens and generates 3D objects by autoregressive next-scale prediction. 𝐒𝐀𝐑𝟑𝐃 enables fast 3D generation and comprehensive 3D understanding. arXiv: Project:

28,248 views • 1 year ago

Synthesizing worlds with video diffusion models is often inconsistent — moving the camera back and forth leads to different scenes. We propose 🌐𝗪𝗼𝗿𝗹𝗱𝗠𝗲𝗺, a memory-based approach that ensures consistent world simulation without relying on explicit 3D reconstruction.

19,413 views • 1 year ago
