Zhiwen(Aaron) Fan

@zhiwen_fan_ • 1,838 subscribers

Assistant Prof @ Texas A&M ECE @TAMU | World Model and other Spatial Foundation Models

Shorts

Discover the right 3D Geometric Foundation Model for your task—whether it’s stereo matching, multi-view depth estimation, video depth, pose estimation, semantic understanding, or novel view synthesis. Explore more insights in our #E3DBench #FoundationModel #3D #GaussianSplatting. Project Webpage:

30,010 views

🚀 #NVlabs #InstantSplat now supports high-quality sparse-view surface reconstruction in just seconds! 📸 Build your scene with just 3 images. It's effortless, fast, and ready for you to explore. 💻 Code is now available

22,939 views

Videos

InstantSplat++ is now open source. It is a lightweight library that connects foundation models (VGGT, MASt3R, MAP-Anything, etc.) with the Gaussian splatting family. Given uncalibrated images, it optimizes a 3D scene in a few seconds. Try the demo and code here:

31,864 views • 4 months ago

Speed up your view synthesis (<40s) with #InstantSplat! Our large-scale, pose-free method trains in just 37 seconds from sparse views: no #COLMAP, no intrinsics needed. It achieves nearly 30 dB test PSNR with just 12 images, setting a new standard in #NVS and in training efficiency. Project page 👉 Paper 📷:

108,763 views • 2 years ago

🚀 Just dropped the code: #InstantSplat! Reconstruct your 3D scenes in seconds. Get it now at and start building! ✨ #3DVision #AI #ComputerVision #NVlabs #OpenSource

89,142 views • 1 year ago

🚀 Our NeurIPS '24 work, Large Spatial Model (LSM), is here! LSM performs semantic 3D reconstruction in just 0.1s, processing unposed data via feed-forward 3D reconstruction. 👉 It leverages large-scale 3D datasets with minimal annotations to define a 3D latent space. We are continuing to explore how this explicit 3D representation can further enhance reasoning and robotic learning. 🔗 Try our online Gradio demo with your own data at #NeurIPS2024 #3DReconstruction

43,871 views • 1 year ago

What happens when VLMs meet 3D foundation models? See VLM-3R (CVPR 2026). VLM-3R links a vision-language model (e.g., Qwen) with 3D geometric foundation models (e.g., CUT3R) at metric scale. Given an uncalibrated video, it moves beyond pixels to perceive and reason in 3D space. Code (open source):

10,641 views • 4 months ago

We present VLM-3R: a Vision-Language Model capable of 3D spatial reasoning from monocular video, grounding visual cues, geometry, and camera motion. ✅ No depth sensor ✅ No pre-built 3D maps ✅ End-to-end spatial + temporal reasoning 🔗 Code & benchmark: #VLM #3DVision #LLMs

14,895 views • 1 year ago
