ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies

Contributions:
1) We propose ImmerseGen, a novel agent-guided 3D environment generation framework. It uses simplified geometric proxies with alpha-textured meshes to produce compact, photorealistic worlds ready for real-time mobile VR rendering.
2) We propose a novel RGBA texturing paradigm. It first...

14,225 views • 1 year ago • via X (Twitter)
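The ImmerseGen description is cut off before it explains its RGBA texturing paradigm, so the following is only a generic illustration of what an alpha-textured surface does at render time: a partially transparent texel is blended over whatever lies behind it using the standard Porter-Duff "over" operator with straight (non-premultiplied) alpha. The function names `over` and `composite_layers` are hypothetical, and nothing here is taken from ImmerseGen's implementation.

```python
from functools import reduce
from typing import Sequence, Tuple

# A texel as (r, g, b, a) with straight (non-premultiplied) alpha in [0, 1].
RGBA = Tuple[float, float, float, float]


def over(fg: RGBA, bg: RGBA) -> RGBA:
    """Porter-Duff 'over': place the foreground texel on top of the background texel."""
    f_r, f_g, f_b, f_a = fg
    b_r, b_g, b_b, b_a = bg
    out_a = f_a + b_a * (1.0 - f_a)
    if out_a == 0.0:
        # Both texels fully transparent: the result is fully transparent too.
        return (0.0, 0.0, 0.0, 0.0)

    def mix(f: float, b: float) -> float:
        # Weight each color by its effective coverage, then divide out the combined alpha.
        return (f * f_a + b * b_a * (1.0 - f_a)) / out_a

    return (mix(f_r, b_r), mix(f_g, b_g), mix(f_b, b_b), out_a)


def composite_layers(layers_back_to_front: Sequence[RGBA]) -> RGBA:
    """Stack layers ordered back to front, e.g. an opaque proxy behind alpha-textured cards."""
    return reduce(lambda acc, layer: over(layer, acc), layers_back_to_front, (0.0, 0.0, 0.0, 0.0))
```

For example, `composite_layers([(0.0, 0.0, 1.0, 1.0), (1.0, 0.0, 0.0, 0.5)])` blends a half-transparent red texel over an opaque blue one and yields an even red/blue mix at full opacity.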

Similar Videos

Wonderland: Navigating 3D Scenes from a Single Image

Contributions:
• First, we introduce a representation for controllable 3D generation by leveraging the generative priors of camera-guided video diffusion models. Unlike image models, video diffusion models are trained on extensive video datasets. This enables them to capture comprehensive spatial relationships within scenes across multiple views and to embed a form of "3D awareness" in their latent space, which allows us to maintain 3D consistency in novel view synthesis.
• Second, to achieve controllable novel view generation, we give video models precise control over specified camera motions. We introduce a novel dual-branch conditioning mechanism that effectively incorporates diverse desired camera trajectories into the video diffusion model. This enables expansion of a single image into a multi-view consistent capture of a 3D scene with precise pose control.
• Third, to achieve efficient 3D reconstruction, we directly transform video latents into 3D Gaussian Splatting (3DGS). We propose a novel latent-based large reconstruction model (LaLRM) that lifts video latents to 3D in a feed-forward manner. With this design, our model directly predicts 3DGS from a single input image at inference time, effectively aligning the generation and reconstruction tasks (and bridging image space and 3D space) through the video latent space. Compared with reconstructing scenes from images, the video latent space offers a 256× spatial-temporal reduction while retaining essential and consistent 3D structural details. Such a high degree of compression is crucial, as it allows the LaLRM to handle a wider range of 3D scenes within the reconstruction framework under the same memory constraints.

MrNeRF

52,801 views • 1 year ago
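The Wonderland description states the 256× figure without breaking it down. One factorization that yields it, assumed here rather than taken from the paper, is 4× temporal downsampling combined with 8× downsampling along each spatial axis (4 × 8 × 8 = 256). The sketch below counts spatio-temporal positions before and after such a video encoder; the clip size, the default factors, and the helper names are hypothetical, and latent channels are not counted.

```python
from typing import Tuple


def positions(frames: int, height: int, width: int) -> int:
    """Number of spatio-temporal sites (frames x rows x columns), ignoring channels."""
    return frames * height * width


def latent_grid(frames: int, height: int, width: int,
                t_factor: int = 4, s_factor: int = 8) -> Tuple[int, int, int]:
    """Latent grid size under ASSUMED downsampling factors (not stated in the post)."""
    if frames % t_factor or height % s_factor or width % s_factor:
        raise ValueError("clip dimensions must be divisible by the downsampling factors")
    return frames // t_factor, height // s_factor, width // s_factor


if __name__ == "__main__":
    clip = (48, 480, 720)  # hypothetical clip: 48 frames at 480x720
    grid = latent_grid(*clip)
    print(grid, positions(*clip) // positions(*grid))  # (12, 60, 90) 256
```

Real video encoders may handle the first frame separately, so the exact latent frame count can differ from a plain division; the ratio above only illustrates how a 256× reduction can arise.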

3D Gaussian Splatting for Real-Time Radiance Field Rendering

Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and, importantly, allow high-quality real-time (>= 30 fps) novel-view synthesis at 1080p resolution. First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space; second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene; third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows real-time rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.

AK

633,428 views • 2 years ago
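The 3D Gaussian Splatting abstract mentions optimizing anisotropic covariance and a visibility-aware renderer. In that paper, each Gaussian's covariance is built from a scaling vector and a rotation quaternion as Σ = R S Sᵀ Rᵀ, which keeps Σ positive semi-definite during gradient descent, and a pixel's color comes from front-to-back alpha blending of depth-sorted splats, C = Σᵢ cᵢ αᵢ ∏_{j<i} (1 − αⱼ). The NumPy sketch below illustrates both formulas in isolation; it is not the paper's CUDA rasterizer, the function names are hypothetical, and each alpha is assumed to already include the splat's opacity times its 2D Gaussian falloff at the pixel.

```python
import numpy as np


def quat_to_rotmat(q: np.ndarray) -> np.ndarray:
    """Rotation matrix from a quaternion (w, x, y, z); the quaternion is normalized first."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def covariance(scale: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Sigma = R S S^T R^T: symmetric positive semi-definite by construction."""
    m = quat_to_rotmat(q) @ np.diag(scale)  # m = R S
    return m @ m.T


def composite_front_to_back(colors: np.ndarray, alphas: np.ndarray) -> np.ndarray:
    """Blend splats sorted near-to-far: C = sum_i c_i a_i prod_{j<i} (1 - a_j)."""
    out = np.zeros(3)
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for c, a in zip(colors, alphas):
        out += transmittance * a * c
        transmittance *= 1.0 - a
    return out
```

With the identity quaternion and scales (1, 2, 3), `covariance` returns diag(1, 4, 9); rotating 90° about z swaps the first two variances. Optimizing scale and rotation instead of Σ directly avoids producing invalid (non-positive-semi-definite) covariances.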

📢 Our lab has been exploring 3D world models for years, and we’re thrilled to share **PhysTwin**: a milestone that reconstructs object appearance, geometry, and dynamics from just a few seconds of interaction! Led by the amazing Hanxiao Jiang 👉 PhysTwin combines **Gaussian splatting** with **inverse dynamics optimization** based on simple **spring-mass** systems. ⚙️ The result? Real-time, action-conditioned 3D video prediction under novel interactions (i.e., 3D world models). 🔑 A few key takeaways: 1. Having the right structure (e.g., particles/masses) helps navigate the trade-off between sample efficiency, generalization, and broad applicability. 2. Visual foundation models (VFMs) have matured to the point where they can provide rich supervision for world modeling (e.g., tracking, shape completion). 3. Beyond VFMs, many crucial components have come together in recent years: Gaussian splats for rendering, NVIDIA Warp for high-performance simulation, and scene/asset generation from a wide range of labs and companies. The future of 3D world models is looking bright! ✨ 4. The resulting digital twin supports a wide range of downstream applications, especially data generation and policy evaluation, thanks to its realistic rendering and simulation capabilities. 🎥 All code and data to reproduce the results, along with interactive demos, are available on the website. Check out the visualizations below: (1) observations, (2) reconstructed states/actions, (3) interactive digital twins, and (4) overlays of real-world robot teleoperation with our model’s open-loop predictions.

Yunzhu Li

25,279 views • 1 year ago
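PhysTwin is described as fitting a spring-mass system through inverse dynamics optimization. The sketch below shows only the forward half of that idea: one semi-implicit Euler step of a textbook damped spring-mass system in NumPy, with gravity omitted. The stiffness `k`, damping `c`, time step `dt`, and the function names are hypothetical; PhysTwin's actual simulator (the post credits NVIDIA Warp for high-performance simulation) and its parameter estimation are not reproduced here. An inverse-dynamics loop would adjust parameters such as `k` until simulated trajectories match the tracked observations.

```python
from typing import Sequence, Tuple

import numpy as np


def spring_forces(x: np.ndarray, v: np.ndarray,
                  springs: Sequence[Tuple[int, int]], rest: Sequence[float],
                  k: float, c: float) -> np.ndarray:
    """Hooke spring force plus damping along each spring, accumulated per particle."""
    f = np.zeros_like(x)
    for (i, j), rest_len in zip(springs, rest):
        d = x[j] - x[i]
        length = np.linalg.norm(d)
        if length == 0.0:
            continue  # direction undefined for coincident particles
        n = d / length
        # Positive when stretched or separating: pulls i toward j and j toward i.
        mag = k * (length - rest_len) + c * np.dot(v[j] - v[i], n)
        f[i] += mag * n
        f[j] -= mag * n
    return f


def step(x: np.ndarray, v: np.ndarray, mass: np.ndarray,
         springs: Sequence[Tuple[int, int]], rest: Sequence[float],
         k: float = 100.0, c: float = 1.0, dt: float = 1e-3) -> Tuple[np.ndarray, np.ndarray]:
    """One semi-implicit (symplectic) Euler step: update velocity first, then position."""
    a = spring_forces(x, v, springs, rest, k, c) / mass[:, None]
    v_new = v + dt * a
    x_new = x + dt * v_new
    return x_new, v_new
```

Updating velocity before position (rather than plain explicit Euler) is a common choice for mass-spring systems because it is noticeably more stable at the same time step.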

Introducing Kaleido💮 from AI at Meta: a universal generative neural rendering engine for photorealistic, unified object and scene view synthesis. Kaleido is built on a simple but powerful design philosophy: 3D perception is a form of visual common sense. Following this idea, we formulate rendering purely as a sequence-to-sequence generation problem, successfully unifying neural rendering with the architecture principles behind modern language and video models. Unlike traditional neural rendering methods, Kaleido learns 3D purely in a data-driven way, without explicit 3D representations or structures. It acquires spatial understanding directly through large-scale video pretraining followed by multi-view 3D data finetuning, inspired by how LLMs acquire textual common sense from large corpora before specialising in domains like coding. Through extensive ablations, we progressively modernised the architecture design and training strategies and tackled key scaling challenges in sequence-to-sequence generative rendering, arriving at a design that’s simple, versatile, and scalable. Kaleido significantly outperforms prior generative models in few-view settings and, remarkably, is the first zero-shot generative method to match InstantNGP-level rendering quality in multi-view settings. We also view Kaleido as an alternative step towards world modeling that flexibly spans a spectrum of “realities”: with many views, it faithfully reconstructs grounded reality; with fewer views, it imagines plausible unseen details. 🔗 Explore more results and the paper:

Shikun Liu

22,134 views • 8 months ago