ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies Contributions: 1) We propose ImmerseGen, a novel agent-guided 3D environment generation framework. It uses simplified geometric proxies with alpha-textured meshes to produce compact, photorealistic worlds ready for real-time mobile VR rendering. 2) We propose a novel RGBA texturing paradigm. It first... synthesizes 8K terrain textures using a geometry-conditioned panorama generator via user-centric mapping, and then directly generates alpha-textured proxy assets, avoiding fidelity loss typically resulting from mesh decimation. 3) To automate scene creation from user prompts, we introduce VLM-based modeling agents equipped with a novel grid-based semantic analysis. This enables 3D spatial reasoning from 2D observations and ensures accurate asset placement. ImmerseGen further enhances immersion with dynamic effects and ambient audio for a multisensory experience. 4) Experiments on multiple scene-generation scenarios and live mobile VR applications show that ImmerseGen outperforms previous methods in visual quality, realism, spatial coherence, and rendering efficiency for immersive real-time VR experiences.
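The idea of an alpha-textured proxy can be illustrated with the standard "over" compositing operator: a simple quad carries an RGBA texture, and the alpha channel decides how much of the background terrain shows through at each pixel. The sketch below is only an illustration of that general technique, not ImmerseGen's renderer; the function name `composite_over` and the tiny 2×2 arrays are hypothetical, and straight (non-premultiplied) alpha is assumed.

```python
import numpy as np

def composite_over(fg_rgba: np.ndarray, bg_rgb: np.ndarray) -> np.ndarray:
    """Blend a straight-alpha RGBA foreground over an RGB background.

    fg_rgba has shape (H, W, 4), bg_rgb has shape (H, W, 3), values in [0, 1].
    """
    alpha = fg_rgba[..., 3:4]  # keep the last axis so it broadcasts over RGB
    return alpha * fg_rgba[..., :3] + (1.0 - alpha) * bg_rgb

# Hypothetical 2x2 proxy texture: one opaque texel, one half-transparent
# texel, and two fully transparent texels (alpha = 0).
proxy = np.zeros((2, 2, 4))
proxy[0, 0] = [0.1, 0.6, 0.2, 1.0]
proxy[0, 1] = [0.1, 0.6, 0.2, 0.5]

# Hypothetical uniform terrain colour behind the proxy.
terrain = np.full((2, 2, 3), 0.8)

blended = composite_over(proxy, terrain)
```

Texels with alpha 0 leave the terrain untouched, which is why a flat quad can stand in for detailed foliage or distant geometry without extra triangles.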

MrNeRF

13,428 subscribers

14,225 views • 1 year ago • via X (Twitter)

Related Videos

ImmerseGen Agent-Guided Immersive World Generation with Alpha-Textured Proxies

AK

42,848 views • 1 year ago

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion TL;DR: Create 3/4DGS from Video Diffusion Note: Some first inference code released (not all yet). Contributions (cited): • We present DimensionX, a novel framework for generating photorealistic 3D and 4D scenes from only a single image using controllable video diffusion. • We propose ST-Director, which decouples the spatial and temporal priors in video diffusion models by learning (spatial and temporal) dimension-aware modules with our curated datasets. We further enhance the hybrid-dimension control with a training-free composition approach according to the essence of video diffusion denoising process. • To bridge the gap between video diffusion and real-world scenes, we design a trajectory-aware mechanism for 3D generation and an identity-preserving denoising approach for 4D generation, enabling more realistic and controllable scene synthesis. • Extensive experiments manifest that our DimensionX delivers superior performance in video, 3D, and 4D generation compared with baseline methods.

MrNeRF

17,037 views • 1 year ago

🚨 TRELLIS.2 is now live on fal! 🎯 Image-to-3D model producing up to 1536³ PBR textured assets 🎨 Handles arbitrary topology with rich PBR textures (Base Color, Metallic, Roughness, Alpha) ⚡ 16× spatial compression for efficient, scalable, high-fidelity asset generation

fal

32,588 views • 6 months ago

Wonderland: Navigating 3D Scenes from a Single Image Contributions: • First, we introduce a representation for controllable 3D generation by leveraging the generative priors from camera-guided video diffusion models. Unlike image models, video diffusion models are trained on extensive video datasets. This enables them to capture comprehensive spatial relationships within scenes across multiple views and embed a form of "3D awareness" in their latent space, which allows us to maintain 3D consistency in novel view synthesis. • Second, to achieve controllable novel view generation, we empower video models with precise control over specified camera motions. We introduce a novel dual-branch conditioning mechanism that effectively incorporates desired diverse camera trajectories into the video diffusion model. This enables expansion of a single image into a multi-view consistent capture of a 3D scene with precise pose control. • Third, to achieve efficient 3D reconstruction, we directly transform video latents into 3DGS. We propose a novel latent-based large reconstruction model (LaLRM) that lifts video latents to 3D in a feed-forward manner. With this design, during inference, our model directly predicts 3DGS from a single input image, effectively aligning the generation and reconstruction tasks—and bridging image space and 3D space—through the video latent space. Compared with reconstructing scenes from images, the video latent space offers a 256× spatial-temporal reduction while retaining essential and consistent 3D structural details. Such a high degree of compression is crucial, as it allows the LaLRM to handle a wider range of 3D scenes within the reconstruction framework, with the same memory constraints.

MrNeRF

52,801 views • 1 year ago

3D Gaussian Splatting for Real-Time Radiance Field Rendering paper page: Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and importantly allow high-quality real-time (>= 30 fps) novel-view synthesis at 1080p resolution. First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space; Second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene; Third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows realtime rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.
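The "anisotropic covariance" mentioned above can be made concrete with a small sketch. The abstract does not spell out the parameterization; the 3DGS paper describes building each Gaussian's covariance from a rotation R and a per-axis scale S as Σ = R S Sᵀ Rᵀ, which keeps Σ symmetric and positive semi-definite during optimization. The helper names below (`quat_to_rotmat`, `gaussian_covariance`) are hypothetical, and a (w, x, y, z) quaternion convention is assumed.

```python
import numpy as np

def quat_to_rotmat(q: np.ndarray) -> np.ndarray:
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)  # normalise so the result is a rotation
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def gaussian_covariance(scale: np.ndarray, quat: np.ndarray) -> np.ndarray:
    """Build an anisotropic covariance as R S S^T R^T."""
    R = quat_to_rotmat(quat)
    S = np.diag(scale)
    M = R @ S
    return M @ M.T

# A Gaussian stretched along its local x axis, rotated 90 degrees about z,
# so the long axis ends up along world y.
scale = np.array([2.0, 1.0, 0.5])
quat = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
cov = gaussian_covariance(scale, quat)
```

Optimizing the scale and quaternion separately, rather than the nine matrix entries directly, means any gradient step still yields a valid covariance.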

AK

633,428 views • 2 years ago

GSTAR: Gaussian Surface Tracking and Reconstruction Contributions: • A new framework for tracking and reconstructing dynamic scenes, combining 3D Gaussians and meshes to effectively manage changes in topology. • A method for Gaussian unbinding and surface re-meshing, allowing for the generation of new surfaces as topologies evolve. • A method for handling large or fast deformations of surfaces between frames using scene flow warping. Abstract (excerpt): However, tracking dynamic surfaces with 3D Gaussians remains challenging due to complex topology changes, such as surfaces appearing, disappearing, or splitting. To address these challenges, we propose GSTAR, a novel method that achieves photo-realistic rendering, accurate surface reconstruction, and reliable 3D tracking for general dynamic scenes with changing topology. Given multi-view captures as input, GSTAR binds Gaussians to mesh faces to represent dynamic objects. For surfaces with consistent topology, GSTAR maintains the mesh topology and tracks the meshes using Gaussians.

MrNeRF

22,698 views • 1 year ago

DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery Abstract: Drones have become essential tools for reconstructing wild scenes due to their outstanding maneuverability. Recent advances in radiance field methods have achieved remarkable rendering quality, providing a new avenue for 3D reconstruction from drone imagery. However, dynamic distractors in wild environments challenge the static scene assumption in radiance fields, while limited view constraints hinder the accurate capture of underlying scene geometry. To address these challenges, we introduce DroneSplat, a novel framework designed for robust 3D reconstruction from in-the-wild drone imagery. Our method adaptively adjusts masking thresholds by integrating local-global segmentation heuristics with statistical approaches, enabling precise identification and elimination of dynamic distractors in static scenes. We enhance 3D Gaussian Splatting with multi-view stereo predictions and a voxel-guided optimization strategy, supporting high-quality rendering under limited view constraints. For comprehensive evaluation, we provide a drone-captured 3D reconstruction dataset encompassing both dynamic and static scenes. Extensive experiments demonstrate that DroneSplat outperforms both 3DGS and NeRF baselines in handling in-the-wild drone imagery.

MrNeRF

21,346 views • 1 year ago

WeatherEdit: Controllable Weather Editing with 4D Gaussian Field Contributions: 1. Based on our analysis of weather editing characteristics, we introduce WeatherEdit, a comprehensive and efficient framework for realistic and controllable weather generation. Compared with existing methods that focus on either background editing or static weather effects, a progressive 2D-to-4D transformation process in WeatherEdit enhances adaptability across a wider range of scenarios. 2. We introduce an all-in-one adapter to enable a diffusion model for multi-weather (snowy, rainy, and fog) synthesis, along with a Temporal-View attention to ensure consistent editing across multi-frame and multi-view. 3. We design a 4D Gaussian field for weather particle modeling, enabling plausible simulation of raindrops, snowflakes, and fog with controllable severity. 4. We demonstrate WeatherEdit’s effectiveness in generating realistic, consistent, and controllable weather effects in 3D driving scenes, showcasing its applicability to real-world scenarios.

MrNeRF

10,607 views • 1 year ago

SplatVoxel: History-Aware Novel View Streaming without Temporal Training Contributions: • We propose a hybrid Splat-Voxel feed-forward reconstruction framework that leverages historical information to enable novel view streaming, without relying on multi-view video datasets for training. • We develop an efficient sparse voxel transformer with a coarse-to-fine voxel representation, outperforming existing feed-forward Gaussian splatting methods. • Experiment results demonstrate that our proposed framework enhances novel view synthesis for streaming scene reconstruction, providing better visual quality and reduced temporal artifacts through history-aware modeling.

MrNeRF

10,823 views • 1 year ago

Is Google taking initial steps to enhance Street View? For some reason, Street View seems stuck in technology that feels outdated. I wonder if we'll see such improvements on the product side. Also, note how much better it performs in all aspects compared to Zip-NeRF in their presented material. It offers more details and fewer artifacts. Great work! "LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering" Contributions: • We propose a novel LOD representation for 3DGS which, unlike previous methods [27, 28, 17], does not recompute the list of used Gaussians at each frame. This allows for acceleration and compaction, enabling the rendering of large-scale scenes even on mobile devices. • We design a strategy to automatically select optimal hyperparameters for splitting LODs, whereas most other methods require manual tuning of hyperparameters for each 3D scene. • To further accelerate rendering, we split the scene into chunks and pre-compute sets of active Gaussians per chunk. • Finally, we introduce a novel opacity interpolation scheme to produce visually pleasing rendering and eliminate artifacts when transitioning between chunks.

MrNeRF

62,511 views • 1 year ago

Imagine an interactive, 3D social platform…where you can create, share, and enjoy immersive experiences/art with your friends in real time. Welcome to a glimpse of in 2023, accessible across Web, VR, AR, and mobile 🏂

Spatial

187,522 views • 3 years ago

Human Hair Reconstruction with Strand-Aligned 3D Gaussians Contributions (cited): – We propose a new 3D line lifting scheme that uses a modified 3DGS reconstruction technique to lift 2D orientation maps into a 3D field while also providing refinement of the camera parameters; – We introduce a dual representation of hair strand polylines and 3D Gaussians to achieve differentiable rasterization of hair strands and leverage photometric constraints for strand-based hair reconstruction; – Based on these components, we propose a coarse-to-fine optimization method for prior-guided hair reconstruction that leverages both latent and explicit representations of the hairstyle.

MrNeRF

106,497 views • 1 year ago

With Hunyuan3D World Model 1.0 now released and open-sourced, we're excited to showcase the technical highlights behind this impressive innovation: ✅360° Panoramic Generation: Creates complete, immersive “world scenes”, far beyond localized views. ✅Explorable 3D Scene Generation: Generates diverse, spatially consistent 3D worlds from text/image for truly immersive exploration. ✅Interactive/Editable: Achieves separation of foreground objects, background terrain, ground, and sky, for seamless secondary editing. ✅Exportable Mesh: Generated scenes can be exported as 3D meshes for direct import into mainstream game engines and modeling software. ✅Industry-Leading SOTA Evaluation: Surpasses state-of-the-art open-source models in generation quality. As the industry's first open-source model for physical simulation and explorable world generation, Hunyuan3D World Model 1.0 aims to foster a collaborative community ecosystem with developers and enthusiasts. ✨ Try it now: 🤗 Hugging Face:

Tencent Hy

23,150 views • 11 months ago

📢 Our lab has been exploring 3D world models for years — and we’re thrilled to share **PhysTwin**: a milestone that reconstructs object appearance, geometry, and dynamics from just a few seconds of interaction! Led by the amazing Hanxiao Jiang 👉 PhysTwin combines **Gaussian splatting** with **inverse dynamics optimization** based on simple **spring-mass** systems. ⚙️ The result? Real-time, action-conditioned 3D video prediction under novel interactions (i.e., 3D world models). 🔑 A few key takeaways: 1. Having the right structure (e.g., particles/masses) helps navigate the trade-off between sample efficiency, generalization, and broad applicability. 2. Visual foundation models (VFMs) have matured to the point where they can provide rich supervision for world modeling (e.g., tracking, shape completion). 3. Beyond VFMs, many crucial components have come together in recent years: Gaussian splats for rendering, NVIDIA Warp for high-performance simulation, and scene/asset generation from a wide range of labs and companies. The future of 3D world models is looking bright! ✨ 4. The resulting digital twin supports a wide range of downstream applications—especially in data generation and policy evaluation, thanks to its realistic rendering and simulation capabilities. 🎥 All code and data to reproduce the results, along with interactive demos, are available on the website. Check the following visualizations of: (1) observations, (2) reconstructed state/actions, (3) interactive digital twins, and (4) the overlays between real-world robot teleoperation and our model’s open-loop predictions.
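To give a feel for the "simple spring-mass systems" mentioned above, here is a minimal sketch of one simulation step using semi-implicit (symplectic) Euler integration. This is not PhysTwin's code: the function `spring_mass_step`, its parameters, and the two-particle setup are all hypothetical, and the inverse-dynamics part (fitting stiffness and other parameters to observations) is not shown.

```python
import numpy as np

def spring_mass_step(pos, vel, springs, rest_len, stiffness, mass, dt, damping=0.0):
    """Advance a particle system connected by Hookean springs by one time step.

    pos, vel: (N, 3) arrays; springs: list of (i, j) index pairs;
    rest_len, stiffness: one value per spring; mass: (N,) array.
    """
    forces = np.zeros_like(pos)
    for (i, j), l0, k in zip(springs, rest_len, stiffness):
        delta = pos[j] - pos[i]
        length = np.linalg.norm(delta)
        if length < 1e-12:
            continue  # coincident particles: spring direction is undefined
        f = k * (length - l0) * (delta / length)  # pulls i toward j when stretched
        forces[i] += f
        forces[j] -= f
    forces -= damping * vel  # simple velocity damping
    new_vel = vel + dt * forces / mass[:, None]  # update velocity first
    new_pos = pos + dt * new_vel                 # then position with the new velocity
    return new_pos, new_vel

# Two unit masses joined by one spring stretched to twice its rest length.
pos = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
vel = np.zeros_like(pos)
mass = np.ones(2)
new_pos, new_vel = spring_mass_step(
    pos, vel, springs=[(0, 1)], rest_len=[1.0], stiffness=[10.0], mass=mass, dt=0.01
)
```

Because every operation here is differentiable, parameters such as per-spring stiffness could in principle be tuned by gradient-based optimization against observed motion, which is the general flavour of an inverse-dynamics fit.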

Yunzhu Li

25,279 views • 1 year ago

Tripo P1.0 is live. Introducing Smart Mesh in Tripo Studio — generate structured 3D meshes in ~2 seconds. ⚡ ~2s generation 🧠 Smart topology 🛠 Production-ready assets 🌍 Built for game pipelines, real-time rendering, web 3D tools, and scalable content production From prompt to usable mesh — almost instantly. Try it: #3DAI #3DCG #3dart #Tripo

Tripo

44,373 views • 3 months ago

Adaptive and Temporally Consistent Gaussian Surfels for Multi-view Dynamic Reconstruction Contributions: • A method for efficiently reconstructing dynamic surfaces from multi-view videos using Gaussian surfels. • A unified and gradient-aware densification strategy for optimizing dynamic 3D Gaussians with fine details. • A temporal consistency approach that ensures stable and coherent surface reconstructions across frames by enforcing consistency on curvature maps. • Extensive experiments that demonstrate our method’s advantages including fast training, high-fidelity novel view synthesis, and accurate surface geometry.

MrNeRF

31,821 views • 1 year ago

Introducing Kaleido💮 from AI at Meta — a universal generative neural rendering engine for photorealistic, unified object and scene view synthesis. Kaleido is built on a simple but powerful design philosophy: 3D perception is a form of visual common sense. Following this idea, we formulate rendering purely as a sequence-to-sequence generation problem, successfully unifying neural rendering with the architecture principles behind modern language and video models. Unlike traditional neural rendering methods, Kaleido learns 3D purely in a data-driven way, without explicit 3D representations or structures. It acquires spatial understanding directly through large-scale video pretraining, then multi-view 3D data finetuning, inspired by how LLMs acquire textual common sense from large corpora before specialising in domains like coding. Through extensive ablations, we progressively modernised the architecture design and training strategies and tackled key scaling challenges in sequence-to-sequence generative rendering, arriving at a design that’s simple, versatile, and scalable. Kaleido significantly outperforms prior generative models in few-view settings, and remarkably is the first zero-shot generative method that matches InstantNGP-level rendering quality in multi-view settings. We also view Kaleido as an alternative step towards world modeling that flexibly spans a spectrum of “realities”: with many views, it faithfully reconstructs grounded reality; with fewer views, it imagines plausible unseen details. 🔗 Explore more results and paper:

Shikun Liu

22,134 views • 8 months ago

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models Contributions: • We introduce Diffuman4D, a novel diffusion model that generates spatio-temporally consistent and high-resolution (1024p) human videos from sparse-view video inputs. • We propose a sliding iterative denoising mechanism that enhances both the spatial and temporal consistency of generated long-term videos while maintaining efficient inference. • We design a human pose conditioning scheme to enhance the appearance quality and motion accuracy of generated human videos. • We plan to release our processed version of the DNA-Rendering dataset, which we believe will benefit future research in this area.

MrNeRF

24,729 views • 11 months ago

Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians paper page: Creating high-fidelity 3D head avatars has always been a research hotspot, but it remains a great challenge under lightweight sparse-view setups. In this paper, we propose Gaussian Head Avatar, represented by controllable 3D Gaussians, for high-fidelity head avatar modeling. We optimize the neutral 3D Gaussians and a fully learned MLP-based deformation field to capture complex expressions. The two parts benefit each other, so our method can model fine-grained dynamic details while ensuring expression accuracy. Furthermore, we devise a geometry-guided initialization strategy based on implicit SDF and Deep Marching Tetrahedra for the stability and convergence of the training procedure. Experiments show our approach outperforms other state-of-the-art sparse-view methods, achieving ultra high-fidelity rendering quality at 2K resolution even under exaggerated expressions.

AK

65,834 views • 2 years ago