📢 LinPrim: Linear Primitives for Differentiable Volumetric Rendering 📢 We use octahedra or tetrahedra as explicit volumetric building blocks for gradient-based novel view synthesis: an alternative to 3D Gaussians that offers discrete, bounded geometry. We show how this representation can be used to reconstruct photorealistic scenes, and introduce...

Matthias Niessner

48,009 subscribers

11,413 views • 1 year ago • via X (Twitter)

Art, Science & Technology, Education
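
Because each octahedron is bounded, a ray enters and leaves it at well-defined depths. The sketch below is a minimal illustration, not LinPrim's implementation: it assumes an axis-aligned regular octahedron |x - c|_1 <= r with constant density, clips the ray against the eight face half-spaces, and turns the chord length into an opacity via the Beer-Lambert law. The function names and the constant-density model are assumptions for illustration.

```python
import itertools
import math

def octahedron_segment(origin, direction, center, radius):
    """Return (t_near, t_far) where origin + t * direction lies inside the
    octahedron |x-cx| + |y-cy| + |z-cz| <= radius, or None if the ray misses.
    Assumption: axis-aligned, regular octahedron (illustrative only)."""
    t_near, t_far = 0.0, math.inf  # only points in front of the origin count
    # The octahedron is the intersection of 8 half-spaces s . (p - c) <= r, s in {-1, 1}^3.
    for s in itertools.product((-1.0, 1.0), repeat=3):
        n_dot_o = sum(si * (oi - ci) for si, oi, ci in zip(s, origin, center))
        n_dot_d = sum(si * di for si, di in zip(s, direction))
        if abs(n_dot_d) < 1e-12:
            if n_dot_o > radius:  # parallel to this face and outside it
                return None
            continue
        t = (radius - n_dot_o) / n_dot_d
        if n_dot_d > 0:
            t_far = min(t_far, t)    # ray leaves this half-space at t
        else:
            t_near = max(t_near, t)  # ray enters this half-space at t
    return (t_near, t_far) if t_near < t_far else None

def octahedron_alpha(origin, direction, center, radius, density):
    """Opacity of a constant-density octahedron along a ray (Beer-Lambert)."""
    seg = octahedron_segment(origin, direction, center, radius)
    if seg is None:
        return 0.0
    length = (seg[1] - seg[0]) * math.sqrt(sum(d * d for d in direction))
    return 1.0 - math.exp(-density * length)
```

The per-primitive alphas could then be composited front to back, as in other volumetric splatting pipelines; sorting and gradient computation are omitted here.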

Comments: 2

69420 • 1 year ago

code?

Rainmaker • 2 years ago

In this free Substack post I share code for several machine learning models and engage in hyperparameter tuning that yields a model that delivers superior returns in the Gold market.

Related videos

Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives paper page: Given a set of calibrated images of a scene, we present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives. While many approaches focus on recovering high-fidelity 3D scenes, we focus on parsing a scene into mid-level 3D representations made of a small set of textured primitives. Such representations are interpretable, easy to manipulate and suited for physics-based simulations. Moreover, unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images through differentiable rendering. Specifically, we model primitives as textured superquadric meshes and optimize their parameters from scratch with an image rendering loss. We highlight the importance of modeling transparency for each primitive, which is critical for optimization and also enables handling varying numbers of primitives. We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points, while providing amodal shape completions of unseen object regions. We compare our approach to the state of the art on diverse scenes from DTU, and demonstrate its robustness on real-life captures from BlendedMVS and Nerfstudio. We also showcase how our results can be used to effortlessly edit a scene or perform physical simulations.

AK

38,571 views • 3 years ago
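
The paper models primitives as textured superquadric meshes. For intuition about this shape family, the sketch below evaluates the classic superquadric inside-outside function (Barr's formulation). It is not taken from the authors' code, and the parameter names `scale`, `eps1` and `eps2` are illustrative.

```python
def superquadric_inside_outside(point, scale, eps1, eps2):
    """Classic superquadric inside-outside function F(x, y, z):
    F < 1 inside, F == 1 on the surface, F > 1 outside.
    scale = (a1, a2, a3) are the half-axis lengths; eps1 shapes the profile
    along z, eps2 shapes the cross-section in the xy-plane."""
    x, y, z = (abs(p) / a for p, a in zip(point, scale))
    xy = (x ** (2.0 / eps2) + y ** (2.0 / eps2)) ** (eps2 / eps1)
    return xy + z ** (2.0 / eps1)
```

Small exponents give box-like blocks, while eps1 = eps2 = 1 recovers an ellipsoid, which is why one parameterization can cover both cuboids and rounded parts.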

📢 Happy to present Convex Splatting, a novel approach to 3D reconstruction based on smooth 3D convexes. For the first time, a splatting-based method reaches the quality of state-of-the-art NeRF methods, but with real-time rendering and few primitives!! I expect this to replace Gaussian splatting for 3D in the coming months. CODE RELEASED TODAY! Joint work with collaborators from Université de Liège, Visual Geometry Group (VGG), and KAUST Computer Vision Lab (IVUL). A thread 🧵 1/n

Abdullah Hamdi

71,661 views • 1 year ago
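
One common way to obtain a smooth, differentiable convex shape is to take a smooth maximum (LogSumExp) of signed distances to bounding planes and pass it through a sigmoid. The sketch below illustrates that idea under stated assumptions: the plane-based input, the `delta` (smoothness) and `sigma` (sharpness) parameters, and their defaults are hypothetical choices, and the thread does not specify Convex Splatting's exact formulation.

```python
import math

def smooth_convex_indicator(p, planes, delta=10.0, sigma=10.0):
    """Soft inside/outside indicator for the convex region
    {x : n_j . x + b_j <= 0 for all j} (hypothetical illustration).
    planes: list of (normal, offset) pairs with unit-length normals.
    delta: smoothness of the max (larger -> sharper corners).
    sigma: steepness of the boundary transition.
    Returns a value near 1 inside and near 0 outside."""
    dists = [sum(ni * pi for ni, pi in zip(n, p)) + b for n, b in planes]
    # Numerically stable LogSumExp: a smooth upper approximation of max(dists).
    m = max(dists)
    smooth_max = m + math.log(sum(math.exp(delta * (d - m)) for d in dists)) / delta
    # Stable sigmoid of -sigma * smooth_max.
    z = -sigma * smooth_max
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

With six axis-aligned planes at offset -1 this describes a rounded cube; increasing `delta` sharpens its corners and increasing `sigma` sharpens its boundary.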

📢 SHeaP: Self-Supervised Head Predictor Learned via 2D Gaussians 📢 Given a single input image, we predict accurate 3D head geometry, pose, and expression. Previous works (e.g. DECA, EMOCA) use differentiable mesh rasterization to learn a self-supervised head geometry predictor via a photometric reconstruction loss. We borrow these ideas, but our key insight is to replace the mesh rendering with 2D Gaussian Splatting. This leads to much higher accuracy of the underlying predicted geometry and thus more gradient signal during training. 🌍 🎥 Great work by Liam Schoneveld Davide Davoli Jiapeng Tang

Matthias Niessner

28,559 views • 1 year ago

GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views
TL;DR: Are we witnessing the first steps towards 3DGS live streaming?
Contributions:
• We introduce a generalizable 3D Gaussian Splatting methodology that employs pixel-wise Gaussian parameter maps defined on 2D source image planes to formulate 3D Gaussians in a feed-forward manner.
• We propose a fully differentiable framework composed of an iterative depth estimation module and a Gaussian parameter regression module. The intermediate depth prediction bridges the two components and allows them to benefit from joint training.
• We introduce a regularization term and an epipolar attention mechanism to preserve geometric consistency between the two source views when using only a rendering loss. Our method generalizes well to unseen characters, even in complicated scenes.
• We develop a real-time free-viewpoint video (FVV) system that achieves high-resolution rendering of characters in the scene without any geometry supervision.

MrNeRF

25,862 views • 1 year ago
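
Pixel-wise Gaussian parameter maps tie one Gaussian to each source pixel, so Gaussian centers can be obtained by lifting the predicted depth through the camera. A minimal sketch, assuming a pinhole camera with intrinsics `fx, fy, cx, cy`, pixel indices used directly as image coordinates, and output in the camera frame; the function name is hypothetical.

```python
def unproject_depth(depth, fx, fy, cx, cy):
    """Lift a per-pixel depth map (list of rows) to 3D points in the camera
    frame with a pinhole model. Each pixel (u, v) with depth d yields one
    point, i.e. one candidate Gaussian center."""
    centers = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            centers.append((x, y, d))
    return centers
```

A world-space version would additionally apply the source camera's pose to each point; the other per-pixel maps (rotation, scale, opacity, color) would be read at the same pixel.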

📢📢GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction📢📢 Reconstructing high-fidelity 3D scenes from sparse RGB input is hard. It needs a strong 3D prior! We reformulate multi-view scene reconstruction as conditional 3D generation over overlapping spatial chunks, lifting posed image features into a generative shape prior via 3D conditioning. As an example prior, we build on Trellis2 and train it so that its reconstruction is pixel-aligned and matches the input from all views. GenRecon achieves unprecedented reconstruction quality from any sparse RGB input sequence, even from a phone capture. The reconstruction also includes PBR materials, which facilitate relighting and virtual object insertion. Amazing work by Katharina Schmid, Nicolas von Lützow, Jozef, Angela Dai

Matthias Niessner

18,098 views • 2 months ago
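
Splitting a scene into overlapping spatial chunks can be done per axis with a fixed window and stride. The helper below is a hypothetical illustration; the post does not state GenRecon's chunk size, overlap, or tiling rule.

```python
def overlapping_chunks(extent_min, extent_max, chunk_size, overlap):
    """Split [extent_min, extent_max] into windows of length chunk_size whose
    neighbours share `overlap` units. The last window is shifted back so it
    ends exactly at extent_max (hypothetical tiling rule)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    stride = chunk_size - overlap
    starts = []
    s = extent_min
    while s + chunk_size < extent_max:
        starts.append(s)
        s += stride
    starts.append(max(extent_min, extent_max - chunk_size))
    return [(s, s + chunk_size) for s in starts]
```

A 3D tiling follows from the Cartesian product of the per-axis windows (for example with `itertools.product`); the shared regions are where neighbouring chunk reconstructions can be reconciled.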

EnvGS: Modeling View-Dependent Appearance with Environment Gaussian
Contributions:
• We propose a novel scene representation for accurately modeling complex near-field and high-frequency reflections in real-world environments.
• We develop a real-time ray-tracing renderer for 2DGS, enabling joint optimization of our representation for accurate scene reconstruction while achieving real-time rendering speeds.
• Extensive experiments show that EnvGS significantly outperforms previous methods. To the best of our knowledge, EnvGS is the first method to achieve real-time photorealistic synthesis of specular reflections in real-world scenes.

MrNeRF

44,650 views • 1 year ago
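
Ray-tracing reflections typically starts by spawning a secondary ray in the mirror direction about the surface normal. The helper below shows that standard formula, r = d - 2(d·n)n. How EnvGS derives normals and blends the traced environment color is not described here, so treat this as background rather than the method itself.

```python
def reflect(direction, normal):
    """Mirror-reflect an incoming view direction about a unit surface normal:
    r = d - 2 (d . n) n."""
    d_dot_n = sum(d * n for d, n in zip(direction, normal))
    return tuple(d - 2.0 * d_dot_n * n for d, n in zip(direction, normal))
```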

Large-scale 3D Scene Generation (all scenes are real-time rendered)!! Physically grounded generative data without hallucinations is the missing link for robot learning and testing at scale. We introduce a method that directly generates large-scale 3D driving scenes with accurate geometry, allowing for causal view synthesis and generation with object permanence and explicit 3D geometry. This also allows for extreme trajectory extrapolation without failure! We also show that we can build fully data-driven simulators for end-to-end learning with this approach. Project: with the amazing team of Julian Ost, Amogh Joshi, Andrea Ramazzina, Maximilian Bömer, Mario Bijelic.

Felix Heide

27,779 views • 10 months ago

🍺 LagerNVS (CVPR 2026) 🍺 LagerNVS is a generalizable, feed-forward, real-time Novel View Synthesis network which:
- performs rendering in real time,
- generalizes to in-the-wild data,
- works with and without known source cameras,
- sets a new state of the art among deterministic methods,
- can be paired with a diffusion decoder for generative extrapolation.
LagerNVS shows that 3D biases are useful for Novel View Synthesis, but explicit 3D representations are not required to achieve them. We use 3D biases in (1) architecture design and (2) pre-training:
(1) In NVS with explicit 3D representations (3DGS, NeRF), reconstruction is typically difficult and slow, but rendering is much faster and simpler. We mimic this process in the network design: we use a large (1B-parameter) encoder and a small, lightweight decoder (ViT-B). This allows increasing the network capacity while still achieving real-time rendering.
(2) The encoder, initialized from VGGT, was pre-trained with 3D reconstruction objectives, making the initial features 3D-aware.
Both substantially improve performance.
Project page: Code: Paper: Models:
Work done with Jianyuan, Minghao Chen, Christian Rupprecht and Andrea Vedaldi

Stan Szymanowicz

31,651 views • 4 months ago

[SIGGRAPH Asia '24 (TOG)] Representing Long Volumetric Video with Temporal Gaussian Hierarchy
Contributions:
• We introduce a novel, efficient, and expressive Temporal Gaussian Hierarchy representation for long volumetric video. To our knowledge, our method is the first approach capable of handling minutes of volumetric video data.
• We propose a Compact Appearance Model and a new rasterization implementation to facilitate real-time, high-quality dynamic view synthesis while maintaining a compact size.
• We propose a system to efficiently model long volumetric videos for the first time and demonstrate state-of-the-art dynamic view synthesis quality on the Neural3DV [Li et al. 2022], ENeRF-Outdoor [Lin et al. 2022], and MobileStage [Xu et al. 2024b] datasets, while also achieving the best rendering speed with reduced training cost and memory usage.

MrNeRF

79,379 views • 1 year ago
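
A temporal hierarchy can be illustrated by a binary subdivision of the video timeline in which each Gaussian is stored in the deepest segment that covers its whole lifespan, so short-lived Gaussians sit in small segments and long-lived ones higher up. This is a hypothetical sketch of that idea, assuming power-of-two splits and 0 <= t_start <= t_end < duration; the paper's actual assignment rule may differ.

```python
def hierarchy_slot(t_start, t_end, duration, max_level):
    """Return (level, segment_index) of the deepest segment that fully
    contains [t_start, t_end]. Level 0 is one segment spanning the whole
    video; level k splits it into 2**k equal segments (hypothetical scheme)."""
    level, index = 0, 0
    for k in range(1, max_level + 1):
        seg_len = duration / (2 ** k)
        i = int(t_start // seg_len)
        if t_end <= (i + 1) * seg_len:  # still fits inside one child segment
            level, index = k, i
        else:
            break  # children are subsets, so deeper levels cannot fit either
    return level, index
```

Rendering a frame at time t would then only need the Gaussians from the one segment per level that contains t, which keeps the active set small for long videos.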

VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams paper page: Neural Radiance Fields (NeRFs) excel in photorealistically rendering static scenes. However, rendering dynamic, long-duration radiance fields on ubiquitous devices remains challenging, due to data storage and computational constraints. In this paper, we introduce VideoRF, the first approach to enable real-time streaming and rendering of dynamic radiance fields on mobile platforms. At the core is a serialized 2D feature image stream representing the 4D radiance field all in one. We introduce a tailored training scheme directly applied to this 2D domain to impose the temporal and spatial redundancy of the feature image stream. By leveraging the redundancy, we show that the feature image stream can be efficiently compressed by 2D video codecs, which allows us to exploit video hardware accelerators to achieve real-time decoding. On the other hand, based on the feature image stream, we propose a novel rendering pipeline for VideoRF, which has specialized space mappings to query radiance properties efficiently. Paired with a deferred shading model, VideoRF has the capability of real-time rendering on mobile devices thanks to its efficiency. We have developed a real-time interactive player that enables online streaming and rendering of dynamic scenes, offering a seamless and immersive free-viewpoint experience across a range of devices, from desktops to mobile phones.

AK

38,686 views • 2 years ago

Fast View Synthesis of Casual Videos paper page: Novel view synthesis from an in-the-wild video is difficult due to challenges like scene dynamics and lack of parallax. While existing methods have shown promising results with implicit neural radiance fields, they are slow to train and render. This paper revisits explicit video representations to synthesize high-quality novel views from a monocular video efficiently. We treat static and dynamic video content separately. Specifically, we build a global static scene model using an extended plane-based scene representation to synthesize temporally coherent novel video. Our plane-based scene representation is augmented with spherical harmonics and displacement maps to capture view-dependent effects and model non-planar complex surface geometry. We opt to represent the dynamic content as per-frame point clouds for efficiency. While such representations are inconsistency-prone, minor temporal inconsistencies are perceptually masked due to motion. We develop a method to quickly estimate such a hybrid video representation and render novel views in real time. Our experiments show that our method can render high-quality novel views from an in-the-wild video with comparable quality to state-of-the-art methods while being 100x faster in training and enabling real-time rendering.

AK

20,668 views • 2 years ago
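
The plane-based representation is augmented with spherical harmonics for view-dependent effects. The sketch evaluates a degree-1 real SH color for a given viewing direction, using the constants and sign convention common in 3D Gaussian Splatting code. The SH degree used in the paper is not stated here, so degree 1 is an assumption.

```python
SH_C0 = 0.28209479177387814  # 1 / (2 * sqrt(pi)), band 0
SH_C1 = 0.4886025119029199   # sqrt(3) / (2 * sqrt(pi)), band 1

def sh_color_deg1(coeffs, view_dir):
    """View-dependent RGB from degree-1 spherical harmonics.
    coeffs: 4 RGB triples [c0, c1, c2, c3]; view_dir: unit vector (x, y, z)."""
    x, y, z = view_dir
    basis = (SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x)
    return tuple(sum(b * c[ch] for b, c in zip(basis, coeffs)) for ch in range(3))
```

Only the first coefficient is view-independent; the band-1 terms change the color linearly with the viewing direction, which is enough for mild view-dependent shading.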

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors paper page: We present Magic123, a two-stage, coarse-to-fine approach for generating high-quality, textured 3D meshes from a single unposed image in the wild using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture. In both stages, the 3D content is learned through reference view supervision and novel views guided by a combination of 2D and 3D diffusion priors. We introduce a single trade-off parameter between the 2D and 3D priors to control exploration (more imaginative) and exploitation (more precise) of the generated geometry. Additionally, we employ textual inversion and monocular depth regularization to encourage consistent appearances across views and to prevent degenerate solutions, respectively. Magic123 demonstrates a significant improvement over previous image-to-3D techniques, as validated through extensive experiments on synthetic benchmarks and diverse real-world images.

AK

305,663 views • 3 years ago

FAU Erlangen-Nürnberg presents TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering paper page: Point-based radiance field rendering has demonstrated impressive results for novel view synthesis, offering a compelling blend of rendering quality and computational efficiency. However, even the latest approaches in this domain are not without their shortcomings. 3D Gaussian Splatting [Kerbl and Kopanas et al. 2023] struggles when tasked with rendering highly detailed scenes, due to blurring and cloudy artifacts. On the other hand, ADOP [Rückert et al. 2022] can accommodate crisper images, but its neural reconstruction network decreases performance, it grapples with temporal instability, and it is unable to effectively address large gaps in the point cloud. In this paper, we present TRIPS (Trilinear Point Splatting), an approach that combines ideas from both Gaussian Splatting and ADOP. The fundamental concept behind our novel technique involves rasterizing points into a screen-space image pyramid, with the selection of the pyramid layer determined by the projected point size. This approach allows rendering arbitrarily large points using a single trilinear write. A lightweight neural network is then used to reconstruct a hole-free image including detail beyond splat resolution. Importantly, our render pipeline is entirely differentiable, allowing for automatic optimization of both point sizes and positions. Our evaluation demonstrates that TRIPS surpasses existing state-of-the-art methods in terms of rendering quality while maintaining a real-time frame rate of 60 frames per second on readily available hardware. This performance extends to challenging scenarios, such as scenes featuring intricate geometry, expansive landscapes, and auto-exposed footage.

AK

45,459 views • 2 years ago
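
TRIPS selects the pyramid layer from the projected point size and writes each point with a single trilinear write. A minimal sketch of how such weights could be computed, assuming the continuous layer is log2 of the point size in pixels, each layer halves the resolution, and texel centers sit at half-integer coordinates. These conventions and the function name are assumptions, not taken from the paper.

```python
import math

def trilinear_splat_weights(u, v, point_size):
    """Distribute one point's contribution over the 8 pyramid texels around it.
    (u, v): screen position in full-resolution pixels; point_size in pixels.
    Returns a list of (level, x, y, weight) whose weights sum to 1."""
    level = max(0.0, math.log2(max(point_size, 1.0)))  # continuous pyramid level
    l0 = math.floor(level)
    fl = level - l0
    out = []
    for l, wl in ((l0, 1.0 - fl), (l0 + 1, fl)):
        # Position in texel units at level l (texel centers at integer + 0.5).
        x = u / (2 ** l) - 0.5
        y = v / (2 ** l) - 0.5
        x0, y0 = math.floor(x), math.floor(y)
        fx, fy = x - x0, y - y0
        for dx, wx in ((0, 1.0 - fx), (1, fx)):
            for dy, wy in ((0, 1.0 - fy), (1, fy)):
                out.append((l, x0 + dx, y0 + dy, wl * wx * wy))
    return out
```

Large points land on coarse levels, so each write stays a fixed 8 texels regardless of splat size; a network then merges the pyramid back to full resolution.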

SharedNeRF: Leveraging Photorealistic and View-dependent Rendering for Real-time and Remote Collaboration Abstract (excerpt): When collaborators are remote, coordinating the sharing of views of their physical environment becomes challenging. Video-conferencing tools often do not provide the desired viewpoints for a remote viewer. While RGB-D cameras offer 3D views, they lack the necessary fidelity. We introduce SharedNeRF, designed to enhance synchronous remote collaboration by leveraging the photorealistic and view-dependent nature of Neural Radiance Fields (NeRF). The system complements the higher visual quality of the NeRF rendering with the instantaneity of a point cloud, and combines them by carefully accommodating the dynamic elements within the shared space, such as hand gestures and moving objects. The system employs a head-mounted camera for data collection, creating a volumetric task space on the fly and updating it as the task space changes.

MrNeRF

10,615 views • 1 year ago

Nvidia announces GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning paper page: Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper, we seek to leverage Gaussian splatting to generate realistic animatable avatars from textual descriptions, addressing the limitations (e.g., flexibility and efficiency) imposed by mesh or NeRF-based representations. However, a naive application of Gaussian splatting cannot generate high-quality animatable avatars and suffers from learning instability; it also cannot capture fine avatar geometries and often leads to degenerate body parts. To tackle these problems, we first propose a primitive-based 3D Gaussian representation where Gaussians are defined inside pose-driven primitives to facilitate animation. Second, to stabilize and amortize the learning of millions of Gaussians, we propose to use neural implicit fields to predict the Gaussian attributes (e.g., colors). Finally, to capture fine avatar geometries and extract detailed meshes, we propose a novel SDF-based implicit mesh learning approach for 3D Gaussians that regularizes the underlying geometries and extracts highly detailed textured meshes. Our proposed method, GAvatar, enables the large-scale generation of diverse animatable avatars using only text prompts. GAvatar significantly surpasses existing methods in terms of both appearance and geometry quality, and achieves extremely fast rendering (100 fps) at 1K resolution.

AK

140,992 views • 2 years ago

GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering Contributions:
• We reformulate video stabilization as a novel 3D-grounded scheme of local reconstruction and rendering. This approach is naturally robust to diverse camera motions and scene dynamics, is temporally consistent, and is capable of full-frame stabilization.
• We propose a novel test-time optimization for each unstable video. It leverages multi-view dynamics-aware photometric supervision and cross-frame regularization to achieve temporally consistent reconstructions. To avoid frame cropping, we introduce a scene extrapolation module based on video completion.
• We provide a 3D-grounded dataset for our task by re-purposing an existing one, and introduce new metrics on sparse and dense reconstruction to evaluate 3D scene consistency. Extensive experiments (quantitative, qualitative, user study) versus image-based and gyro-based methods demonstrate the merits of our method.


MrNeRF

11,638 views • 1 year ago

Drivable 3D Gaussian Avatars paper page: We present Drivable 3D Gaussian Avatars (D3GA), the first 3D controllable model for human bodies rendered with Gaussian splats. Current photorealistic drivable avatars require either accurate 3D registrations during training, dense input images during testing, or both. The ones based on neural radiance fields also tend to be prohibitively slow for telepresence applications. This work uses the recently presented 3D Gaussian Splatting (3DGS) technique to render realistic humans at real-time framerates, using dense calibrated multi-view videos as input. To deform those primitives, we depart from the commonly used point deformation method of linear blend skinning (LBS) and use a classic volumetric deformation method: cage deformations. Given their smaller size, we drive these deformations with joint angles and keypoints, which are more suitable for communication applications. Our experiments on nine subjects with varied body shapes, clothes, and motions obtain higher-quality results than state-of-the-art methods when using the same training and test data.
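
The D3GA description replaces linear blend skinning with cage deformations. As a minimal sketch of that classic idea, assuming the cage is made of tetrahedral cells (an illustrative assumption; D3GA's actual cage construction and the network that drives it are not shown here): a point keeps the barycentric coordinates it had in the rest cell, and the cell's affine map, with constant Jacobian F, carries a Gaussian covariance Σ to F Σ Fᵀ. All function names below are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating a generic tetrahedral cage deformation.
# Assumption: each cage cell is a tetrahedron; this is not D3GA's code.

def barycentric_tet(p, tet):
    """Barycentric coordinates of point p w.r.t. tetrahedron tet (4x3 array)."""
    v0 = tet[0]
    # Columns are the three edges leaving vertex 0.
    D = np.column_stack([tet[1] - v0, tet[2] - v0, tet[3] - v0])
    w123 = np.linalg.solve(D, p - v0)
    return np.concatenate([[1.0 - w123.sum()], w123])

def deform_point(p, rest_tet, posed_tet):
    """Move p with the cell by reusing its rest-pose barycentric weights."""
    w = barycentric_tet(p, rest_tet)
    return w @ posed_tet

def deformation_gradient(rest_tet, posed_tet):
    """Constant Jacobian F of the affine map rest -> posed inside one cell."""
    D_rest = np.column_stack([rest_tet[i] - rest_tet[0] for i in (1, 2, 3)])
    D_posed = np.column_stack([posed_tet[i] - posed_tet[0] for i in (1, 2, 3)])
    return D_posed @ np.linalg.inv(D_rest)

def deform_covariance(cov, F):
    """Transform a Gaussian covariance by the cell's linear part: F cov F^T."""
    return F @ cov @ F.T
```

Because F is constant within a cell, both the mean and the shape of each Gaussian follow the cell's stretch, not only its translation.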


AK

327,105 views • 2 years ago

Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields paper page: Editing a local region or a specific object in a 3D scene represented by a NeRF is challenging, mainly due to the implicit nature of the scene representation. Consistently blending a new realistic object into the scene adds an additional level of difficulty. We present Blended-NeRF, a robust and flexible framework for editing a specific region of interest in an existing NeRF scene, based on text prompts or image patches, along with a 3D ROI box. Our method leverages a pretrained language-image model to steer the synthesis towards a user-provided text prompt or image patch, along with a 3D MLP model initialized on an existing NeRF scene to generate the object and blend it into a specified region in the original scene. We allow local editing by localizing a 3D ROI box in the input scene, and seamlessly blend the content synthesized inside the ROI with the existing scene using a novel volumetric blending technique. To obtain natural-looking and view-consistent results, we leverage existing and new geometric priors and 3D augmentations for improving the visual fidelity of the final result. We test our framework both qualitatively and quantitatively on a variety of real 3D scenes and text prompts, demonstrating realistic multi-view consistent results with much flexibility and diversity compared to the baselines. Finally, we show the applicability of our framework for several 3D editing applications, including adding new objects to a scene, removing/replacing/altering existing objects, and texture conversion.
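
The abstract calls Blended-NeRF's volumetric blending technique novel but does not describe it. For reference only, a common generic way to merge two radiance fields inside a 3D ROI box is to add their densities, take density-weighted colors, and render with the standard NeRF quadrature C = Σᵢ Tᵢ (1 − exp(−σᵢ δᵢ)) cᵢ. The sketch below assumes that generic scheme; it is not the paper's method, and the function names are hypothetical.

```python
import numpy as np

# Generic density-weighted blending of two radiance fields.
# NOT Blended-NeRF's own technique; names are hypothetical.

def in_roi(points, box_min, box_max):
    """Boolean mask of sample points inside an axis-aligned 3D ROI box."""
    return np.all((points >= box_min) & (points <= box_max), axis=-1)

def blend_fields(points, sigma_scene, rgb_scene, sigma_obj, rgb_obj, box_min, box_max):
    """Add densities and mix colors by density; the object field only counts inside the ROI."""
    mask = in_roi(points, box_min, box_max)
    sigma_obj = np.where(mask, sigma_obj, 0.0)  # no object density outside the box
    sigma = sigma_scene + sigma_obj
    # Fraction of the total density contributed by the object (0 where density is zero).
    w = np.where(sigma > 0, sigma_obj / np.maximum(sigma, 1e-10), 0.0)
    rgb = (1.0 - w)[:, None] * rgb_scene + w[:, None] * rgb_obj
    return sigma, rgb

def composite(sigma, rgb, deltas):
    """Standard NeRF quadrature along one ray: sum_i T_i * alpha_i * c_i."""
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j).
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return (trans * alpha) @ rgb
```

Outside the box the scene field is returned unchanged, so edits stay local to the ROI.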


AK

62,768 views • 3 years ago

[SIGGRAPH '25] TeGA: Texture Space Gaussian Avatars for High-Resolution Dynamic Head Modeling Note: On the left that's a 3DGS rendering! Contributions:
1. We propose a simple approach for rigging 3D Gaussians within the continuous tangent space of 3DMM face models, allowing Gaussians to move freely across mesh triangles.
2. We propose a novel CNN-based deformation model that is agnostic to the number of 3D Gaussians, naturally enabling adaptive densification of the representation to improve detail where most needed, with expression-dependent shading.
3. We show significant improvements over baseline SOTA methods and demonstrate the ability to render even extreme close-up images at high quality.
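
TeGA rigs Gaussians in the continuous tangent (texture) space of a 3DMM mesh. A minimal sketch of the general idea, under stated assumptions: each Gaussian stores a UV coordinate and a scalar offset along the face normal, and its 3D position is recovered by finding the triangle whose UV footprint contains it and reusing those barycentric weights on the posed 3D vertices. Because the stored coordinate lives in UV space rather than being bound to one triangle, it can cross triangle boundaries during optimization. The per-vertex UV layout without seams, the flat face normal, the linear triangle search, and all names are hypothetical choices for illustration, not TeGA's implementation.

```python
import numpy as np

# Hypothetical sketch: map a texture-space (UV) coordinate plus a normal
# offset to a 3D point on a posed triangle mesh. Not TeGA's code.

def barycentric_2d(p, a, b, c):
    """Barycentric coordinates of 2D point p in triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return np.array([1.0 - v - w, v, w])

def uv_to_surface(uv, offset, faces, uvs, verts, eps=1e-9):
    """Find the face containing uv, interpolate its 3D vertices, push along the face normal."""
    for f in faces:
        bary = barycentric_2d(uv, *uvs[f])
        if np.all(bary >= -eps):  # uv lies inside this face's UV triangle
            tri = verts[f]
            n = np.cross(tri[1] - tri[0], tri[2] - tri[0])
            n /= np.linalg.norm(n)
            return bary @ tri + offset * n
    raise ValueError("uv coordinate not covered by any triangle")
```

When the mesh is posed, only `verts` changes; the stored UV coordinate and offset stay fixed, so the Gaussian follows the surface.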


MrNeRF

29,010 views • 1 year ago