FastMap: Revisiting Dense and Scalable Structure from Motion. "FASTMAP, a redesigned SfM framework, achieves fast, high-accuracy dense structure from motion. On large scenes with thousands of images, FASTMAP is up to one to two orders of magnitude faster than GLOMAP and COLMAP. ... Importantly, FASTMAP achieves efficiency improvements while... keeping comparable performance. Extensive experiments on eight datasets demonstrate pose estimation accuracy and novel view synthesis quality close to GLOMAP and COLMAP." Contributions: 1. For all the iterative nonlinear optimization problems involved, we design algorithms such that the computational complexity of each iteration is only linear in the number of image pairs, not keypoint pairs or 3D points. This includes replacing the traditional bundle adjustment [50] present in previous SfM frameworks with a novel re-weighting epipolar adjustment algorithm, which is much more efficient. 2. Throughout the entire framework, we formulate as many steps as possible as GPU-friendly dense tensor operations. This allows us to implement the entire method in PyTorch [39], which provides seamless GPU acceleration.

MrNeRF

13,566 subscribers

15,233 views • 1 year ago • via X (Twitter)

Art, Science & Technology, Education
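To make the epipolar side of this concrete, here is a minimal sketch, assuming normalized (calibrated) image coordinates and a known essential matrix. It computes the Sampson epipolar error of a correspondence and Cauchy-style weights of the kind an iteratively re-weighted scheme could apply. This is not FastMap's algorithm: the function names, the Cauchy weighting, and the `scale` parameter are illustrative assumptions. The sketch also scores individual correspondences, whereas FastMap's contribution is keeping each iteration linear in the number of image pairs.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Point2 = Tuple[float, float]


def sampson_error(E: Matrix3, x1: Point2, x2: Point2) -> float:
    """First-order (Sampson) approximation of the geometric epipolar error
    of the correspondence x1 <-> x2 under the 3x3 essential matrix E."""
    p1 = (x1[0], x1[1], 1.0)
    p2 = (x2[0], x2[1], 1.0)
    # E @ p1: epipolar line of x1 in image 2.
    Ep1 = [sum(E[i][j] * p1[j] for j in range(3)) for i in range(3)]
    # E^T @ p2: epipolar line of x2 in image 1.
    Etp2 = [sum(E[j][i] * p2[j] for j in range(3)) for i in range(3)]
    algebraic = sum(p2[i] * Ep1[i] for i in range(3))  # x2^T E x1
    denom = Ep1[0] ** 2 + Ep1[1] ** 2 + Etp2[0] ** 2 + Etp2[1] ** 2
    return algebraic ** 2 / denom


def cauchy_weights(residuals: Sequence[float], scale: float = 1.0) -> List[float]:
    """Hypothetical IRLS weights for a Cauchy loss: w = 1 / (1 + (r / scale)^2).
    Large residuals (likely outliers) get small weights in the next iteration."""
    return [1.0 / (1.0 + (r / scale) ** 2) for r in residuals]


# Essential matrix for a pure translation t = (1, 0, 0): E = [t]_x.
E = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
errors = [sampson_error(E, (0.2, 0.3), (0.5, 0.3)),   # on the epipolar line
          sampson_error(E, (0.2, 0.3), (0.5, 0.4))]   # off by 0.1 in y
weights = cauchy_weights([e ** 0.5 for e in errors], scale=0.05)
print(errors, weights)
```

Under pure horizontal translation, epipolar lines are horizontal, so only the vertical offset of the second correspondence produces an error, and its weight drops accordingly.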

Related videos

Meta releases VGGSfM: Visual Geometry Grounded Deep Structure From Motion. Structure-from-motion (SfM) is a long-standing problem in the computer vision community, which aims to reconstruct the camera poses and 3D structure of a scene from a set of unconstrained 2D images. Classical frameworks solve this problem in an incremental manner by detecting and matching keypoints, registering images, triangulating 3D points, and conducting bundle adjustment. Recent research efforts have predominantly revolved around harnessing the power of deep learning techniques to enhance specific elements (e.g., keypoint matching), but are still based on the original, non-differentiable pipeline. Instead, we propose a new deep SfM pipeline VGGSfM, where each component is fully differentiable and thus can be trained in an end-to-end manner. To this end, we introduce new mechanisms and simplifications. First, we build on recent advances in deep 2D point tracking to extract reliable pixel-accurate tracks, which eliminates the need for chaining pairwise matches. Furthermore, we recover all cameras simultaneously based on the image and track features instead of gradually registering cameras. Finally, we optimise the cameras and triangulate 3D points via a differentiable bundle adjustment layer. We attain state-of-the-art performance on three popular datasets, CO3D, IMC Phototourism, and ETH3D.

AK

96,527 views • 1 year ago
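Bundle adjustment, which VGGSfM turns into a differentiable layer, minimizes reprojection error. A minimal sketch of that residual for one pinhole camera follows, assuming a world-to-camera pose x_c = R X + t and zero-skew intrinsics K; the helper names are illustrative, and this is not VGGSfM's layer.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def project(K: Sequence[Sequence[float]], R: Sequence[Sequence[float]],
            t: Vec3, X: Vec3) -> Tuple[float, float]:
    """Project world point X to pixels: x_c = R X + t, then apply K (zero skew)."""
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0.0:
        raise ValueError("point is behind the camera")
    u = K[0][0] * xc[0] / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return u, v


def reprojection_cost(K, R, t, points, observations) -> float:
    """Sum of squared pixel residuals; bundle adjustment minimizes this
    jointly over camera parameters (K, R, t) and the 3D points."""
    cost = 0.0
    for X, (u_obs, v_obs) in zip(points, observations):
        u, v = project(K, R, t, X)
        cost += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return cost


K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (0.0, 0.0, 0.0)
print(project(K, R, t, (0.1, 0.2, 1.0)))  # approximately (60.0, 70.0)
print(reprojection_cost(K, R, t, [(0.1, 0.2, 1.0)], [(61.0, 70.0)]))  # approximately 1.0
```

A differentiable layer would express the same cost with tensor operations so that gradients flow back into the networks that produce tracks and initial cameras.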

Wonderland: Navigating 3D Scenes from a Single Image Contributions: • First, we introduce a representation for controllable 3D generation by leveraging the generative priors from camera-guided video diffusion models. Unlike image models, video diffusion models are trained on extensive video datasets. This enables them to capture comprehensive spatial relationships within scenes across multiple views and embed a form of "3D awareness" in their latent space, which allows us to maintain 3D consistency in novel view synthesis. • Second, to achieve controllable novel view generation, we empower video models with precise control over specified camera motions. We introduce a novel dual-branch conditioning mechanism that effectively incorporates desired diverse camera trajectories into the video diffusion model. This enables expansion of a single image into a multi-view consistent capture of a 3D scene with precise pose control. • Third, to achieve efficient 3D reconstruction, we directly transform video latents into 3DGS. We propose a novel latent-based large reconstruction model (LaLRM) that lifts video latents to 3D in a feed-forward manner. With this design, during inference, our model directly predicts 3DGS from a single input image, effectively aligning the generation and reconstruction tasks—and bridging image space and 3D space—through the video latent space. Compared with reconstructing scenes from images, the video latent space offers a 256× spatial-temporal reduction while retaining essential and consistent 3D structural details. Such a high degree of compression is crucial, as it allows the LaLRM to handle a wider range of 3D scenes within the reconstruction framework, with the same memory constraints.

MrNeRF

52,801 views • 1 year ago
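The 256× figure is easiest to read as a product of per-axis downsampling factors. A minimal sketch, assuming a video VAE that downsamples 8× along each spatial axis and 4× in time (typical of video diffusion backbones, but not stated in the description above), gives 8 × 8 × 4 = 256 times fewer spatio-temporal positions. The clip size is hypothetical, and latent channels are ignored.

```python
def latent_shape(frames: int, height: int, width: int,
                 temporal: int = 4, spatial: int = 8) -> tuple:
    """Latent grid size under assumed per-axis downsampling factors."""
    if frames % temporal or height % spatial or width % spatial:
        raise ValueError("sizes must be divisible by the downsampling factors")
    return frames // temporal, height // spatial, width // spatial


def reduction_factor(frames: int, height: int, width: int, **factors) -> float:
    """Ratio of pixel-space positions to latent-space positions."""
    f, h, w = latent_shape(frames, height, width, **factors)
    return (frames * height * width) / (f * h * w)


print(latent_shape(48, 480, 720))      # (12, 60, 90)
print(reduction_factor(48, 480, 720))  # 256.0
```

Because the reconstruction model consumes the latent grid rather than the pixels, its memory cost scales with the smaller grid.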

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models Contributions: • We introduce Diffuman4D, a novel diffusion model that generates spatio-temporally consistent and high-resolution (1024p) human videos from sparse-view video inputs. • We propose a sliding iterative denoising mechanism that enhances both the spatial and temporal consistency of generated long-term videos while maintaining efficient inference. • We design a human pose conditioning scheme to enhance the appearance quality and motion accuracy of generated human videos. • We plan to release our processed version of the DNA-Rendering dataset, which we believe will benefit future research in this area.

MrNeRF

24,729 views • 11 months ago
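The window bookkeeping behind sliding-window denoising of a long sequence can be sketched as follows, assuming overlapping windows whose outputs are averaged where they overlap. The window size, the stride, the scalar stand-in "latents", and `denoise_fn` are all hypothetical. Diffuman4D's actual mechanism, including how it slides across views as well as time, is not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple


def sliding_windows(num_frames: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """[start, end) windows covering every frame; the last window is
    shifted back so it ends exactly at num_frames."""
    if window >= num_frames:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    return [(s, s + window) for s in starts]


def sliding_denoise_step(latents: Sequence[float], window: int, stride: int,
                         denoise_fn: Callable[[Sequence[float]], Sequence[float]]) -> List[float]:
    """One denoising pass: run denoise_fn per window, then average frames
    that several overlapping windows predicted."""
    n = len(latents)
    acc, count = [0.0] * n, [0] * n
    for s, e in sliding_windows(n, window, stride):
        for offset, value in enumerate(denoise_fn(latents[s:e])):
            acc[s + offset] += value
            count[s + offset] += 1
    return [a / c for a, c in zip(acc, count)]


print(sliding_windows(10, 4, 3))  # [(0, 4), (3, 7), (6, 10)]
print(sliding_denoise_step([2.0] * 10, 4, 3, lambda xs: [0.5 * x for x in xs]))
```

Running such a pass at every denoising step lets each window see its neighbors' partially denoised frames, which is one way to keep long outputs consistent while the model only ever processes a fixed-size window.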

Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction Note: Check below for full video. Abstract (cited): "In this paper, we present a self-calibrating framework that jointly optimizes camera parameters, lens distortion, and 3D Gaussian representations, enabling accurate and efficient scene reconstruction. Our technique is particularly effective for high-quality scene reconstruction from large field-of-view (FOV) imagery taken with wide-angle lenses, allowing the scene to be modeled from a smaller number of images. We introduce a novel method for modeling complex lens distortions using a hybrid network that combines invertible residual networks with explicit grids. This design effectively regularizes the optimization process, achieving greater accuracy than conventional camera models. Additionally, we propose a cubemap-based resampling strategy to support large FOV images without sacrificing resolution or introducing distortion artifacts. Our method is compatible with the fast rasterization of Gaussian Splatting, adaptable to a wide variety of camera lens distortions, and demonstrates state-of-the-art performance on both synthetic and real-world datasets."

MrNeRF

17,206 views • 1 year ago
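Cubemap resampling represents a wide field of view as six 90° pinhole faces, so no single perspective image has to stretch toward 180°. A minimal sketch of the direction-to-face mapping and its inverse follows, assuming the OpenGL cube-map face convention. The paper's own convention may differ, and the learned lens-distortion model that maps each direction to a raw fisheye pixel is omitted.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dir_to_cubemap(d: Vec3) -> Tuple[str, float, float]:
    """Map a view direction to (face, u, v) with u, v in [0, 1],
    following the OpenGL cube-map face convention."""
    x, y, z = d
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax == ay == az == 0.0:
        raise ValueError("zero direction")
    if ax >= ay and ax >= az:          # +X / -X faces
        face, ma, sc, tc = ("+X", ax, -z, -y) if x > 0 else ("-X", ax, z, -y)
    elif ay >= az:                     # +Y / -Y faces
        face, ma, sc, tc = ("+Y", ay, x, z) if y > 0 else ("-Y", ay, x, -z)
    else:                              # +Z / -Z faces
        face, ma, sc, tc = ("+Z", az, x, -y) if z > 0 else ("-Z", az, -x, -y)
    return face, 0.5 * (sc / ma + 1.0), 0.5 * (tc / ma + 1.0)


def cubemap_to_dir(face: str, u: float, v: float) -> Vec3:
    """Inverse mapping: the unit direction seen through texel (u, v) of a face.
    Resampling loops over face texels and calls this once per texel."""
    sc, tc = 2.0 * u - 1.0, 2.0 * v - 1.0
    d = {"+X": (1.0, -tc, -sc), "-X": (-1.0, -tc, sc),
         "+Y": (sc, 1.0, tc), "-Y": (sc, -1.0, -tc),
         "+Z": (sc, -tc, 1.0), "-Z": (-sc, -tc, -1.0)}[face]
    n = math.sqrt(sum(c * c for c in d))
    return (d[0] / n, d[1] / n, d[2] / n)


print(dir_to_cubemap((1.0, 0.5, 0.0)))  # ('+X', 0.5, 0.25)
```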

Code of #ACEZero is out. A new approach to SfM. Learn the 3D scene without image-to-image matching. Naturally avoids the explosion of complexity for many images. ACE0 shines if you have dense coverage of a scene. Posing 10k images and more? Sure!

Eric Brachmann

23,988 views • 1 year ago
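The "explosion of complexity" is easy to quantify. Exhaustive image-to-image matching considers n(n-1)/2 pairs, which grows quadratically with the number of images. Learning the scene without pairwise matching, as the post describes for ACE0, avoids that pair count altogether. The helper below only counts pairs. Practical matching pipelines also prune pairs through retrieval or sequential matching.

```python
def exhaustive_pairs(num_images: int) -> int:
    """Number of unordered image pairs an exhaustive matcher would consider."""
    return num_images * (num_images - 1) // 2


for n in (100, 1_000, 10_000):
    print(f"{n:>6} images -> {exhaustive_pairs(n):>12,} pairs")
```

At 10,000 images this is 49,995,000 candidate pairs, which is why "posing 10k images and more" is a notable claim for a matching-free approach.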

3D Gaussian Splatting for Real-Time Radiance Field Rendering paper page: Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and importantly allow high-quality real-time (>= 30 fps) novel-view synthesis at 1080p resolution. First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space; Second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene; Third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows realtime rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.

AK

633,313 views • 2 years ago
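The anisotropic covariance mentioned above is parameterized in 3DGS by a rotation R, stored as a quaternion, and per-axis scales S, giving Σ = R S Sᵀ Rᵀ. This form stays a valid (positive semi-definite) covariance while both factors are optimized freely. A minimal pure-Python sketch follows. The (w, x, y, z) quaternion ordering and the function names are assumptions.

```python
import math
from typing import List, Sequence

Mat3 = List[List[float]]


def quat_to_rotmat(q: Sequence[float]) -> Mat3:
    """Rotation matrix from a quaternion (w, x, y, z); normalized first."""
    n = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / n for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def gaussian_covariance(q: Sequence[float], scale: Sequence[float]) -> Mat3:
    """Sigma = R S S^T R^T with S = diag(scale); symmetric PSD by construction."""
    R = quat_to_rotmat(q)
    M = [[R[i][j] * scale[j] for j in range(3)] for i in range(3)]  # M = R S
    return [[sum(M[i][j] * M[k][j] for j in range(3)) for k in range(3)]
            for i in range(3)]


half = math.sqrt(0.5)
# A 90-degree rotation about z swaps the x and y extents: roughly diag(4, 1, 9).
print(gaussian_covariance((half, 0.0, 0.0, half), (1.0, 2.0, 3.0)))
```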

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion. TL;DR: Create 3/4DGS from Video Diffusion. Note: Some initial inference code has been released (not all yet). Contributions (cited): • We present DimensionX, a novel framework for generating photorealistic 3D and 4D scenes from only a single image using controllable video diffusion. • We propose ST-Director, which decouples the spatial and temporal priors in video diffusion models by learning (spatial and temporal) dimension-aware modules with our curated datasets. We further enhance the hybrid-dimension control with a training-free composition approach according to the essence of video diffusion denoising process. • To bridge the gap between video diffusion and real-world scenes, we design a trajectory-aware mechanism for 3D generation and an identity-preserving denoising approach for 4D generation, enabling more realistic and controllable scene synthesis. • Extensive experiments manifest that our DimensionX delivers superior performance in video, 3D, and 4D generation compared with baseline methods.

MrNeRF

17,028 views • 1 year ago

Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation paper page: Large text-to-image diffusion models have exhibited impressive proficiency in generating high-quality images. However, when applying these models to video domain, ensuring temporal consistency across video frames remains a formidable challenge. This paper proposes a novel zero-shot text-guided video-to-video translation framework to adapt image models to videos. The framework includes two parts: key frame translation and full video translation. The first part uses an adapted diffusion model to generate key frames, with hierarchical cross-frame constraints applied to enforce coherence in shapes, textures and colors. The second part propagates the key frames to other frames with temporal-aware patch matching and frame blending. Our framework achieves global style and local texture temporal consistency at a low cost (without re-training or optimization). The adaptation is compatible with existing image diffusion techniques, allowing our framework to take advantage of them, such as customizing a specific subject with LoRA, and introducing extra spatial guidance with ControlNet. Extensive experimental results demonstrate the effectiveness of our proposed framework over existing methods in rendering high-quality and temporally-coherent videos.

AK

375,080 views • 3 years ago

Is Google taking initial steps to enhance Street View? For some reason, Street View seems stuck in technology that feels outdated. I wonder if we'll see such improvements on the product side. Also, note how much better it performs in all aspects compared to Zip-NeRF in their presented material. It offers more details and fewer artifacts. Great work! "LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering" Contributions: • We propose a novel LOD representation for 3DGS which, unlike previous methods [27, 28, 17], does not recompute the list of used Gaussians at each frame. This allows for acceleration and compaction, enabling the rendering of large-scale scenes even on mobile devices. • We design a strategy to automatically select optimal hyperparameters for splitting LODs, whereas most other methods require manual tuning of hyperparameters for each 3D scene. • To further accelerate rendering, we split the scene into chunks and pre-compute sets of active Gaussians per chunk. • Finally, we introduce a novel opacity interpolation scheme to produce visually pleasing rendering and eliminate artifacts when transitioning between chunks.

MrNeRF

62,511 views • 1 year ago
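The LODGE bullets describe two ideas: choosing a level of detail by camera distance, and interpolating opacities so chunk transitions do not pop. A hypothetical sketch of both follows. The distance thresholds, the one-dimensional camera path, and the linear fade are simplifying assumptions for illustration only. LODGE selects its thresholds automatically, and its interpolation scheme may differ.

```python
from typing import Sequence, Tuple


def select_lod(distance: float, thresholds: Sequence[float]) -> int:
    """Level 0 is the finest; each ascending threshold ends one level."""
    for level, limit in enumerate(thresholds):
        if distance < limit:
            return level
    return len(thresholds)


def chunk_weights(position: float, boundary: float, band: float) -> Tuple[float, float]:
    """Linear cross-fade weights (w_a, w_b) for a camera moving along one axis
    from chunk A (left of boundary) to chunk B, over a band of width `band`."""
    w_a = min(1.0, max(0.0, (boundary + band / 2 - position) / band))
    return w_a, 1.0 - w_a


def blended_opacity(alpha: float, in_a: bool, in_b: bool,
                    w_a: float, w_b: float) -> float:
    """Gaussians active in both chunks keep full opacity; Gaussians active in
    only one chunk fade with that chunk's weight, avoiding popping at the seam."""
    if in_a and in_b:
        return alpha
    if in_a:
        return alpha * w_a
    if in_b:
        return alpha * w_b
    return 0.0


print(select_lod(25.0, [10.0, 20.0, 40.0]))  # 2
w_a, w_b = chunk_weights(position=5.0, boundary=5.0, band=2.0)
print(w_a, w_b, blended_opacity(0.8, True, False, w_a, w_b))  # 0.5 0.5 0.4
```

Since the active sets per chunk are precomputed, the per-frame work reduces to evaluating a couple of weights rather than rebuilding the Gaussian list.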

First fully ML-framework-free 3D Gaussian Splatting implementation in LichtFeld Studio. I’ve completed the migration of the full training pipeline to a custom CUDA-based tensor library. No PyTorch, no LibTorch, no autograd. Every gradient is implemented by hand, either through CUDA kernels or minimal abstractions on top. This makes it the first full training setup for 3D Gaussian Splatting with zero dependencies on existing ML frameworks. It’s not just about independence, it's about control! We now manage every byte of GPU memory, which opens the door to tighter optimization and finer performance tuning. The framework footprint is minimal, without pulling in gigabytes of ML runtime code that was never designed for real-time or graphics-driven applications. A few modules, such as the metrics and 3DGUT interfaces, are still being ported, and some operations are temporarily naïve, so performance is not yet on par with master. But this refactor lays the groundwork for:
- A fully self-contained binary
- Fine-grained memory optimization
- Easier experimentation without the weight of an ML stack
We’re getting close.

MrNeRF

50,487 views • 7 months ago
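Implementing every gradient by hand is usually paired with a numerical gradient check. The sketch below shows that workflow in Python for readability; LichtFeld Studio itself uses CUDA. It is a hand-written backward pass for front-to-back alpha compositing of scalar colors, which is the core of a Gaussian-splatting rasterizer, verified against central finite differences. It is an illustration, not LichtFeld's code.

```python
from typing import List, Sequence, Tuple


def composite(colors: Sequence[float], alphas: Sequence[float]) -> float:
    """Front-to-back compositing: C = sum_i c_i * a_i * T_i, T_i = prod_{j<i} (1 - a_j)."""
    out, trans = 0.0, 1.0
    for c, a in zip(colors, alphas):
        out += c * a * trans
        trans *= 1.0 - a
    return out


def composite_backward(colors: Sequence[float], alphas: Sequence[float],
                       grad_out: float) -> Tuple[List[float], List[float]]:
    """Hand-derived gradients of composite() w.r.t. colors and alphas.
    Walks back-to-front, tracking `behind`, the color accumulated behind layer i."""
    n = len(colors)
    trans = [1.0] * n                      # transmittance in front of each layer
    for i in range(1, n):
        trans[i] = trans[i - 1] * (1.0 - alphas[i - 1])
    d_colors, d_alphas = [0.0] * n, [0.0] * n
    behind = 0.0
    for i in reversed(range(n)):
        d_colors[i] = grad_out * alphas[i] * trans[i]
        d_alphas[i] = grad_out * trans[i] * (colors[i] - behind)
        behind = colors[i] * alphas[i] + (1.0 - alphas[i]) * behind
    return d_colors, d_alphas


def numeric_grad(f, xs: List[float], eps: float = 1e-6) -> List[float]:
    """Central finite differences of scalar f at xs."""
    grads = []
    for i in range(len(xs)):
        hi, lo = list(xs), list(xs)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads


colors, alphas = [0.9, 0.2, 0.6], [0.5, 0.3, 0.8]
d_c, d_a = composite_backward(colors, alphas, 1.0)
print(d_a, numeric_grad(lambda a: composite(colors, a), alphas))
```

The back-to-front walk avoids dividing by (1 - a_i), which would be unstable for nearly opaque layers.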

GSTAR: Gaussian Surface Tracking and Reconstruction Contributions: • A new framework for tracking and reconstructing dynamic scenes, combining 3D Gaussians and meshes to effectively manage changes in topology. • A method for Gaussian unbinding and surface re-meshing, allowing for the generation of new surfaces as topologies evolve. • A method for handling large or fast deformations of surfaces between frames using scene flow warping. Abstract (excerpt): However, tracking dynamic surfaces with 3D Gaussians remains challenging due to complex topology changes, such as surfaces appearing, disappearing, or splitting. To address these challenges, we propose GSTAR, a novel method that achieves photo-realistic rendering, accurate surface reconstruction, and reliable 3D tracking for general dynamic scenes with changing topology. Given multi-view captures as input, GSTAR binds Gaussians to mesh faces to represent dynamic objects. For surfaces with consistent topology, GSTAR maintains the mesh topology and tracks the meshes using Gaussians.

MrNeRF

22,698 views • 1 year ago
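Binding Gaussians to mesh faces, as GSTAR does, can be illustrated by storing each Gaussian's center as barycentric coordinates on its face plus an offset along the face normal, so the Gaussian follows the mesh as it deforms. This parameterization and the helper names are assumptions for illustration. GSTAR's exact attachment, including how rotations and scales are expressed in the face frame, is not reproduced.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bound_center(v0: Vec3, v1: Vec3, v2: Vec3,
                 bary: Vec3, normal_offset: float = 0.0) -> Vec3:
    """World-space center of a Gaussian attached to triangle (v0, v1, v2)."""
    n = _cross(_sub(v1, v0), _sub(v2, v0))
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("degenerate triangle")
    b0, b1, b2 = bary
    return tuple(b0 * v0[i] + b1 * v1[i] + b2 * v2[i] + normal_offset * n[i] / length
                 for i in range(3))


tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
bary = (1 / 3, 1 / 3, 1 / 3)
print(bound_center(*tri, bary, 0.1))               # about (0.333, 0.333, 0.1)
moved = tuple((x, y, z + 1.0) for x, y, z in tri)  # translate the face up by 1
print(bound_center(*moved, bary, 0.1))             # about (0.333, 0.333, 1.1)
```

When topology changes, a Gaussian stored this way can be "unbound" by converting its center to free world coordinates, which is the situation the GSTAR contributions address.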

SpatialTracker: Tracking Any 2D Pixels in 3D Space. Recovering dense and long-range pixel motion in videos is a challenging problem. Part of the difficulty arises from the 3D-to-2D projection process, leading to occlusions and discontinuities in the 2D motion domain.

AK

77,162 views • 2 years ago
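Tracking 2D pixels in 3D starts by lifting them with depth. A pixel (u, v) with depth z back-projects through the pinhole intrinsics to a 3D point, where motion can be modeled without the occlusions and discontinuities that the 3D-to-2D projection introduces. A minimal sketch follows, assuming known pinhole intrinsics and a per-pixel depth value (for example from a monocular depth estimator). The names and numbers are illustrative, and this is not SpatialTracker's code.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def unproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift pixel (u, v) with metric depth to a camera-space 3D point."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def project(p: Vec3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-space point back to pixels."""
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)


intr = (500.0, 500.0, 320.0, 240.0)          # fx, fy, cx, cy (hypothetical)
p = unproject(400.0, 300.0, 2.0, *intr)      # (0.32, 0.24, 2.0)
moved = (p[0] + 0.1, p[1], p[2] + 0.5)       # a simple 3D displacement
print(project(p, *intr), project(moved, *intr))
```

A rigid 3D displacement like this one maps to a 2D displacement that depends on depth, which is one reason motion is simpler to model in 3D.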

📢Announcing our 3D head avatar benchmark📢 Two tasks with hidden test sets:
- Dynamic Novel View Synthesis on Heads
- Monocular FLAME-driven Head Avatar Reconstruction
Our goal is to make research on 3D head avatars more comparable and ultimately increase the realism of digital humans. The benchmark studies distinct phenomena of 3D head avatar creation, such as extreme facial expressions, slow-motion captures of shaking long hair, or complicated light reflection and refraction patterns of glasses. The two benchmark tasks assess two core desiderata of 3D avatars: while the novel view synthesis challenge focuses on the best possible rendering quality of complex moving scenes, the avatar animation challenge is concerned with how well a driving signal is translated into an avatar. Evaluations are lightweight and consist of diverse video recordings from the popular NeRSemble dataset with a hidden test set. Participation in the benchmark is therefore straightforward and requires only 5 reconstructions per task. Leaderboard and benchmark submission: Benchmark data access and toolkit: Great work by Tobias Kirschstein and Simon Giebenhain

Matthias Niessner

28,062 views • 1 year ago
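Rendering-quality tasks such as novel view synthesis are commonly scored with image metrics like PSNR, often alongside SSIM and LPIPS. The post does not list the benchmark's metrics. The following is therefore only a generic PSNR sketch for images flattened to lists of values in [0, max_val], not the benchmark toolkit's implementation.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return math.inf if mse == 0.0 else 10.0 * math.log10(max_val ** 2 / mse)


print(psnr([0.5, 0.5], [0.6, 0.4]))  # about 20 dB (MSE = 0.01)
```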

A mega vessel perfectly aligned alongside the berth is more than just a logistical feat—it is a clear demonstration of scale, accuracy, and operational excellence. From the exact positioning to the sheer magnitude of the vessel, this reflects the high-level coordination, meticulous planning, and rigorous control required in today’s maritime industry. In this environment, every movement is deliberate, and every adjustment is executed with expert care. Moments like this prove that seamless operations are never a coincidence; they are the direct result of deep expertise, disciplined teamwork, and an unwavering attention to detail.

The Maritime

11,337 views • 1 month ago

I'm writing my own GUI framework in C++ and rewriting my image editor with it, which is going well so far. I wanted to move away from SwiftUI, Qt, Tauri, Electron, React Native, and Flutter so I can use the Metal and DirectX GPU APIs on macOS and Windows as much as possible, so I think going full Metal/DirectX is the way to go. I know I have a very long and hard road ahead: there are many, many flaws and bugs in this very early version, and it doesn't use the GPU as much as I want it to yet, but I hope we'll get there soon. I also need to make the UI look somewhat decent instead of whatever mess it is right now.

Ruben Veidt

81,189 views • 1 year ago

Colmap 4.0 was very recently released, and it inspired me to do some work to better understand it and its new capabilities with Rerun. I want to really understand how Colmap, and in particular pycolmap, works beyond just calling it via the CLI, so my goal is to use the low-level pycolmap API to log every part of the pipeline. The explicit goal is to have an alternative to the SQLite database: instead of SQLite, I want to try logging everything directly to Rerun and using RRD. This gives me deep inspectability while still saving the features/matches/2D view geometry, and I can view all of it directly in Rerun. I think this is one of the superpowers Rerun provides: data and visualizations are deeply integrated.

As I'm often working with sequential data (videos), I'm going to focus specifically on four things:

1. Monocular Video Simple: Calls high-level APIs such as pycolmap.extract_features, pycolmap.match_sequential, and pycolmap.incremental_mapping. These are basically identical to the CLI options and provide a good baseline.
2. Monocular Video Streamed: Takes the above high-level APIs and breaks them down into their iterator versions, logging each component in a streamed manner. This way, I can stream the intermediate features to Rerun while extraction/matching/mapping is happening.
3. Rig with unknown calibration (<- WHAT THE VIDEO SHOWS): This is probably the most interesting version and the first one I've been working on. It allows one to set a rig between known sensors, such as in VR/AR devices, leading to much better reconstructions with multiple cameras. Here we don't know the calibration a priori, so we have to run the reconstruction twice: once as a normal Colmap reconstruction with no rig constraints, which we use to generate the constraints, and then again with the newly found rig.
4. Rig with known calibration: This is the RoboCap example, where we have a pre-calibrated set of sensors, so we don't need to run two reconstructions, and we also get better matching between cameras, both spatially and temporally. Again, this leads to a much better reconstruction!

Along with all this, GLOMAP has become a first-class global mapper, making it super easy to use directly within pycolmap! I'm excited to do more with this and compare it to things like pycuvslam, vipe, and other alternatives.

Pablo Vela

30,070 views • 2 months ago
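The "Monocular Video Simple" baseline in the Colmap 4.0 post above chains three high-level pycolmap calls. The sketch below shows only that chain, assuming a recent pycolmap release in which `extract_features`, `match_sequential`, and `incremental_mapping` take the path arguments shown. The wrapper name `run_simple_pipeline` and the `database.db`/`sparse` directory layout are hypothetical choices for illustration, not the author's code, and the Rerun logging the post describes is omitted.

```python
from pathlib import Path


def run_simple_pipeline(image_dir: str, work_dir: str):
    """Feature extraction, sequential matching, then incremental mapping.

    Hypothetical wrapper; assumes a recent pycolmap with these high-level calls.
    """
    # Imported lazily so this module can be loaded without pycolmap installed.
    import pycolmap

    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    database_path = work / "database.db"  # hypothetical layout
    sparse_dir = work / "sparse"  # hypothetical layout
    sparse_dir.mkdir(exist_ok=True)

    # 1. Detect and describe keypoints for every image; results go into the SQLite database.
    pycolmap.extract_features(str(database_path), image_dir)
    # 2. Match each frame against its temporal neighbours, which suits video input.
    pycolmap.match_sequential(str(database_path))
    # 3. Register images incrementally, triangulate points, and refine with bundle adjustment.
    return pycolmap.incremental_mapping(str(database_path), image_dir, str(sparse_dir))
```

A call such as `run_simple_pipeline("frames/", "work/")` returns whatever `incremental_mapping` produces, which in recent pycolmap releases is a dict from model index to `pycolmap.Reconstruction`. The streamed and rig variants in the list above need the lower-level APIs the post refers to.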

Human3R: Everyone Everywhere All at Once
Note: I recorded the video from the interactive demo on their project page (linked in the comment below). Abstract (excerpt): Human3R jointly recovers global multi-person SMPL-X bodies ("everyone"), dense 3D scenes ("everywhere"), and camera trajectories in a single forward pass ("all-at-once"). Our method builds upon the 4D online reconstruction model CUT3R and uses parameter-efficient visual prompt tuning to preserve CUT3R's rich spatiotemporal priors while enabling direct readout of multiple SMPL-X bodies. Human3R is a unified model that eliminates heavy dependencies and iterative refinement. After being trained on the relatively small-scale synthetic dataset BEDLAM for just one day on one GPU, it achieves superior performance with remarkable efficiency: it reconstructs multiple humans in a one-shot manner, along with 3D scenes, in one stage, at real-time speed (15 FPS) with a low memory footprint (8 GB).

MrNeRF

35,783 views • 8 months ago

DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery
Abstract: Drones have become essential tools for reconstructing wild scenes due to their outstanding maneuverability. Recent advances in radiance field methods have achieved remarkable rendering quality, providing a new avenue for 3D reconstruction from drone imagery. However, dynamic distractors in wild environments challenge the static scene assumption in radiance fields, while limited view constraints hinder the accurate capture of underlying scene geometry. To address these challenges, we introduce DroneSplat, a novel framework designed for robust 3D reconstruction from in-the-wild drone imagery. Our method adaptively adjusts masking thresholds by integrating local-global segmentation heuristics with statistical approaches, enabling precise identification and elimination of dynamic distractors in static scenes. We enhance 3D Gaussian Splatting with multi-view stereo predictions and a voxel-guided optimization strategy, supporting high-quality rendering under limited view constraints. For comprehensive evaluation, we provide a drone-captured 3D reconstruction dataset encompassing both dynamic and static scenes. Extensive experiments demonstrate that DroneSplat outperforms both 3DGS and NeRF baselines in handling in-the-wild drone imagery.

MrNeRF

21,340 views • 1 year ago

A giant canyon on the Sun
On July 15, a powerful flare erupted on the surface of the Sun, briefly changing its structure and releasing hot plasma into space. The event left behind a plasma "scar" about 400,000 kilometers long, roughly the distance from the Earth to the Moon, with a height of up to 20,000 kilometers. All of this was recorded with high accuracy by NASA's Solar Dynamics Observatory (SDO).

Black Hole

15,948 views • 11 months ago

During a meeting with President of Türkiye Recep Tayyip Erdoğan in Istanbul, we discussed bilateral relations between our countries, as well as the situation in Europe and the Middle East. It is important that joint and coordinated actions strengthen the protection of life and help deliver greater security to people in every part of the world. We agreed on new steps in security cooperation. This primarily concerns areas where we can support Türkiye – expertise, technology, and experience. There is firm political readiness to work together, and our teams will finalize the details in the coming days. We discussed practical steps to implement joint projects in developing gas infrastructure, as well as opportunities for joint development of gas fields. I am grateful to the President and the people of Türkiye for their consistent support for our independence and territorial integrity. We value our close cooperation over all these years, which enables us to work on truly significant projects that can strengthen our entire region.

Volodymyr Zelenskyy / Володимир Зеленський

286,671 views • 2 months ago