Meta releases VGGSfM: Visual Geometry Grounded Deep Structure From Motion

Structure-from-motion (SfM) is a long-standing problem in the computer vision community, which aims to reconstruct the camera poses and 3D structure of a scene from a set of unconstrained 2D images. Classical frameworks solve this problem in an incremental manner by detecting and matching keypoints, registering images, triangulating 3D points, and conducting bundle adjustment. Recent research efforts have predominantly revolved around harnessing the power of deep learning techniques to enhance specific elements (e.g., keypoint matching), but are still based on the original, non-differentiable pipeline. Instead, we propose a new deep SfM pipeline, VGGSfM, where each component is fully differentiable and thus can be trained in an end-to-end manner. To this end, we introduce new mechanisms and simplifications. First, we build on recent advances in deep 2D point tracking to extract reliable pixel-accurate tracks, which eliminates the need for chaining pairwise matches. Furthermore, we recover all cameras simultaneously based on the image and track features instead of gradually registering cameras. Finally, we optimise the cameras and triangulate 3D points via a differentiable bundle adjustment layer. We attain state-of-the-art performance on three popular datasets: CO3D, IMC Phototourism, and ETH3D.
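
Two of the classical steps named above, triangulating a track and measuring the reprojection error that bundle adjustment minimizes, fit in a few lines. This is a minimal NumPy sketch of standard linear (DLT) triangulation, not VGGSfM's code: the function names are illustrative, and VGGSfM's tracker and differentiable BA layer are not reproduced. Every step is plain linear algebra, which is the kind of operation an autodiff framework can differentiate through.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X, shape (3,), with a 3x4 camera matrix P to image coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate_dlt(Ps, xs):
    """Linear (DLT) triangulation of one track observed in several views.

    Each view contributes the rows u*P[2] - P[0] and v*P[2] - P[1]; the
    homogeneous point is the right singular vector with the smallest singular value.
    """
    A = []
    for P, (u, v) in zip(Ps, xs):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    Xh = Vt[-1]
    return Xh[:3] / Xh[3]

def reprojection_residuals(Ps, X, xs):
    """Stacked 2D reprojection errors; bundle adjustment minimizes their squared sum."""
    return np.concatenate([project(P, X) - np.asarray(x) for P, x in zip(Ps, xs)])

# Two calibrated cameras (K = I): one at the origin, one centered 1 unit along +x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
track = [project(P1, X_true), project(P2, X_true)]   # noise-free observations
X_est = triangulate_dlt([P1, P2], track)
```

With noise-free observations the DLT estimate recovers `X_true` and the residuals vanish; with real tracks the residuals are nonzero, and bundle adjustment refines cameras and points jointly to shrink them.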

AK

504,347 subscribers

96,527 views • 2 years ago • via X (Twitter)

News & Politics • Education • Science & Technology

Similar videos

FastMap: Revisiting Dense and Scalable Structure from Motion "FASTMAP, a redesigned SfM framework, achieves fast, high-accuracy dense structure from motion. On large scenes with thousands of images, FASTMAP is up to one to two orders of magnitude faster than GLOMAP and COLMAP. ... Importantly, FASTMAP achieves efficiency improvements while keeping comparable performance. Extensive experiments on eight datasets demonstrate pose estimation accuracy and novel view synthesis quality close to GLOMAP and COLMAP." Contributions: 1. For all the iterative nonlinear optimization problems involved, we design algorithms such that the computational complexity of each iteration is only linear in the number of image pairs, not keypoint pairs or 3D points. This includes replacing the traditional bundle adjustment [50] present in previous SfM frameworks with a novel re-weighting epipolar adjustment algorithm, which is much more efficient. 2. Throughout the entire framework, we formulate as many steps as possible as GPU-friendly dense tensor operations. This allows us to implement the entire method in PyTorch [39], which provides seamless GPU acceleration.
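
The claim that each iteration is linear in the number of image pairs, not keypoint pairs, has a simple illustration for epipolar residuals: the sum of squared algebraic errors x2^T E x1 over all correspondences of one pair collapses into a fixed 9x9 matrix that can be built once. The sketch below demonstrates that identity with NumPy. It is an illustration under our own assumptions (calibrated, normalized coordinates; helper names such as `pair_moment` are hypothetical), not FastMap's re-weighting epipolar adjustment.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential(R, t):
    """Essential matrix E = [t]_x R for the relative pose X2 = R @ X1 + t."""
    return skew(t) @ R

def homogeneous(x):
    """Append a column of ones to (N, 2) normalized image coordinates."""
    return np.hstack([x, np.ones((len(x), 1))])

def epipolar_residuals(E, x1, x2):
    """Algebraic epipolar errors x2_n^T E x1_n for each correspondence n."""
    return np.einsum("ni,ij,nj->n", homogeneous(x2), E, homogeneous(x1))

def pair_moment(x1, x2):
    """9x9 matrix M with sum_n (x2_n^T E x1_n)^2 == E.ravel() @ M @ E.ravel().

    Built once per image pair; evaluating the pair's cost for a new E then
    no longer touches the individual correspondences.
    """
    A = np.einsum("ni,nj->nij", homogeneous(x2), homogeneous(x1)).reshape(len(x1), 9)
    return A.T @ A

# Synthetic two-view example in normalized (calibrated) coordinates.
rng = np.random.default_rng(0)
X1 = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(50, 3))  # points in camera-1 frame
theta = 0.1
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.5, 0.0, 0.1])
X2 = X1 @ R.T + t                       # same points in camera-2 frame
x1 = X1[:, :2] / X1[:, 2:]
x2 = X2[:, :2] / X2[:, 2:]
E = essential(R, t)
M = pair_moment(x1, x2)
```

Once every pair has its `M`, an optimizer that updates poses only needs one 9x9 quadratic form per pair per iteration, whatever the number of matches.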

MrNeRF

15,233 views • 1 year ago

📢Pix2NPHM: Learning to Regress NPHM Reconstructions From a Single Image📢 We directly regress neural parametric head models (NPHMs) from a single image — fast, stable, and significantly more expressive than classical 3DMMs such as FLAME. Face tracking & 3D reconstruction are often limited by the representational capacity of PCA-based face models. By lifting NPHMs to a first-class reconstruction primitive, we enable more accurate geometry, richer expressions, and finer animation control. Pix2NPHM obtains fast and reliable NPHM reconstructions on real-world data. Inference-time optimization against surface normals and canonical point maps can further increase fidelity. Key to successful and generalized training of our ViT-based network are: (1) large-scale registration of existing 3D head datasets, and (2) self-supervised training on vast in-the-wild 2D video datasets using pseudo ground-truth surface normals. Finally, we show that geometry-aware pretraining on pixel-aligned reconstruction tasks significantly outperforms generic visual pretraining (e.g., DINO-style features) in terms of generalization. 🌍 🎥 Great work by Simon Giebenhain, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Zhe Chen
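
Why PCA-based face models limit representational capacity: a classical linear morphable model can only produce shapes inside a K-dimensional affine subspace, so any target outside that span leaves a residual that no coefficient choice removes. The sketch below shows this with random stand-in data and hypothetical function names; it is neither FLAME (which adds articulated jaw and neck via skinning on top of its linear blendshapes) nor NPHM.

```python
import numpy as np

def linear_face_model(mean_shape, basis, coeffs):
    """Classical PCA-style morphable model: vertices = mean + basis @ coeffs.

    mean_shape: (3V,) flattened mean vertex positions
    basis:      (3V, K) principal components (orthonormal columns here)
    coeffs:     (K,) shape coefficients
    Returns a (V, 3) array of vertex positions.
    """
    return (mean_shape + basis @ coeffs).reshape(-1, 3)

def best_fit(mean_shape, basis, target):
    """Least-squares coefficients for a flattened target shape, plus the residual norm."""
    coeffs, *_ = np.linalg.lstsq(basis, target - mean_shape, rcond=None)
    fit = linear_face_model(mean_shape, basis, coeffs)
    return coeffs, np.linalg.norm(fit.ravel() - target)

# Random stand-in data: 500 vertices, 10 components.
rng = np.random.default_rng(0)
V, K = 500, 10
mean_shape = rng.normal(size=3 * V)
basis, _ = np.linalg.qr(rng.normal(size=(3 * V, K)))   # orthonormal columns

in_span = mean_shape + basis @ rng.normal(size=K)       # a shape the model can express
arbitrary = rng.normal(size=3 * V)                      # an arbitrary "scan"
_, err_in = best_fit(mean_shape, basis, in_span)
_, err_out = best_fit(mean_shape, basis, arbitrary)
```

`err_in` is zero up to floating-point error, while `err_out` stays large: the detail a linear basis cannot express is exactly what a more expressive head model is meant to capture.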

Matthias Niessner

37,850 views • 7 months ago

MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers paper page: Recent advances in generative AI have significantly enhanced image and video editing, particularly in the context of text prompt control. State-of-the-art approaches predominantly rely on diffusion models to accomplish these tasks. However, the computational demands of diffusion-based methods are substantial, often necessitating large-scale paired datasets for training, which makes deployment in practical applications challenging. This study addresses this challenge by breaking down the text-based video editing process into two separate stages. In the first stage, we leverage an existing text-to-image diffusion model to simultaneously edit a few keyframes without additional fine-tuning. In the second stage, we introduce an efficient model called MaskINT, which is built on non-autoregressive masked generative transformers and specializes in frame interpolation between the keyframes, benefiting from structural guidance provided by intermediate frames. Our comprehensive set of experiments illustrates the efficacy and efficiency of MaskINT when compared to other diffusion-based methodologies. This research offers a practical solution for text-based video editing and showcases the potential of non-autoregressive masked generative transformers in this domain.
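
Non-autoregressive masked transformers generate all tokens in a few parallel passes instead of one token per pass. A common way to decide how many tokens to commit in each pass is the cosine masking schedule popularized by MaskGIT; whether MaskINT uses exactly this schedule is an assumption here. The sketch computes only the schedule, not the transformer.

```python
import math

def tokens_still_masked(num_tokens: int, step: int, total_steps: int) -> int:
    """Cosine schedule: tokens that remain masked after `step` of `total_steps` passes."""
    ratio = math.cos(math.pi / 2 * step / total_steps)
    return math.floor(num_tokens * ratio)

def decode_schedule(num_tokens: int, total_steps: int) -> list[int]:
    """How many tokens are committed (unmasked) in each parallel decoding pass."""
    committed = []
    remaining = num_tokens
    for step in range(1, total_steps + 1):
        target = tokens_still_masked(num_tokens, step, total_steps)
        committed.append(remaining - target)
        remaining = target
    return committed

schedule = decode_schedule(256, 8)
```

`decode_schedule(256, 8)` commits all 256 tokens in 8 passes, committing only a few early and more per pass as context fills in, whereas an autoregressive decoder would need 256 passes.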

AK

25,449 views • 2 years ago

Wonderland: Navigating 3D Scenes from a Single Image Contributions: • First, we introduce a representation for controllable 3D generation by leveraging the generative priors from camera-guided video diffusion models. Unlike image models, video diffusion models are trained on extensive video datasets. This enables them to capture comprehensive spatial relationships within scenes across multiple views and embed a form of "3D awareness" in their latent space, which allows us to maintain 3D consistency in novel view synthesis. • Second, to achieve controllable novel view generation, we empower video models with precise control over specified camera motions. We introduce a novel dual-branch conditioning mechanism that effectively incorporates desired diverse camera trajectories into the video diffusion model. This enables expansion of a single image into a multi-view consistent capture of a 3D scene with precise pose control. • Third, to achieve efficient 3D reconstruction, we directly transform video latents into 3DGS. We propose a novel latent-based large reconstruction model (LaLRM) that lifts video latents to 3D in a feed-forward manner. With this design, during inference, our model directly predicts 3DGS from a single input image, effectively aligning the generation and reconstruction tasks—and bridging image space and 3D space—through the video latent space. Compared with reconstructing scenes from images, the video latent space offers a 256× spatial-temporal reduction while retaining essential and consistent 3D structural details. Such a high degree of compression is crucial, as it allows the LaLRM to handle a wider range of 3D scenes within the reconstruction framework, with the same memory constraints.
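
The 256× figure is a ratio of positions, and one plausible set of factors reproduces it: 4× temporal downsampling and 8× spatial downsampling along each image axis (4 × 8 × 8 = 256). Those factors are an assumption for illustration, since the description does not state the autoencoder configuration, and the count ignores channel depth. The function name is hypothetical.

```python
def latent_reduction(frames: int, height: int, width: int, t_down: int, s_down: int) -> float:
    """Ratio of pixel positions to latent positions for a video autoencoder.

    t_down: temporal downsampling factor; s_down: spatial factor per image axis.
    Channel counts are ignored: this counts spatio-temporal positions only.
    """
    pixels = frames * height * width
    latents = (frames // t_down) * (height // s_down) * (width // s_down)
    return pixels / latents

# Hypothetical factors: 4x in time, 8x along each spatial axis.
ratio = latent_reduction(frames=48, height=480, width=720, t_down=4, s_down=8)
```

A 48-frame 480×720 clip shrinks from about 16.6 million pixel positions to 64,800 latent positions, which is why a reconstruction model operating on latents can cover more scene content within the same memory budget.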

MrNeRF

52,849 views • 1 year ago

I met with representatives of the American think tank Hudson Institute and informed them about the consequences of Russian strikes in recent days. In light of these attacks, it is particularly important to bolster Ukraine’s air defense, and we are grateful to the United States of America and to all caring Americans for their support of Ukraine, which we have felt throughout all the years of this full-scale Russian aggression. During the meeting, we discussed cooperation with the U.S. Congress, diplomatic work to achieve a dignified peace, and the importance of active U.S. participation in the negotiation process. We are counting on an end to the pause in the negotiations and the reinvigoration of diplomacy. This is important for saving the lives of our people and restoring security in Europe.

Volodymyr Zelenskyy / Володимир Зеленський

147,009 views • 2 months ago

Colmap 4.0 was released very recently, which inspired me to dig into it and its new capabilities with Rerun. I want to really understand how Colmap, and pycolmap in particular, works beyond just calling it via the CLI, so my goal is to use the low-level pycolmap API to log every part of the pipeline. The explicit goal is an alternative to the SQLite database: instead of SQLite, I want to log everything directly to Rerun and use RRD. That gives me deep inspectability while still saving the features, matches, and two-view geometry, all viewable directly in Rerun. I think this is one of the superpowers Rerun provides: data and visualizations are deeply integrated. As I'm often working with sequential data (videos), I'm going to focus on four things:

1. Monocular Video Simple: calls high-level APIs such as pycolmap.extract_features, pycolmap.match_sequential, and pycolmap.incremental_mapping. These are basically identical to the CLI options and provide a good baseline.
2. Monocular Video Streamed: takes the high-level APIs above and breaks them down into their iterator versions, logging each component in a streamed manner. This way, I can stream the intermediate features to Rerun while extraction, matching, and mapping are happening.
3. Rig with unknown calibration (this is what the video shows): probably the most interesting case, and the first one I've worked on. It lets you define a rig between known sensors, such as those in VR/AR devices, which leads to much better reconstructions with multiple cameras. Here we don't know the calibration a priori, so we have to run the reconstruction twice: first as a normal Colmap reconstruction with no rig constraints, which we use to derive the rig constraints, and then again with the newly found rig.
4. Rig with known calibration: this is the RoboCap example, where we have a pre-calibrated set of sensors, so we don't need two reconstructions, and we also get better matching between cameras, both spatially and temporally. Again, this leads to a much better reconstruction!

Along with all this, GLOMAP has become a first-class global mapper, making it super easy to use directly within pycolmap! I'm excited to do more with this and compare it to things like pycuvslam, vipe, and other alternatives.
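
The two-pass approach for a rig with unknown calibration needs the fixed camera-to-camera transform recovered from the first, unconstrained reconstruction. One way to do that, sketched here with NumPy rather than any pycolmap API (the post does not say which pycolmap functions perform this step), is to compute cam2_from_cam1 = T2 · T1⁻¹ for every frame where both cameras registered and average the results: a chordal mean for rotations and an arithmetic mean for translations. All helper names are hypothetical, and the translation is only known up to the reconstruction's global scale.

```python
import numpy as np

def invert_pose(T):
    """Invert a 4x4 rigid transform [R | t]."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

def project_to_so3(M):
    """Closest rotation to a 3x3 matrix in the Frobenius sense, via SVD."""
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt

def estimate_rig_extrinsic(cam1_from_world, cam2_from_world):
    """Hypothetical helper: estimate the fixed cam2_from_cam1 transform.

    Inputs are per-frame 4x4 world-to-camera poses for frames where both cameras
    registered in an unconstrained (no-rig) reconstruction.
    """
    rels = [T2 @ invert_pose(T1) for T1, T2 in zip(cam1_from_world, cam2_from_world)]
    R = project_to_so3(sum(T[:3, :3] for T in rels))   # chordal mean of rotations
    t = np.mean([T[:3, 3] for T in rels], axis=0)       # mean translation
    out = np.eye(4)
    out[:3, :3], out[:3, 3] = R, t
    return out

def rot_z(a):
    """Rotation by angle a (radians) about the z axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Synthetic check: a rig rotated 0.3 rad and offset 0.1 units, seen over 5 frames.
rig = np.eye(4)
rig[:3, :3], rig[:3, 3] = rot_z(0.3), [0.1, 0.0, 0.0]
cam1 = []
for k in range(5):
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = rot_z(0.2 * k), [float(k), 0.5, 2.0]
    cam1.append(T)
cam2 = [rig @ T for T in cam1]
est = estimate_rig_extrinsic(cam1, cam2)
```

With poses from a real first pass, the per-frame estimates will disagree, and a robust average (for example, discarding outlier frames) would likely be needed before feeding the rig back into the second reconstruction.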

Pablo Vela

30,070 views • 4 months ago

The U.S. Special Presidential Envoy for Ukraine, General Keith Kellogg, is in Kyiv today. I am grateful for a constructive meeting. We discussed various areas of cooperation: how to achieve real peace and guarantee Ukraine’s security. These include projects within the PURL initiative for financing the production and procurement of Patriot systems, and strong bilateral agreements on the co-production of drones and weapons that we have proposed to America. We count on a positive response from the United States. We had a substantive discussion on stepping up pressure on the Russians and on what we can do together with partners in tariff and sanctions policy to enable a leaders’-level meeting as soon as possible and bring this war to an end. A trilateral leaders’ format is undoubtedly the most effective. We also discussed the return of abducted Ukrainian children, international cooperation on this track, and the conditions in which our children are being held. I expressed condolences to the American people over the horrific murder of Charlie Kirk and thanked President Trump for his condolences and his response to the brutal murder of Ukrainian citizen Iryna Zarutska in North Carolina. It is important that justice prevail every time violence seeks to take hold. We are also preparing for the 80th session of the UN General Assembly in New York. We discussed planned events, coordination between Ukraine and the U.S., and work within the Coalition of the Willing. We are working on potential meetings and various formats.

Volodymyr Zelenskyy / Володимир Зеленський

228,172 views • 10 months ago

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion TL;DR: Create 3/4DGS from Video Diffusion Note: Some initial inference code has been released (not all yet). Contributions (cited): • We present DimensionX, a novel framework for generating photorealistic 3D and 4D scenes from only a single image using controllable video diffusion. • We propose ST-Director, which decouples the spatial and temporal priors in video diffusion models by learning (spatial and temporal) dimension-aware modules with our curated datasets. We further enhance the hybrid-dimension control with a training-free composition approach according to the essence of video diffusion denoising process. • To bridge the gap between video diffusion and real-world scenes, we design a trajectory-aware mechanism for 3D generation and an identity-preserving denoising approach for 4D generation, enabling more realistic and controllable scene synthesis. • Extensive experiments manifest that our DimensionX delivers superior performance in video, 3D, and 4D generation compared with baseline methods.

MrNeRF

17,052 views • 1 year ago

In the summer of 2023, I cold-emailed Jensen Huang and asked to capture a NeRF of him at SIGGRAPH. He responded in about an hour and said yes. A radiance field is, in the simplest terms, akin to a 3D photograph: a moment in time so completely reconstructed that you can move through it and see it from angles the original cameras never occupied. NeRFs were the original method. Gaussian splatting, which debuted at that same SIGGRAPH, has since become the dominant form of radiance field. I called my late friend James, who told me we needed to begin practicing immediately. We ran capture after capture for weeks until we consistently got the capture time down to ~30 seconds with one camera. Later, in a hallway at the LA Convention Center during SIGGRAPH, I captured the portrait you're seeing now: a full 360° Gaussian splat of Jensen, rendered here as a 2D flythrough. Afterward, I continued the conversation with him and members of his team to make the case for radiance fields as a foundational representation for imaging. To my surprise, they listened. Three years later, NVIDIA has several works, including NuRec, fVDB, 3DGRUT, and gsplat, all utilizing radiance fields. The landscape has evolved enough that the reasoning is obvious. Gaussian splatting has begun to ship across some of the world’s largest industries, including autonomous vehicles, AEC, geospatial, media and entertainment, robotics, e-commerce, and hospitality. It’s become clear that lifelike 3D is here to stay. And yet I think we will look back and be disappointed by how late we started taking 3D portraits of the people around us, just as we have only sparse 2D photos of our grandparents and great-grandparents. We have billions of photographs of the people we know and love, but almost no radiance fields of them. I'll be returning to SIGGRAPH in LA, where this was first captured three years ago, with the landscape looking significantly different.
Radiance fields are more underdeployed than ever relative to what they can do. I'm excited for the future of imaging, and for 2D to transition into 3D. I have a few things up my sleeve that I think will make that case plainly.

Radiance Fields

17,663 views • 1 month ago

Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction Note: Check below for full video. Abstract (cited): "In this paper, we present a self-calibrating framework that jointly optimizes camera parameters, lens distortion, and 3D Gaussian representations, enabling accurate and efficient scene reconstruction. Our technique is particularly effective for high-quality scene reconstruction from large field-of-view (FOV) imagery taken with wide-angle lenses, allowing the scene to be modeled from a smaller number of images. We introduce a novel method for modeling complex lens distortions using a hybrid network that combines invertible residual networks with explicit grids. This design effectively regularizes the optimization process, achieving greater accuracy than conventional camera models. Additionally, we propose a cubemap-based resampling strategy to support large FOV images without sacrificing resolution or introducing distortion artifacts. Our method is compatible with the fast rasterization of Gaussian Splatting, adaptable to a wide variety of camera lens distortions, and demonstrates state-of-the-art performance on both synthetic and real-world datasets."
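
Why resample large-FOV images onto a cubemap: a pinhole image plane maps a ray at angle θ from the optical axis to f·tan θ, which stretches wide-angle content and cannot represent rays at or beyond 90° at all. A cube's six faces each cover a 90° field of view, so every direction lands on some face. The sketch shows the standard direction-to-face lookup, roughly following the OpenGL cube-map convention; it is not the paper's resampling code, and the per-face orientation of u and v is a convention choice.

```python
import math

def direction_to_cubemap(x, y, z):
    """Map a ray direction to (face, u, v) with u, v in [0, 1].

    The face is chosen by the dominant axis; the other two components are
    divided by it, which is a gnomonic (pinhole) projection onto that face.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face, sc, tc, ma = ("+x" if x > 0 else "-x"), (-z if x > 0 else z), -y, ax
    elif ay >= az:
        face, sc, tc, ma = ("+y" if y > 0 else "-y"), x, (z if y > 0 else -z), ay
    else:
        face, sc, tc, ma = ("+z" if z > 0 else "-z"), (x if z > 0 else -x), -y, az
    return face, 0.5 * (sc / ma + 1.0), 0.5 * (tc / ma + 1.0)

# A ray 100 degrees off the +z optical axis: unreachable for a +z pinhole image,
# but it lands on the +x face of the cube.
theta = math.radians(100.0)
face, u, v = direction_to_cubemap(math.sin(theta), 0.0, math.cos(theta))
```

Each face is itself a distortion-free 90° pinhole view, which is what lets a standard Gaussian Splatting rasterizer render it without changing its projection model.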

MrNeRF

17,206 views • 1 year ago

✨ Made a new mini feature on Photo AI: [ Grab from 3d model ]. The problem is that we're at that stage (typical for AI) where image-to-3D models aren't good enough yet but are fun to play with, and we know they'll be good enough in 1-2 years. With [ Make 3d model ] you can already turn any Photo AI pic into a 3D model; it still looks hyper clunky and deformed, but it works! One idea I had to make that more useful, and have now built: let people make a 3D model, change its view in the 3D viewer, then press [ o ] to grab a frame of the 3D view. You can then [ Remix ] that image (img2img) so it becomes a real photo again, and in turn you can turn that into a video with [ Make video ]. So that essentially gives you fully freeform camera position control to take photos with. One thing I need to fix is the background/skybox: I need to take the original photo, remove the person, and use just the background in the 3D model viewer (in this case it should be white), but it's a start!

@levelsio

119,210 views • 1 year ago

Break-A-Scene: Extracting Multiple Concepts from a Single Image. We introduce the task of textual scene decomposition: given a single image of a scene that may contain several concepts, we aim to extract a distinct text token for each concept, enabling fine-grained control over the generated scenes. To this end, we propose augmenting the input image with masks that indicate the presence of target concepts. These masks can be provided by the user or generated automatically by a pre-trained segmentation model. We then present a novel two-phase customization process that optimizes a set of dedicated textual embeddings (handles), as well as the model weights, striking a delicate balance between accurately capturing the concepts and avoiding overfitting. We employ a masked diffusion loss to enable handles to generate their assigned concepts, complemented by a novel loss on cross-attention maps to prevent entanglement. We also introduce union-sampling, a training strategy aimed at improving the ability to combine multiple concepts in generated images. We use several automatic metrics to quantitatively compare our method against several baselines, and further affirm the results using a user study. Finally, we showcase several applications of our method.

AK

154,511 views • 3 years ago
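
The masked diffusion loss in the Break-A-Scene abstract restricts the usual denoising objective to the pixels covered by the concept masks, and union-sampling trains on several concepts at once. A minimal pure-Python sketch on flattened arrays, assuming binary masks and normalizing by the number of masked elements (that normalization is an assumption; the paper's exact formulation, tensor shapes, and its cross-attention loss are not reproduced). `union_mask` and `masked_mse` are hypothetical names, with the union modeling the case where several concepts are sampled together.

```python
from typing import Sequence


def union_mask(masks: Sequence[Sequence[float]]) -> list[float]:
    """Pixel-wise union of binary concept masks (flattened, equal length)."""
    length = len(masks[0])
    if any(len(m) != length for m in masks):
        raise ValueError("all masks must have the same length")
    return [1.0 if any(m[i] for m in masks) else 0.0 for i in range(length)]


def masked_mse(pred: Sequence[float], target: Sequence[float], mask: Sequence[float]) -> float:
    """Squared error between predicted and true noise, counted only where
    mask == 1 and averaged over the masked elements (assumed normalization)."""
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target and mask must have the same length")
    total = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    count = sum(mask)
    return total / count if count else 0.0


# Two concepts sampled together: the loss covers the union of their masks.
mask = union_mask([[1, 0, 0, 0], [0, 0, 1, 0]])
loss = masked_mse([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0], mask)  # loss == 5.0
```

Errors outside the masks (here at positions 1 and 3) contribute nothing, which is what lets each handle focus on its own concept rather than on the background.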

3D Gaussian Splatting for Real-Time Radiance Field Rendering. Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and importantly allow high-quality real-time (>= 30 fps) novel-view synthesis at 1080p resolution. First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space. Second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene. Third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows real-time rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.

AK

633,532 views • 3 years ago
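
In the 3D Gaussian Splatting paper, the anisotropic covariance mentioned above is not optimized as a raw matrix; it is factored as Σ = R S Sᵀ Rᵀ, with R built from a unit quaternion and S a diagonal matrix of per-axis scales, so Σ stays symmetric positive semi-definite under gradient updates. A minimal pure-Python sketch of that construction, with hypothetical function names (this is not the authors' CUDA implementation, and the projection to a 2D screen-space covariance is omitted):

```python
import math


def quat_to_rotation(w: float, x: float, y: float, z: float) -> list[list[float]]:
    """Row-major 3x3 rotation matrix from a quaternion (normalized first)."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance_from_scale_rotation(scale, quat) -> list[list[float]]:
    """Sigma = R S S^T R^T, computed as M M^T with M = R S and S = diag(scale)."""
    R = quat_to_rotation(*quat)
    M = [[R[i][j] * scale[j] for j in range(3)] for i in range(3)]  # M = R S
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)] for i in range(3)]


# A Gaussian with axis scales (1, 2, 3), rotated 90 degrees about z.
h = math.sqrt(0.5)
sigma = covariance_from_scale_rotation((1.0, 2.0, 3.0), (h, 0.0, 0.0, h))
```

Rotating by 90° about z swaps the x and y variances, so `sigma` is approximately diag(4, 1, 9), a handy sanity check on the rotation convention.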

Please share and help! 🌊🇺🇦 🙏 9 dead in Odesa due to severe weather, including a family with a child. We are trying to gather world experts to solve this problem. ‼️ Please recommend a municipality in your city or country that we could collaborate with, and write the details in the comments. 🗣️ The NGO "Spilna Meta" has been drawing attention to the emergency condition of the drain collector and traverse No. 17 for several years: • the collector is in an emergency condition; • the protective structure (casing) is not accounted for and is not actually maintained; • funds for repairs have not been allocated for years. ➡️ Along Magnitohorska Street, the natural channel that always carried rainwater towards "Zolotyi Bereh" was gradually built over with various economic facilities. This significantly worsened the situation and contributed to the sudden flooding. 🤝🇺🇦 We need help improving Odesa's infrastructure. We would also be grateful if you could write to us in a private message or by email at Ua.savchenko@gmail.com

Savchenko Volodymyr

14,229 views • 9 months ago

Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians. Creating high-fidelity 3D head avatars has always been a research hotspot, but it remains a great challenge under lightweight sparse-view setups. In this paper, we propose Gaussian Head Avatar, represented by controllable 3D Gaussians, for high-fidelity head avatar modeling. We optimize the neutral 3D Gaussians and a fully learned MLP-based deformation field to capture complex expressions. The two parts benefit each other, so our method can model fine-grained dynamic details while ensuring expression accuracy. Furthermore, we devise a geometry-guided initialization strategy based on an implicit SDF and Deep Marching Tetrahedra for the stability and convergence of the training procedure. Experiments show that our approach outperforms other state-of-the-art sparse-view methods, achieving ultra high-fidelity rendering quality at 2K resolution even under exaggerated expressions.

AK

65,847 views • 2 years ago

Dear François Nzanga Mobutu, Good evening. I bet your father is turning in his grave, writhing in pain and shame after reading your tweet. He, at least, knew that there are authentic and fully-fledged Congolese Tutsis. He proved it until people like you and others came along and misled him. Do I need to inform you, in case you're unaware, that the people who demonstrated yesterday in Washington DC, roughly 4,000 of them, others the same day in Nairobi, and still others in England shortly before, are Banyamulenge, these Congolese Tutsis from the highlands, who are protesting against what Tshisekedi and the FARDC, the Wazalendo, Évariste Ndayishimiye and the FDNB, as well as the FDLR and mercenaries, are doing to their relatives in Minembwe and throughout the highlands? Their Sukhoi fighter jets and drones bomb daily, killing children, women, the elderly, and men. They destroy homes, churches, schools, hospitals, and community radio stations. They even kill cows and sheep and destroy fields and crops. They have imposed a blockade on Minembwe, with no entry or exit. They have done and are doing the same thing to the Tutsi in Masisi and Rutshuru in North Kivu. Instead of listening to their cries and examining their demands, the excuse is quickly found; the preferred shortcut is Rwanda. Always and only Rwanda. Too easy, isn't it? Your thinking, which is also the regime's narrative, can be summarized in these two sentences: “All the problems facing the DRC come from elsewhere, particularly from Rwanda; we will end up being told that even the migrants are here because of Rwanda.” “And all the solutions must come from elsewhere, especially from the United States of America, from Papa Trump.” Under these conditions, what is the point of the regime you serve?
Two small truths to remember, dear François: “Such an attitude, this kind of ideology, this denial of nationality to Congolese Tutsis of origin, from North and South Kivu, as well as to all those who are victims of the same persecution, like the Hema and others, this easy rejection based on appearance (racial profiling), are among the root causes of the crisis the country is going through;” “As long as we haven’t decided to be sufficiently responsible, to sit down as a nation and rigorously assess our share (of responsibility) in what is happening to us, we will continue to wait for solutions from others, solutions that may never come.” Furthermore, there are satanic verses that must be banished immediately, in the interest of everyone and the country, such as: "There are no Congolese Tutsis, every Tutsi is Rwandan, therefore a foreigner," etc. Either we will be able to put an end to exclusion, discrimination, hate speech, and ethnic hatred, and live together according to the law and history, or this deep-seated problem risks haunting us for a long time. But to achieve this, we need leadership capable of understanding and transcending differences and turning them into assets for living together. This is possible.

Me Moise Nyarugabo

22,470 views • 3 months ago

I had a meeting with French President Emmanuel Macron on the sidelines of the European Council meeting today in Brussels. I thanked Emmanuel for his clear and principled stance in support of Ukraine and the need for new, more substantial steps to protect our entire Europe—our peoples, our institutions, and our European way of life. We discussed the upcoming meeting on March 11 at the level of military representatives of the countries willing to put in greater efforts to ensure reliable security in the context of ending this war. We coordinated our positions and next steps. We have an absolutely clear shared vision that real and lasting peace is possible through cooperation between Ukraine, all of Europe, and the United States. The war must end as soon as possible.

Volodymyr Zelenskyy / Володимир Зеленський

904,415 views • 1 year ago

LACMA has long been a great supporter of initiatives in art and technology. From the original Art and Technology Lab to more recent exhibitions like Coded: Art Enters the Computer Age, 1952–1982, its sustained interest in creative and artistic practices developing alongside computer technology has helped sustain a once controversial yet undeniably fascinating genre of art before it entered public discourse. I am so pleased to have the opportunity to highlight an important part of my practice known as parameterization. By changing one variable in the program, in this case the percentage of pegs to be wrapped by a string, we can see how the original artwork, Ringers #962, evolves over the course of 9 frames arranged in a grid. This process is what I call a “parameter sweep”, and it’s an honor to present the combined works as the LACMA Iterations. As the artist who coded the algorithm, I had no explicit control over how the artwork would develop, instead relying on randomization and a deep familiarity with the aesthetic form to help guide the output through code.

Dmitri Cherniak

19,298 views • 2 years ago
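
The “parameter sweep” described in the LACMA post keeps the program's random choices fixed and varies a single input, so every frame differs only because of that one variable. A toy Python sketch of the idea, assuming a seeded random generator: `hypothetical_ringers` is an invented stand-in that merely picks which pegs get wrapped (it is not the Ringers algorithm, whose code is not given here), and the 9 frames are laid out as a 3×3 grid.

```python
import math
import random


def hypothetical_ringers(seed: int, wrap_fraction: float, num_pegs: int = 16) -> list[int]:
    """Toy stand-in for a generative artwork (NOT the Ringers algorithm):
    returns the sorted indices of the pegs the string wraps."""
    rng = random.Random(seed)              # same seed -> same random choices every frame
    order = list(range(num_pegs))
    rng.shuffle(order)                     # seeded order in which pegs get wrapped
    k = round(wrap_fraction * num_pegs)    # the single swept variable
    return sorted(order[:k])


def parameter_sweep(seed: int, steps: int = 9) -> list[list[list[int]]]:
    """Render `steps` frames, varying only wrap_fraction from 0 to 1,
    and arrange them row by row in a square grid."""
    side = math.isqrt(steps)
    if steps < 4 or side * side != steps:
        raise ValueError("steps must be a perfect square of at least 4")
    fractions = [i / (steps - 1) for i in range(steps)]
    frames = [hypothetical_ringers(seed, f) for f in fractions]
    return [frames[r * side:(r + 1) * side] for r in range(side)]


for row in parameter_sweep(seed=7):
    print([len(frame) for frame in row])  # prints [0, 2, 4], then [6, 8, 10], then [12, 14, 16]
```

Fixing the seed is the design choice that makes the grid readable: each frame wraps a superset of the previous frame's pegs, so the sweep shows one artwork evolving rather than nine unrelated outputs.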

In the so-called oldest “monotheistic” religion, the Ark is essentially a doomsday bunker, and the flood results from global cooling followed by melting, which closely resembles the Younger Dryas and 8.2-kiloyear events. Ahura Mazda warns Yima that a devastating event is coming in the form of an evil winter sent by Angra Mainyu (Ahriman). There will be deadly cold, deep snow, and flooding after the thaw, capable of destroying humans, animals, and plants. Yima is instructed to build a Vara (a sealed, protected enclosure) to preserve the best humans, the best animals, and the seeds of plants. We know that the Zoroastrian religion is based on cycles.

Open Minded Approach

48,477 views • 7 months ago