VGGT: Visual Geometry Grounded Transformer

TL;DR: Is DUSt3R facing a formidable new rival?

Contributions: (1) We introduce VGGT, a large feed-forward transformer that can, given one, a few, or even hundreds of images of a scene, predict all its key 3D attributes - including camera intrinsics and extrinsics, point...

29,461 views • 1 year ago • via X (Twitter)

12 Comments

MrNeRF • 1 year ago

Paper (pdf): Code:

MrNeRF • 1 year ago

Thanks for bringing this paper to my attention!

MrNeRF • 1 year ago

I'm crafting an email newsletter that turns my daily updates into a captivating weekly digest, complete with exclusive content. Although it's not live yet, you can sign up now! If you're curious, visit my website and join the subscriber list today!

MrNeRF • 1 year ago

Original author's post:

Pablo Vela • 1 year ago

Gah looks so cool, still not MIT/Apache 😭😭

MrNeRF • 1 year ago

Yeah, but it is nice to see someone breaking into this monopoly which is good!

Abdullah Hamdi • 1 year ago

Our VGG group

Jianyuan Wang • 1 year ago

Thanks for sharing! We released it in a silent mode for a while but was quickly caught lol

MrNeRF • 1 year ago

The silence is over :D. Awesome paper, thank you!

Sir Mr Meow Meow • 1 year ago

interesting

MrNeRF • 1 year ago

Yes, quite impressive!

Related Videos

3D-LLM: Injecting the 3D World into Large Language Models

paper page:

Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves richer concepts such as spatial relationships, affordances, physics, layout, and so on. In this work, we propose to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs. Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. Using three types of prompting mechanisms that we design, we are able to collect over 300k 3D-language data covering these tasks. To efficiently train 3D-LLMs, we first utilize a 3D feature extractor that obtains 3D features from rendered multi-view images. Then, we use 2D VLMs as our backbones to train our 3D-LLMs. By introducing a 3D localization mechanism, 3D-LLMs can better capture 3D spatial information. Experiments on ScanQA show that our model outperforms state-of-the-art baselines by a large margin (e.g., the BLEU-1 score surpasses state-of-the-art score by 9%). Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs. Qualitative examples also show that our model could perform more tasks beyond the scope of existing LLMs and VLMs.

AK

249,572 views • 2 years ago

Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives

paper page:

Given a set of calibrated images of a scene, we present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives. While many approaches focus on recovering high-fidelity 3D scenes, we focus on parsing a scene into mid-level 3D representations made of a small set of textured primitives. Such representations are interpretable, easy to manipulate and suited for physics-based simulations. Moreover, unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images through differentiable rendering. Specifically, we model primitives as textured superquadric meshes and optimize their parameters from scratch with an image rendering loss. We highlight the importance of modeling transparency for each primitive, which is critical for optimization and also enables handling varying numbers of primitives. We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points, while providing amodal shape completions of unseen object regions. We compare our approach to the state of the art on diverse scenes from DTU, and demonstrate its robustness on real-life captures from BlendedMVS and Nerfstudio. We also showcase how our results can be used to effortlessly edit a scene or perform physical simulations.

AK

38,568 views • 2 years ago

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

paper page:

We present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects. We tackle the problem by leveraging a 2D reference image to guide the stages of geometry sculpting and texture boosting. A central focus of this work is to address the consistency issue that existing works encounter. To sculpt geometries that render coherently, we perform score distillation sampling via a view-dependent diffusion model. This 3D prior, alongside several training strategies, prioritizes the geometry consistency but compromises the texture fidelity. We further propose Bootstrapped Score Distillation to specifically boost the texture. We train a personalized diffusion model, Dreambooth, on the augmented renderings of the scene, imbuing it with 3D knowledge of the scene being optimized. The score distillation from this 3D-aware diffusion prior provides view-consistent guidance for the scene. Notably, through an alternating optimization of the diffusion prior and 3D scene representation, we achieve mutually reinforcing improvements: the optimized 3D scene aids in training the scene-specific diffusion model, which offers increasingly view-consistent guidance for 3D optimization. The optimization is thus bootstrapped and leads to substantial texture boosting. With tailored 3D priors throughout the hierarchical generation, DreamCraft3D generates coherent 3D objects with photorealistic renderings, advancing the state-of-the-art in 3D content generation.

AK

161,530 views • 2 years ago