3D-LLM: Injecting the 3D World into Large Language Models

Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves richer...

249,572 views • 2 years ago • via X (Twitter)

7 Comments

Yining Hong • 2 years ago

Thanks for featuring our work!

DevHunterAI • 2 years ago

Wow

AssistedEvolution • 2 years ago

Looks like nice work, but surprising that folks have not been doing this already, as transformer -> hippocampal complex, so this theoretically is exactly the way you might expect to train it, i.e. with spatio-temporal context.

JP • 2 years ago

Could this be leveraged to understand n-dimensional spaces, such as the weights and biases of a NN?

Ori ~ᗜˬᗜ〜♡ — e/acc • 2 years ago

🔥

Reverie • 2 years ago

I guess MAXAR Tech will start looking into this: more precise LLMs and VLMs for their large-scale 3D maps. Such great work!

Ippi • 2 years ago

It's the Skynet alpha version, noooooo

Related Videos

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

We present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects. We tackle the problem by leveraging a 2D reference image to guide the stages of geometry sculpting and texture boosting. A central focus of this work is to address the consistency issue that existing works encounter. To sculpt geometries that render coherently, we perform score distillation sampling via a view-dependent diffusion model. This 3D prior, alongside several training strategies, prioritizes the geometry consistency but compromises the texture fidelity. We further propose Bootstrapped Score Distillation to specifically boost the texture. We train a personalized diffusion model, Dreambooth, on the augmented renderings of the scene, imbuing it with 3D knowledge of the scene being optimized. The score distillation from this 3D-aware diffusion prior provides view-consistent guidance for the scene. Notably, through an alternating optimization of the diffusion prior and 3D scene representation, we achieve mutually reinforcing improvements: the optimized 3D scene aids in training the scene-specific diffusion model, which offers increasingly view-consistent guidance for 3D optimization. The optimization is thus bootstrapped and leads to substantial texture boosting. With tailored 3D priors throughout the hierarchical generation, DreamCraft3D generates coherent 3D objects with photorealistic renderings, advancing the state-of-the-art in 3D content generation.

AK

161,530 views • 2 years ago