Google announces InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes. Paper page: InseRF generates an object in a 3D scene from a text prompt and a single 2D bounding box.

AK

506,881 subscribers

205,987 views • 2 years ago • via X (Twitter)

News & Politics Science & Technology Education

10 Comments

Georgia Gkioxari • 2 years ago

The adding of the panettone got me. Panettones all year round, please, in NeRFs and in the real world!

Mohamad Shahbazi • 2 years ago

Thanks a lot for featuring our work @_akhaliq! Here is the project page for more details and results:

DAS • 2 years ago

I bet this works as well as most Google AI products we've seen demoed.

The AI Edge • 2 years ago

Existing methods for 3D scene editing are mostly effective for style and appearance changes or removing objects. But generating new objects is a challenge for them. InseRF addresses this by combining advances in NeRFs with advances in generative AI and also shows potential for future improvements in generative 2D and 3D models.

Vaibhav Tulsyan • 2 years ago

@adyaman

Cédric Limousin • 2 years ago

If real, that's the future of video. Starting from a blank scene, adding assets, then actors and animating them while being able to move the camera where you want.

Supreme • 2 years ago

what the

Poe Allen • 2 years ago

What is the big deal my bro

Nick Moran • 2 years ago

"Add a panettone on the tray" is an odd prompt if you're already drawing a bounding box over the tray. The figure in the paper makes it look like the actual prompt would just be "a panettone".

𝑫𝒂𝒏𝒊𝒆𝒍 𝑺𝒄𝒐𝒕𝒕 𝑴𝒂𝒕𝒕𝒉𝒆𝒘𝒔 🇦🇺 • 2 years ago

Nice, but perhaps they need to use the existing scene to do a little bit of global illumination and environment mapping from the scene to the inserted object?

Related Videos

Object Cutter: Create high-quality HD background removal for ANY object in your image with a text prompt or bounding boxes!

Gradio

135,968 views • 1 year ago

Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields paper page: Editing a local region or a specific object in a 3D scene represented by a NeRF is challenging, mainly due to the implicit nature of the scene representation. Consistently blending a new realistic object into the scene adds an additional level of difficulty. We present Blended-NeRF, a robust and flexible framework for editing a specific region of interest in an existing NeRF scene, based on text prompts or image patches, along with a 3D ROI box. Our method leverages a pretrained language-image model to steer the synthesis towards a user-provided text prompt or image patch, along with a 3D MLP model initialized on an existing NeRF scene to generate the object and blend it into a specified region in the original scene. We allow local editing by localizing a 3D ROI box in the input scene, and seamlessly blend the content synthesized inside the ROI with the existing scene using a novel volumetric blending technique. To obtain natural looking and view-consistent results, we leverage existing and new geometric priors and 3D augmentations for improving the visual fidelity of the final result. We test our framework both qualitatively and quantitatively on a variety of real 3D scenes and text prompts, demonstrating realistic multi-view consistent results with much flexibility and diversity compared to the baselines. Finally, we show the applicability of our framework for several 3D editing applications, including adding new objects to a scene, removing/replacing/altering existing objects, and texture conversion.

AK

62,768 views • 3 years ago
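The Blended-NeRF abstract describes blending content synthesized inside a 3D ROI box into an existing NeRF scene. The paper's own "novel volumetric blending technique" is not reproduced here. As a rough illustration only, the minimal Python sketch below shows one common, generic way to compose two radiance fields: inside the box, densities add and the color is the density-weighted average of the two fields' colors, while outside the box only the original scene is queried. The names `RoiBox`, `blended_field`, and `Field` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Point = Tuple[float, float, float]
Color = Tuple[float, float, float]
# A radiance field maps a 3D point to (density, rgb color).
Field = Callable[[Point], Tuple[float, Color]]


@dataclass(frozen=True)
class RoiBox:
    """Hypothetical axis-aligned 3D region of interest."""
    lo: Point
    hi: Point

    def contains(self, p: Point) -> bool:
        # A point is inside if every coordinate lies within [lo, hi].
        return all(l <= x <= h for x, l, h in zip(p, self.lo, self.hi))


def blended_field(scene: Field, obj: Field, roi: RoiBox) -> Field:
    """Generic field composition: the object field only contributes inside the ROI.

    This is NOT Blended-NeRF's blending method; it is a simple
    density-sum / density-weighted-color composition for illustration.
    """
    def query(p: Point) -> Tuple[float, Color]:
        sigma_s, rgb_s = scene(p)
        if not roi.contains(p):
            # Outside the box the original scene is left untouched.
            return sigma_s, rgb_s
        sigma_o, rgb_o = obj(p)
        sigma = sigma_s + sigma_o
        if sigma == 0.0:
            # Empty space in both fields: no density, color is irrelevant.
            return 0.0, (0.0, 0.0, 0.0)
        # Each field's color is weighted by its share of the total density.
        rgb = tuple((sigma_s * cs + sigma_o * co) / sigma
                    for cs, co in zip(rgb_s, rgb_o))
        return sigma, rgb

    return query
```

For example, with a constant blue scene field of density 1.0 and a constant red object field of density 3.0, a point inside the box returns density 4.0 and color (0.75, 0.0, 0.25), while a point outside the box returns the scene's values unchanged.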

DreamBooth3D: Subject-Driven Text-to-3D Generation. Personalized 3D models from just a few casual photos, with text-driven modifications. abs: project page:

AK

107,503 views • 3 years ago

(1/4) Excited to share our #ICCV2023 paper Text2Room! We generate scene-scale textured 3D meshes from a given text prompt leveraging 2D text-to-image models such as StableDiffusion. Project: Code: Video:

Matthias Niessner

74,893 views • 2 years ago

Current 3D generative models are slow and low quality. We present GRM, a large-scale model that reconstructs 3D Gaussians in 0.1s and generates high-quality 3D assets from text or single images in a few seconds. Demo: 1/4

Gordon Wetzstein

19,189 views • 2 years ago

SceneScape: Text-Driven Consistent Scene Generation abs: project page: text-driven perpetual view generation -- synthesizing long videos of arbitrary scenes solely from an input text describing the scene and camera poses

AK

73,258 views • 3 years ago

Today we're releasing WildDet3D—an open model for monocular 3D object detection in the wild. It works with text, clicks, or 2D boxes, and on zero-shot evals it nearly doubles the best prior scores. 🧵

Ai2

85,773 views • 2 months ago

Excited to share HOI-PAGE, to appear at #ICML2026! 🚀 Lei Li generates 4D human-object interactions zero-shot from text A part-affordance graph grounds interactions via LLM+video priors, enabling complex multi-person, multi-object interactions 👉

Angela Dai

10,456 views • 1 month ago

Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes

AK

10,659 views • 5 months ago

🔥 DreamEngine revolutionizes image generation with its text-guided object fusion capabilities! The demo and code for Text Guided Object Fusion are released! Let's unlock the Imaginations! Run it locally now in: Paper:

Liang Chen

11,357 views • 1 year ago

The relationship between vertices, sides, and edges in a 3D object

Interesting STEM

25,571 views • 4 months ago
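The related video titled "The relationship between vertices, sides, and edges in a 3D object" most likely refers to Euler's polyhedron formula: for any convex polyhedron, V - E + F = 2, where the "sides" are the faces F. The video itself is not transcribed here, so that reading is an assumption. The minimal Python sketch below checks the formula on the five Platonic solids using their standard vertex, edge, and face counts; `euler_characteristic` and `PLATONIC_SOLIDS` are illustrative names.

```python
# Euler's polyhedron formula: for any convex polyhedron,
# vertices - edges + faces == 2.


def euler_characteristic(vertices: int, edges: int, faces: int) -> int:
    """Return V - E + F for a polyhedron given its element counts."""
    return vertices - edges + faces


# (V, E, F) counts for the five Platonic solids.
PLATONIC_SOLIDS = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron": (12, 30, 20),
}

if __name__ == "__main__":
    for name, (v, e, f) in PLATONIC_SOLIDS.items():
        # Every convex polyhedron should print 2 here.
        print(f"{name}: {v} - {e} + {f} = {euler_characteristic(v, e, f)}")
```

For a cube, for instance, 8 vertices, 12 edges, and 6 faces give 8 - 12 + 6 = 2.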

Check out Christian Diller's CG-HOI :) We generate realistic 3D human-object interactions, from object geometry and text description. A key ingredient is explicit modeling of contact, during training and as guidance during inference.

Angela Dai

20,508 views • 2 years ago

2/ 3D world-building with NVIDIA: NVIDIA used Edify AI to create a detailed 3D desert in minutes during a live demo. Edify 3D generates editable 3D meshes from text or image prompts.

Poonam Soni

107,018 views • 1 year ago

📢 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation Got only one or a few images and wondering if recovering the 3D environment is a reconstruction or generation problem? Why not do it with a generative reconstruction model! We show that a camera-conditioned video diffusion model can be transformed into a generative reconstruction model that directly outputs a high-quality 3D Gaussian Splatting representation through self-distillation, without requiring real-world training data. Check out our results in the video (wait for dynamic scenes in the second half!) : Project Page: Code and Models: Paper:

Sherwin Bahmani

66,490 views • 9 months ago

This is peak... Google just unveiled Genie 3. This AI generates photorealistic & 3D worlds from a text prompt and image that you can explore in real time. Clearing a path towards AGI. 10 wild examples + how to try below: 1. Control a shiny marble

Linus ✦ Ekenstam

44,603 views • 4 months ago

⚡️Generating 3DGS scenes in 5 seconds on a single GPU⚡️ #FlashWorld enables ⚡️fast⚡️ (10~100x faster than previous methods) and 🔥high-quality🔥 3D world generation, from a single image or text prompt. Code: Page:

Tengfei Wang

119,877 views • 8 months ago

"near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence [...] a Neural Object Field that is learned concurrently with a pose graph optimization process in order to robustly accumulate information into a consistent 3D representation"

Fabien Benetou

222,909 views • 3 years ago

An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion discuss: We introduce a new approach for generating realistic 3D models with UV maps through a representation termed "Object Images." This approach encapsulates surface geometry, appearance, and patch structures within a 64x64 pixel image, effectively converting complex 3D shapes into a more manageable 2D format. By doing so, we address the challenges of both geometric and semantic irregularity inherent in polygonal meshes. This method allows us to use image generation models, such as Diffusion Transformers, directly for 3D shape generation. Evaluated on the ABO dataset, our generated shapes with patch structures achieve point cloud FID comparable to recent 3D generative models, while naturally supporting PBR material generation.

AK

66,412 views • 1 year ago

wow.. text-to-3D world AI is here. You can generate 3D assets using text/image, build up a 3D world, and even auto-rig characters. Link in comment.

el.cine

212,681 views • 1 year ago