Excited to share our work on Neural Assets: a new method for enabling 3D asset-level control in image diffusion models – scalable & without any 3D inductive biases. Neural Assets goes beyond text or pixel-based control & provides an interface inspired by 3D graphics tools. 🧵

Thomas Kipf

29,089 subscribers

97,924 views • 2 years ago • via X (Twitter)

Education, Science & Technology

9 comments

Thomas Kipf • 2 years ago

Neural Assets enables a range of 3D editing capabilities for individual or multiple assets: translation, rotation, rescaling, transfer across scenes, and control over the scene background.
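The geometric edits listed above (translation, rotation, rescaling) can be pictured as transforms applied to an asset's 3D box before it is re-encoded. The following is a minimal sketch under that assumption only; the function names and the corner-list representation are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def box_corners(center: Point, size: Point) -> List[Point]:
    """Return the eight corners of an axis-aligned 3D box."""
    cx, cy, cz = center
    sx, sy, sz = size
    return [
        (cx + dx * sx / 2, cy + dy * sy / 2, cz + dz * sz / 2)
        for dx in (-1, 1) for dy in (-1, 1) for dz in (-1, 1)
    ]


def centroid(corners: List[Point]) -> Point:
    """Mean of the corner coordinates."""
    n = len(corners)
    return (sum(p[0] for p in corners) / n,
            sum(p[1] for p in corners) / n,
            sum(p[2] for p in corners) / n)


def translate(corners: List[Point], offset: Point) -> List[Point]:
    """Shift every corner by the same offset."""
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for x, y, z in corners]


def rescale(corners: List[Point], factor: float) -> List[Point]:
    """Scale the box about its own centroid."""
    cx, cy, cz = centroid(corners)
    return [(cx + (x - cx) * factor,
             cy + (y - cy) * factor,
             cz + (z - cz) * factor) for x, y, z in corners]


def rotate_z(corners: List[Point], angle: float) -> List[Point]:
    """Rotate the box about the vertical (z) axis through its centroid."""
    cx, cy, _ = centroid(corners)
    c, s = math.cos(angle), math.sin(angle)
    return [(cx + (x - cx) * c - (y - cy) * s,
             cy + (x - cx) * s + (y - cy) * c,
             z) for x, y, z in corners]


# Hypothetical edit: move the asset, double its size, turn it by 90 degrees.
box = box_corners(center=(0.0, 0.0, 0.0), size=(2.0, 2.0, 2.0))
edited = rotate_z(rescale(translate(box, (1.0, 0.0, 0.0)), 2.0), math.pi / 2)
```

Rescaling and rotating about the centroid keeps the asset in place while its size or orientation changes, so the three edits can be combined in any order without the box drifting.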

Thomas Kipf • 2 years ago

Assets extracted from one scene and placed into a different scene or background naturally adapt to lighting conditions and other environmental factors. At night or in rainy conditions, cars even turn their lights/headlights on!

Thomas Kipf • 2 years ago

Neural Assets are extracted from raw video frames with the help of 2D or 3D boxes. The key to making it work is to extract appearance and pose representations from *different* frames, which results in disentanglement and thus controllability. The entire model is trained/fine-tuned end-to-end jointly with a pre-trained image generation model (here: Stable Diffusion 2.1) simply by replacing the text token sequence in the base text-to-image model with a Neural Assets token sequence. At test time, we can compose scenes by combining multiple Neural Assets and a neural representation of the scene background, while ensuring appearance consistency (to a large extent) both for individual assets and for the overall scene.
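A rough sketch of the idea described above: each asset token pairs an appearance feature taken from one frame with a pose taken from a different frame, and the resulting token list stands in for the text tokens that condition the image model. Mean-pooling inside a 2D box and normalised box corners are stand-ins chosen for illustration, and all function names are hypothetical; the actual encoders are described in the paper.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in feature-map cells, x1/y1 exclusive
FeatureMap = Sequence[Sequence[Sequence[float]]]  # indexed as [y][x][channel]


def pool_appearance(feature_map: FeatureMap, box: Box) -> List[float]:
    """Mean-pool the feature vectors that fall inside a 2D box (assumed stand-in)."""
    x0, y0, x1, y1 = box
    cells = [feature_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    channels = len(cells[0])
    return [sum(cell[k] for cell in cells) / len(cells) for k in range(channels)]


def encode_pose(box: Box, width: int, height: int) -> List[float]:
    """Hypothetical pose encoding: box corners normalised to [0, 1]."""
    x0, y0, x1, y1 = box
    return [x0 / width, y0 / height, x1 / width, y1 / height]


def neural_asset_tokens(source_features: FeatureMap,
                        source_boxes: Sequence[Box],
                        target_boxes: Sequence[Box],
                        width: int, height: int) -> List[List[float]]:
    """Build one token per object: appearance from the source frame,
    pose from the target frame, concatenated channel-wise."""
    tokens = []
    for src_box, tgt_box in zip(source_boxes, target_boxes):
        appearance = pool_appearance(source_features, src_box)
        pose = encode_pose(tgt_box, width, height)
        tokens.append(appearance + pose)
    return tokens


# Toy 4x4 feature map with 2 channels; the object sits top-left in the source
# frame and bottom-right in the target frame.
features = [[[float(x + y), 1.0] for x in range(4)] for y in range(4)]
tokens = neural_asset_tokens(features, [(0, 0, 2, 2)], [(2, 2, 4, 4)], 4, 4)
# `tokens` would be passed to the image model in place of the text token sequence.
```

Because appearance and pose come from different frames, the appearance part of a token cannot simply memorise where the object was, which is the intuition behind the disentanglement mentioned in the post.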

Thomas Kipf • 2 years ago

Check out our paper and our website for a lot more details, results, and current limitations / failure modes. Neural Assets is the result of @Dazitu_616's outstanding work as a student researcher in our team, working with a set of fantastic collaborators: @YuliaRubanova, @RishabhKabra, @drewAhudson, @yusufaytar, @vansteenkiste_s, @KelseyRAllen; advised by @igilitschenski. Starting with Slot Attention in 2020, we have pursued this research direction over the past four years (SAVi, OSRT, DORSal & many other works). I couldn't be more excited about this latest result and the potential for this class of methods to enable new creative control capabilities for image generation models and beyond.

Yulia Rubanova • 2 years ago

Super excited to be part of this work. Using the same interface of Neural Assets, we can get a rich set of controls over the objects and seamlessly blend the objects into the environment, with appropriate lighting and shadows. Amazing work, @Dazitu_616!

Ziyi Wu • 2 years ago

Thank you, Thomas! It's been an awesome experience working with you and all the Google folks. I will definitely miss this Student Researcher journey!

Omri Kaduri • 1 year ago

It's really great to see the progress you are making on object-centric representations. Will the model and code be released?

Nate Codes • 1 year ago

When people can do this inside of the physical neural asset space things I'm going to be lost in AR/VR! I love your work and it has tons of implications for my work.

Thomas Kipf • 1 year ago

Thanks, Nate! Lots of work still to be done by the machine learning community before an approach like this becomes widely usable, but I’m personally really excited about this future.

Related videos

🎨 Explore Neural’s 3D Assets on ArtStation Discover a collection of stunning 3D models including alien creatures, a glowing violet owl, futuristic robots, and interdimensional portals. All were created using $NEURAL text-to-image and transformed into detailed 3D assets. This is just a glimpse of what’s possible when AI meets imagination. 📺 Watch the video and explore our latest 3D creations. ➡️Visit our ArtStation Profile:

NeuralAI

25,791 views • 1 year ago

Making 3D assets just got stupidly easy. Tripo turns a sentence or a single image into a full 3d model, ready for blender, games, printing, whatever. Idea → image → 3D asset. Full workflow below 🧵

Farhan

44,468 views • 23 days ago

3DTopia-XL High-Quality 3D PBR Asset Generation via Primitive Diffusion demo: model: 3DTopia-XL scales high-quality 3D asset generation using Diffusion Transformer (DiT) built upon an expressive and efficient 3D representation, PrimX. The denoising process takes 5 seconds to generate a 3D PBR asset from text/image input which is ready for the graphics pipeline to use.

AK

87,086 views • 1 year ago

CSM on Google Cloud is a game-changer (pun intended). Turn any prompt (text, image, sketch) into game-engine-ready 3D assets in minutes. Rapidly iterate on ideas and streamline 3D asset production with Common Sense Machines tools to bring your games to life quicker, better, and for less ↓

Google Cloud

28,439 views • 1 year ago

🌟Your static 3D world models are now alive and interactable! 🚀Introducing NeuROK, a neural simulation framework that turns any static 3D object into an interactive 4D asset — no per-category physics, no physical annotations for training. 📄 🧵 1/n

Chen Geng

32,101 views • 1 month ago

(1/2) Excited to share "Learning Neural Parametric Head Models" #CVPR2023! We capture over 5200 high-quality 3D human head scans from which we build a neural parametric head model that disentangles & expressions and deformations.

Matthias Niessner

53,279 views • 3 years ago

📢Introducing 360Anything, our method for lifting any perspective image or video to gravity-aligned 360° panoramas without using any camera or 3D information. This enables consistent novel view synthesis and 3D scene reconstruction. Project page: 🧵

Ziyi Wu

62,310 views • 5 months ago

📢WorldAgents: 3D worlds only from 2D image models - without any training! We propose an agentic approach with a Director (VLM) to plan the scene, a Generator (Flux or NanoBanana) for new views, and a Verifier (VLM) for selection / 3D consistency. -> High-fidelity 3D worlds from a single text prompt. What's remarkable: our agents find consistent views from 2D image models to obtain 3D-consistent worlds; this shows that image models contain world priors - agents just need to find them! Great work by Ziya Erkoç Angela Dai

Matthias Niessner

18,923 views • 3 months ago

Nvidia presents Edify 3D! The method can generate high-quality 3D assets from text descriptions. It uses a diffusion model to create detailed quad-mesh topologies and high-resolution textures in under 2 minutes.

Dreaming Tulpa 🥓👑

39,517 views • 1 year ago

wow.. text to 3D would AI is here you can generate 3D assets using text/image and build up a 3D world.. and even auto rig characters. link in comment

el.cine

212,681 views • 1 year ago

PhysX-Anything: creates simulation-ready, articulated 3D assets from a single image; VLM-based model using a new 3D representation that reduces token count by 193x

Wildminder

49,884 views • 7 months ago

DiffSplat Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation DiffSplat is a generative framework to synthesize 3D Gaussian Splats from text prompts & single-view images in ⚡️ 1~2 seconds. It is fine-tuned directly from a pretrained text-to-image diffusion model.

AK

38,416 views • 1 year ago

Exploring 3D Generation Capabilities in Alchemist AI🔮 Alchemist AI now enables users to create basic 3D assets, including models, objects, environments, and simulations, expanding its capabilities beyond 2D asset generation. What Does This Mean for You? • 3D Environment Creation: Generate virtual worlds, such as game environments or fantasy landscapes. Define parameters like terrain or structures to create dynamic and adaptable spaces tailored to your needs. • Customizable 3D Models and Objects: Design and modify 3D assets by adjusting dimensions, materials, and textures. Whether crafting prototypes or characters, users maintain full creative control. • Interactive Simulations: Build physics-based simulations or animated scenes. With upcoming support for sprite libraries and animation rigs, fine-tune object behaviors and interactions to suit your projects. With future API updates, Alchemist AI’s 3D generation capabilities will further expand, enhancing tools for creating models, environments, and simulations. AI-assisted text-to-3D will also be introduced—just describe your vision, such as 'space station' or 'orange sports car' and the system will generate customizable base assets.

ALCHEMIST AI 🔮

35,641 views • 1 year ago

Introducing Neural Jacobian Fields, robot 3D kinematic models learned only from vision! They can model & control robots from just a single RGB camera, even those w/ intractable kinematics & no embedded sensors such as soft, 3D-printed pneumatic hands! 1/n

Vincent Sitzmann

54,204 views • 1 year ago

🚨 Pixal3D is now live on fal! 🧊 High-fidelity 3D assets from a single image 🎯 Pixel-aligned back-projection for direct pixel-to-3D correspondence 🎨 Detailed geometry with PBR textures, ready for downstream pipelines

fal

14,192 views • 1 month ago

An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion discuss: We introduce a new approach for generating realistic 3D models with UV maps through a representation termed "Object Images." This approach encapsulates surface geometry, appearance, and patch structures within a 64x64 pixel image, effectively converting complex 3D shapes into a more manageable 2D format. By doing so, we address the challenges of both geometric and semantic irregularity inherent in polygonal meshes. This method allows us to use image generation models, such as Diffusion Transformers, directly for 3D shape generation. Evaluated on the ABO dataset, our generated shapes with patch structures achieve point cloud FID comparable to recent 3D generative models, while naturally supporting PBR material generation.

AK

66,435 views • 1 year ago

Text-to-4D Worlds in Production Environments 🛠️ We are thrilled to release open source agentic tools that stitch together: 🧊 CSM AI for 3D assets ✨ Free & instant text-to-animation with auto-rigging 🧩 Blender MCP add-on 💬 Your favorite client — Cursor, Claude, Windsurf, or custom scripts Built for 3D artists and devs who want speed and control without leaving their favorite toolchain.

Common Sense Machines

13,993 views • 1 year ago

🔥Text-to-3D Foundation Model🔥 We are excited to announce #3DTopia, a generalist 🧊text-to-3D🧊 foundation model, which produces high-quality 3D assets within 5 minutes - Code: - Video:

Ziwei Liu

62,424 views • 2 years ago

Introducing Wonder 3D, a new generative AI model in Flow Studio for fast and detailed 3D asset creation. 💬 Text to 3D — Watch your words take shape. Turn a simple prompt into fully textured 3D characters and props. 📸 Image to 3D — Add depth to your visuals. Transform concept art or reference images into dimensional 3D models. 🖼️ Text to Image — Visualize ideas early. Turn descriptions into concept imagery before converting them into 3D. Built for endless storytelling. Give your imagination some dimension. Available in all tiers. Find it under Wonder Tools. 👉 Get started: 💡 Learn more:

Autodesk Flow Studio

107,724 views • 4 months ago