Excited to share our work on Neural Assets: a new method for enabling 3D asset-level control in image diffusion models – scalable & without any 3D inductive biases. Neural Assets goes beyond text or pixel-based control & provides an interface inspired by 3D graphics tools. 🧵

97,924 views • 2 years ago • via X (Twitter)

9 comments

Thomas Kipf • 2 years ago

Paper: Website: Neural Assets enables a range of 3D editing capabilities for individual or multiple assets: translation, rotation, rescaling, transfer across scenes, and control over the scene background.
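To make the listed edits concrete, here is a minimal sketch of translation, rotation, and rescaling applied to a single asset's 3D box. The thread does not say how boxes are parameterized, so the `Box3D` class, its center/size/yaw fields, and the three helper functions are hypothetical names chosen for illustration, not the authors' interface.

```python
import math
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Box3D:
    """Hypothetical 3D box pose for one asset (assumed parameterization)."""
    center: Tuple[float, float, float]  # box center in scene coordinates
    size: Tuple[float, float, float]    # box extents along its own axes
    yaw: float                          # rotation about the vertical axis, radians


def translate(box: Box3D, dx: float, dy: float, dz: float) -> Box3D:
    """Return a copy of the box moved by the given offset."""
    x, y, z = box.center
    return replace(box, center=(x + dx, y + dy, z + dz))


def rotate(box: Box3D, dyaw: float) -> Box3D:
    """Return a copy of the box turned by dyaw, wrapped into [0, 2*pi)."""
    return replace(box, yaw=(box.yaw + dyaw) % (2 * math.pi))


def rescale(box: Box3D, factor: float) -> Box3D:
    """Return a copy of the box with every extent multiplied by factor."""
    if factor <= 0:
        raise ValueError("scale factor must be positive")
    w, h, d = box.size
    return replace(box, size=(w * factor, h * factor, d * factor))


# Edits compose, and the original pose is left untouched.
car = Box3D(center=(0.0, 0.0, 5.0), size=(2.0, 1.5, 4.0), yaw=0.0)
edited = rescale(rotate(translate(car, 1.0, 0.0, 0.0), math.pi / 2), 0.5)
```

In this sketch only the pose changes. The asset's appearance would be kept fixed and the image regenerated from the edited pose.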

Thomas Kipf • 2 years ago

Assets extracted from one scene and placed into a different scene or background naturally adapt to lighting conditions and other environmental factors. At night or in rainy conditions, cars even turn their lights/headlights on!

Thomas Kipf • 2 years ago

Neural Assets are extracted from raw video frames with the help of 2D or 3D boxes. The key to making it work is to extract appearance and pose representations from *different* frames, which results in disentanglement and thus controllability. The entire model is trained/fine-tuned end-to-end jointly with a pre-trained image generation model (here: Stable Diffusion 2.1) simply by replacing the text token sequence in the base text-to-image model with a Neural Assets token sequence. At test time, we can compose scenes by combining multiple Neural Assets and a neural representation of the scene background, while ensuring appearance consistency (to a large extent) both for individual assets and for the overall scene.
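The pairing and composition described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `NeuralAsset`, `sample_frame_pair`, and `build_condition_sequence` are hypothetical names, plain float lists stand in for learned features, and joining appearance and pose by concatenation is an assumption the thread does not confirm.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NeuralAsset:
    """Hypothetical per-object representation."""
    appearance: List[float]  # features taken from the object's box in a source frame
    pose: List[float]        # encoding of the object's box in the target frame

    def token(self) -> List[float]:
        # Assumption: appearance and pose are concatenated into one token.
        return self.appearance + self.pose


def sample_frame_pair(num_frames: int, rng: random.Random) -> Tuple[int, int]:
    """Pick two *different* frame indices: (appearance source, pose target)."""
    if num_frames < 2:
        raise ValueError("need at least two frames to pair")
    src, tgt = rng.sample(range(num_frames), 2)
    return src, tgt


def build_condition_sequence(
    assets: List[NeuralAsset], background: List[float]
) -> List[List[float]]:
    """Asset tokens followed by one background token.

    In this sketch the resulting sequence plays the role that the text
    token sequence plays in the base text-to-image model.
    """
    tokens = [asset.token() for asset in assets]
    for tok in tokens:
        if len(tok) != len(background):
            raise ValueError("all tokens must share one dimension")
    return tokens + [background]


rng = random.Random(0)
src, tgt = sample_frame_pair(num_frames=8, rng=rng)  # src is never equal to tgt
car = NeuralAsset(appearance=[0.1, 0.2], pose=[1.0])
tree = NeuralAsset(appearance=[0.7, 0.3], pose=[-1.0])
sequence = build_condition_sequence([car, tree], background=[0.0, 0.0, 0.0])
```

Because the appearance comes from the source frame and the pose from the target frame, a model trained this way cannot simply copy the object's placement from its appearance features, which is the disentanglement the post describes.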

Thomas Kipf • 2 years ago

Check out our paper and our website for a lot more details, results, and current limitations / failure modes. Neural Assets is the result of @Dazitu_616's outstanding work as a student researcher in our team, working with a set of fantastic collaborators: @YuliaRubanova, @RishabhKabra, @drewAhudson, @yusufaytar, @vansteenkiste_s, @KelseyRAllen; advised by @igilitschenski. Starting with Slot Attention in 2020, we have pursued this research direction over the past four years (SAVi, OSRT, DORSal & many other works). I couldn't be more excited about this latest result and the potential for this class of methods to enable new creative control capabilities for image generation models and beyond.

Yulia Rubanova • 2 years ago

Super excited to be part of this work. Using the same interface of Neural Assets, we can get a rich set of controls over the objects and seamlessly blend the objects into the environment, with appropriate lighting and shadows. Amazing work, @Dazitu_616!

Ziyi Wu • 2 years ago

Thank you, Thomas! It's been an awesome experience working with you and all the Google folks. I will definitely miss this Student Researcher journey!

Omri Kaduri • 1 year ago

It's really great to see the progress you are making on object-centric representations. Will the model and code be released?

Nate Codes • 1 year ago

When people can do this inside the physical neural asset space, I'm going to be lost in AR/VR! I love your work, and it has tons of implications for my own.

Thomas Kipf • 1 year ago

Thanks, Nate! Lots of work still to be done by the machine learning community before an approach like this becomes widely usable, but I’m personally really excited about this future.
