Accurate and controllable scene generation has been difficult with natural language alone. You instead need a language for scenes. Introducing the Scene Language — a visual representation for high-quality 3D/4D generation by integrating programs, words, and embeddings — 🧵(1/6)
63,813 views • 1 year ago • via X (Twitter)
Comments: 9

The Scene Language uses programs, words, and neural embeddings to encode scene structures, semantics, and visual identities, respectively. It can be inferred using pre-trained LMs/VLMs to generate scenes from text and image prompts. (2/6)
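To make the three ingredients concrete, here is a minimal sketch in Python. It is not the authors' code: the `Entity` class, the `chair_row` program, the `flatten` helper, and the three-number "embedding" are all hypothetical stand-ins, assuming only what the post says (a program gives structure, a word gives semantics, an embedding gives visual identity).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Entity:
    """One scene node (hypothetical): a word, an embedding, and child nodes."""
    word: str                      # semantics, e.g. "chair"
    embedding: List[float]         # stand-in for a neural embedding (visual identity)
    offset: Vec3 = (0.0, 0.0, 0.0)  # translation relative to the parent
    children: List["Entity"] = field(default_factory=list)


def chair_row(count: int, spacing: float) -> Entity:
    """The 'program': structure is expressed by ordinary code (a loop)."""
    shared_style = [0.1, 0.7, 0.2]  # made-up numbers; all chairs share one identity
    chairs = [
        Entity("chair", shared_style, offset=(i * spacing, 0.0, 0.0))
        for i in range(count)
    ]
    return Entity("chair row", [], children=chairs)


def flatten(entity: Entity, origin: Vec3 = (0.0, 0.0, 0.0)) -> List[Tuple[str, Vec3]]:
    """Walk the hierarchy and return (word, world position) for each leaf."""
    position = tuple(o + d for o, d in zip(origin, entity.offset))
    if not entity.children:
        return [(entity.word, position)]
    leaves = []
    for child in entity.children:
        leaves.extend(flatten(child, position))
    return leaves


scene = chair_row(count=3, spacing=1.5)
leaves = flatten(scene)
```

Here `leaves` holds three `"chair"` entries spaced 1.5 units apart along the x axis. Changing the loop changes the layout without touching the word or the embedding, which is the separation the post describes.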

The representation also applies to 4D scenes—many dynamic effects are simple to write in programs! Some text-to-4D synthesis results here: (3/6)
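As a rough illustration of why dynamics are easy to express in a program, the sketch below adds a time argument to a layout function. The function name `orbit_positions`, its parameters, and the carousel-style motion are assumptions made for this example, not something taken from the paper or its code.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def orbit_positions(count: int, radius: float, t: float, period: float = 4.0) -> List[Vec3]:
    """Place `count` objects evenly on a circle and rotate the ring over time.

    `t` is the time in seconds; one full turn takes `period` seconds.
    """
    phase = 2.0 * math.pi * t / period
    positions = []
    for i in range(count):
        angle = phase + 2.0 * math.pi * i / count
        positions.append((radius * math.cos(angle), 0.0, radius * math.sin(angle)))
    return positions


# Sample the animation: 40 frames, 0.1 s apart, covering one full period.
frames = [orbit_positions(count=4, radius=2.0, t=k / 10) for k in range(40)]
```

Each entry of `frames` is a static 3D layout; the sequence as a whole is the 4D scene. The only change from a static program is the extra `t` parameter.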

When given image prompts, the pipeline converts input images into 3D scenes while preserving the structure and content: (4/6)

The representation is not tied to one specific renderer; instead, it can be consumed by renderers ranging from end-to-end, neural generative models to traditional graphics engines. (5/6)

More results on project page: Paper: This is a very fun collaboration with the wonderful @zizhang_li, @Mattzh1314, @elliottszwu, and @jiajunwu_cs. (6/6)

I love how Scene Language combines the power of programs, words, and embeddings to create stunning 3D/4D visuals. Can't wait to explore its possibilities for urban planning and smart cities!

The results look crazy good! Can the semantic components be treated as separate 3D objects in something like Unity? Would love to get access to an API or something to integrate this into my tool.

Yes, semantic components are separate and can be individually imported as meshes. We'll release the code in November. Stay tuned! :)

That's why we have different courts for sailors.
