Spatial reasoning is a major challenge for today's foundation models, even in simple tasks like arranging objects in 3D space. #CVPR2025 Introducing LayoutVLM, a differentiable optimization framework that uses a VLM to spatially reason about diverse scene layouts from unlabeled assets and open-ended language instructions 1/n
92,545 views • 1 year ago • via X (Twitter)

Due to the lack of 3D and dimensional awareness in LLMs, existing methods struggle to generate scenes that are 🔹 physically plausible (i.e., collision-free) and 🔹 semantically aligned (i.e., objects are placed meaningfully according to the language instruction) 2/n

Our key idea: Use a VLM to produce two complementary representations and enforce mutual consistency for better spatial reasoning. 🔹 Initialization: predict numerical poses from visually marked multi-view images 🔹 Optimization: generate spatial relations as differentiable objectives 3/n
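To make the two-representation idea concrete, here is a minimal sketch of "numerical initialization plus differentiable relation objectives". It assumes objects are 2D points on the floor plan, relations are squared-error penalties, and gradients are estimated with central finite differences instead of autodiff. The helpers `near`, `no_overlap`, and `optimize`, as well as the initial poses, are hypothetical illustrations, not LayoutVLM's actual API or objectives.

```python
import math

# Hypothetical layout representation: object name -> (x, y) centre on the floor plan.


def distance(layout, a, b):
    """Euclidean distance between the centres of objects a and b."""
    (xa, ya), (xb, yb) = layout[a], layout[b]
    return math.hypot(xa - xb, ya - yb)


def near(a, b, target):
    """Hypothetical semantic relation: keep a and b about `target` apart (squared error)."""
    return lambda layout: (distance(layout, a, b) - target) ** 2


def no_overlap(a, b, min_gap):
    """Hypothetical physical constraint: treat objects as discs and penalise
    centres closer than `min_gap` (squared hinge, zero once they are apart)."""
    return lambda layout: max(0.0, min_gap - distance(layout, a, b)) ** 2


def total_loss(layout, objectives):
    """Sum of all objective terms for one layout."""
    return sum(objective(layout) for objective in objectives)


def optimize(init, objectives, lr=0.1, steps=200, eps=1e-5):
    """Gradient descent starting from a numerical initialization.

    Gradients are estimated with central finite differences to keep the sketch
    dependency-free; an autodiff framework would compute them exactly.
    """
    layout = dict(init)
    for _ in range(steps):
        grads = {}
        for name, pos in layout.items():
            grad = []
            for axis in range(2):
                plus, minus = list(pos), list(pos)
                plus[axis] += eps
                minus[axis] -= eps
                loss_plus = total_loss({**layout, name: tuple(plus)}, objectives)
                loss_minus = total_loss({**layout, name: tuple(minus)}, objectives)
                grad.append((loss_plus - loss_minus) / (2 * eps))
            grads[name] = grad
        # Update every object simultaneously, using gradients from the same layout.
        layout = {
            name: (x - lr * grads[name][0], y - lr * grads[name][1])
            for name, (x, y) in layout.items()
        }
    return layout


if __name__ == "__main__":
    # Hypothetical numerical poses standing in for a VLM's initial prediction.
    init = {"sofa": (0.0, 0.0), "coffee_table": (3.0, 0.0), "tv": (0.5, 0.0)}
    objectives = [near("sofa", "coffee_table", 1.0), no_overlap("sofa", "tv", 2.0)]
    result = optimize(init, objectives)
    # Both distances approach their targets (about 1.0 and 2.0).
    print(distance(result, "sofa", "coffee_table"), distance(result, "sofa", "tv"))
```

Starting from the predicted poses rather than a random layout is what lets a local method like this land in a sensible basin; the relations then pull the layout toward the semantic and physical constraints.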

The 3D layout optimization landscape is full of local minima. How can we escape them? 🔹 We refine the optimization objectives by validating them against the predicted numerical initialization (code is verifiable!). 🔹 We further fine-tune our VLM on human-designed 3D scene datasets (i.e., 3D-FRONT) 4/n
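Because the relations are executable code, they can be evaluated directly on the predicted initial poses. The sketch below shows one possible validation rule, assuming a hard threshold: a relation whose loss on the initialization exceeds a tolerance is treated as inconsistent with the numerical prediction and dropped. The threshold rule, `validate_objectives`, `near`, and the example poses are all hypothetical; the thread does not specify how LayoutVLM performs this check.

```python
import math


def distance(layout, a, b):
    """Euclidean distance between the centres of objects a and b."""
    (xa, ya), (xb, yb) = layout[a], layout[b]
    return math.hypot(xa - xb, ya - yb)


def near(a, b, target):
    """Hypothetical relation objective: squared error from a target distance."""
    return lambda layout: (distance(layout, a, b) - target) ** 2


def validate_objectives(init, objectives, tol):
    """Keep only objectives that the numerical initialization already roughly satisfies.

    `objectives` is a list of (label, loss_fn) pairs. A relation whose loss on the
    initial layout exceeds `tol` disagrees with the predicted poses and is rejected
    (a hypothetical hard-threshold rule) before optimization starts.
    """
    kept, rejected = [], []
    for label, loss_fn in objectives:
        if loss_fn(init) <= tol:
            kept.append((label, loss_fn))
        else:
            rejected.append((label, loss_fn))
    return kept, rejected


if __name__ == "__main__":
    # Hypothetical initial poses and VLM-proposed relations.
    init = {"sofa": (0.0, 0.0), "coffee_table": (1.2, 0.0), "tv": (4.0, 0.0)}
    proposed = [
        ("sofa near coffee_table", near("sofa", "coffee_table", 1.0)),
        ("sofa near tv", near("sofa", "tv", 0.5)),
    ]
    kept, rejected = validate_objectives(init, proposed, tol=0.25)
    print("kept:", [label for label, _ in kept])  # kept: ['sofa near coffee_table']
    print("rejected:", [label for label, _ in rejected])  # rejected: ['sofa near tv']
```

A hard cutoff is the simplest choice; a softer variant could down-weight inconsistent relations instead of discarding them.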

LayoutVLM outperforms existing methods in our benchmark, where models arrange up to *80* 3D assets given a language instruction and a floor plan. 5/n

Automated 3D layout generation unlocks richer simulation environments for robotics and embodied AI, enabling: 🔹 More realistic scenes and layouts during training 🔹 Improved generalization for real-world deployment. See also scene_synthesizer by @clembow, which serves a similar purpose 6/n

Beyond research, consider this: Rockstar Games spends $100M+ and countless human hours meticulously placing 3D assets to create immersive game worlds like GTA. When combined with asset generation models, a model that can spatially reason could automate content creation for gaming, VR/AR, film production, etc. 7/n

Huge thanks to the amazing team: @Weiyu_Liu_ (co-lead), Siyi Gu, @dill_pkl , Goutam Bhat, @fedassa , @ManlingLi_ , @nickhaber , @jiajunwu_cs 🌐Project site: 💻 Code (we plan to open-source everything): n/n

Many sites are too focused on chasing PhD-level SEO best practices when, in reality, they would get the most value from simply getting the basics right. Internal links are a basic:

Wonder if I can test out some Sprinter van interior modifications with this…

How does this help industries like construction that use Revit in a real-world scenario?

How about arranging them in your mind, just thinking and visualizing?

