SceneScript treats 3D reconstruction as a language problem rather than a geometry one. The model watches a video of a room and just learns to write a script for it. It autoregressively spits out text commands like make_wall(...) or make_bbox(...) that define the scene. Stanford's new "Scene Language" paper...
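
To make the "scene as a script" idea concrete, here is a minimal sketch of how such text commands could be read back into structured data. The post only shows make_wall(...) and make_bbox(...) with the arguments elided, so the `key=value` argument syntax, every parameter name (`a_x`, `height`, `center_x`, and so on), and the `Command`, `parse_command`, and `parse_script` helpers are assumptions made for illustration, not the model's actual output format.

```python
import re
from dataclasses import dataclass

# Assumed syntax: one command per line, written as name(key=value, ...).
# The source only shows make_wall(...) and make_bbox(...); the argument
# layout below is hypothetical.
COMMAND_RE = re.compile(r"^\s*(\w+)\((.*)\)\s*$")


@dataclass
class Command:
    name: str     # e.g. "make_wall"
    params: dict  # parameter name -> numeric value


def parse_command(line: str) -> Command:
    """Parse a single 'name(key=value, ...)' line into a Command."""
    match = COMMAND_RE.match(line)
    if match is None:
        raise ValueError(f"not a command: {line!r}")
    name, arg_text = match.groups()
    params = {}
    for part in arg_text.split(","):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        params[key.strip()] = float(value)
    return Command(name, params)


def parse_script(text: str) -> list[Command]:
    """Parse a multi-line script, skipping blank lines."""
    return [parse_command(line) for line in text.splitlines() if line.strip()]


# Hypothetical script with made-up parameter names and values.
example_script = """
make_wall(a_x=0.0, a_y=0.0, b_x=4.0, b_y=0.0, height=2.5)
make_bbox(center_x=1.0, center_y=1.5, width=0.8, depth=0.6, height=0.7)
"""

scene = parse_script(example_script)
for command in scene:
    print(command.name, command.params)
```

The point of the sketch is only that a scene emitted as plain text is easy to turn back into named, typed objects that downstream code can query or edit.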
107,011 views • 11 months ago • via X (Twitter)
11 comments

Semantic 3D scene understanding is absolutely crucial for robotics and spatial computing devices like AR and VR headsets.

Paper/project here. Need to fill out a form to get access to model weights:

I was working in construction when the iPhone 12 Pro came out, and I used the LiDAR scanner for EVERYTHING. My boss thought it was sort of gimmicky at first, but I could tell he liked it after a couple of days of me finding apps that created detailed depth maps and showed inconsistencies in the dug paths where slate was to be laid down. This is almost exactly what I imagined the next evolution would be.

This is really cool. An obvious thing they don't mention is how this could also be used to create a simpler vocabulary for a scene: you could define an object in the room, give it a bounding box and a name like objectX, and then say "task: carry objectX to table3" or "event: table3 moved to coordinates xy".

Can it work from a Gaussian Splat scene?

feng shui module wen?

Very cool!

Really cool stuff. Can’t wait to see where this ends up going.

Spatial understanding goes so far. This is what will truly open up the full potential of AI. The possibilities will cross new frontiers. This is an exciting part of that. Very cool!
