vision🍌 is here. If you got into computer vision the way I did, starting with pixel-level labeling tasks like segmentation, edges, depth, or surface normals, you’ll probably feel the same seeing these results: something big has quietly shifted, and it’s going to change how we approach these problems for good 🧵
well, someone has been preaching this at us for like 6+ years. glad we are past the 'feel the agi' phase and back to building toward human-level intelligence
Video understanding is the next frontier, but not all videos are alike. Models now reason over YouTube clips and feature films, but what about the everyday spaces that we, and our future AI assistants, navigate and experience? Introducing Thinking in Space, our latest study exploring how multimodal LLMs see, remember and recall spaces. 🧵[1/n]