..jesus open interpreter's first vision model, piloting my 8gb M1 macbook. 100% offline. this will be inside every computer in the world.
372,183 views • 2 years ago • via X (Twitter)
10 comments

are there any plans for hierarchical vision models? i.e. if there are multiple play buttons on the screen, and i say "click play on youtube", it knows to first isolate youtube window for the next inference on the vision model?

actually yes, this exactly! soon, OI will be focused on the active window + let the LLM programmatically switch active windows. have been experimenting with the right way to do this. dramatically improves + speeds up vision model inference too, way less pixels.
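
A rough sketch of the "focus on the active window" idea, in case it helps picture it: crop to the window, run the vision model on the crop, then map the result back to screen coordinates. The `Rect` type, both function names, and the example geometry are made up for illustration and are not taken from Open Interpreter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """A rectangle in screen pixels (hypothetical helper type)."""
    left: int
    top: int
    width: int
    height: int

    @property
    def pixels(self) -> int:
        return self.width * self.height


def window_to_screen(window: Rect, x: int, y: int) -> tuple[int, int]:
    """Map a point found inside the cropped window back to full-screen coordinates."""
    if not (0 <= x < window.width and 0 <= y < window.height):
        raise ValueError("point lies outside the cropped window")
    return window.left + x, window.top + y


def pixel_savings(screen: Rect, window: Rect) -> float:
    """Fraction of pixels the model no longer has to look at after cropping."""
    return 1 - window.pixels / screen.pixels


# Assumed example geometry: a 2560x1600 screen with a 1280x800 window on it.
screen = Rect(0, 0, 2560, 1600)
youtube = Rect(1280, 100, 1280, 800)

# Pretend the model found the play button at (40, 700) inside the crop.
print(window_to_screen(youtube, 40, 700))  # (1320, 800)
print(pixel_savings(screen, youtube))      # 0.75
```

In this example the crop holds a quarter of the screen's pixels, which is where the "way less pixels" speedup would come from.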

This is so cool! Is this something like the Large Action Model proposed by @rabbit_hmi, where in a future state you could have it interact in the background with any app?

@rabbit_hmi thanks guli! yes, exactly what we're building—the 01 is an open-source Rabbit R1, this model is the fruit of that project.

Nice, y'all got icons working

yes! also you are a legend josh. would love to play with this model in the self-operating computer repo, building on the incredible advances you've made there. next OI should expose the model pretty cleanly, something like interpreter.point(screenshot_base64, query) -> coords
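
For anyone wondering what that call shape might look like, here is a minimal sketch. Only the `point(screenshot_base64, query) -> coords` shape comes from the reply above; the extra `locate` parameter, the `Coords` and `Locator` types, and the fake locator are assumptions standing in for the real vision model, whose actual interface is not shown in this thread.

```python
import base64
from typing import Callable, Tuple

Coords = Tuple[int, int]
# Hypothetical signature for whatever wraps the vision model.
Locator = Callable[[bytes, str], Coords]


def point(screenshot_base64: str, query: str, locate: Locator) -> Coords:
    """Decode a base64 screenshot and ask a locator where `query` is on it."""
    if not query.strip():
        raise ValueError("query must not be empty")
    # validate=True rejects characters outside the base64 alphabet.
    image_bytes = base64.b64decode(screenshot_base64, validate=True)
    return locate(image_bytes, query)


def fake_locator(image_bytes: bytes, query: str) -> Coords:
    """Stand-in for the vision model: always returns the same point."""
    return (120, 48)


# Placeholder bytes, not a real screenshot.
encoded = base64.b64encode(b"not a real screenshot").decode("ascii")
coords = point(encoded, "play button", fake_locator)
print(coords)  # (120, 48)
```

Passing the locator in as an argument keeps the sketch runnable without any model; a real implementation would presumably hide that behind the interpreter object.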

This is brilliant. I can see an integration testing framework built on top of this

yes!! great stuff in a similar vein at powered by OI

You're the best killian

thanks happy!
