Imagine if language models could tap into the app ecosystem of your iPhone. Would plugins and assistants become obsolete if we simply let a model orchestrate our existing user interfaces, which have been hardened over many years? This demonstrates the extent to which GPT-4V excels as a Generalist...
30,819 views • 2 years ago • via X (Twitter)
10 comments

Over the last few months, I've been dabbling with vision models not just in one area, but across web, desktop, and mobile platforms. It's become clear to me that there's a lot of untapped potential in these technologies. The closer we bring them to our everyday gadgets, the better we can make use of what they have to offer. This shift could make our connection with AI feel more intuitive and seamless, moving away from ChatGPT-esque interaction with AI assistants.

Finally got around to writing up my thoughts on UI-focused AI agents. It's not super deep, but it's filled with my takes and a bit of nerdy exploration. Slapped on my Medium hat for this one and dove right in.

Consider joining, will be looking into remote control next

I’ve been pondering a similar idea. As an Android engineer, I am working on using multimodal models to automate apps. A world where we interact by voice (through glasses, pins, or some kind of wearable) and use the phone only when we need to do a complex UI task is not far off.

Over the next couple of days I’ll be working on a series of posts about the glue that made all of this possible, and I’ll publish the latest on my GH. If you're really curious, some of the latest work is already in the appium branch!

Kudos to @Daniel1Paulus for the extensive iOS 17 work with go-ios

/cc @mreflow

/cc @karpathy

/cc @praeclarum

@PublicAI_ #AI
