Microsoft released a groundbreaking model that can be used for web automation, with MIT license 🔥👏 OmniParser is a state-of-the-art UI parsing/understanding model that outperforms GPT-4V in parsing. 👏

473,069 views • 1 year ago • via X (Twitter)

9 Comments

merve · 1 year ago

Model: Interesting highlight for me was Mind2Web (a benchmark for web navigation) capabilities of the model, which unlocks agentic behavior for RPA agents. no need for hefty web automation pipelines that get broken when the website/app design changes! Amazing work.

merve · 1 year ago

Lastly, the authors also fine-tune this model on open-set detection for interactable regions and see if they can use it as a plug-in for VLMs and it actually outperforms off-the-shelf open-set detectors like GroundingDINO. 👏

merve · 1 year ago

Here's a bunch of i/o examples for the model ⇓

Sar · 1 year ago

I saw your post and made me think of which I had just come across. Would be interested in hearing from @skyvernai about the possibility of using OmniParser to replace their current approach

Lisan al Gaib · 1 year ago

Incorrect, it has an AGPL-3.0 license since it is based on YOLOv8 by Ultralytics, which has an AGPL-3.0 license. You can use it commercially, however your code must be publicly available, which makes it commercially unviable again.

Tarek Ayed · 1 year ago

It's weird to compare to GPT-4V, which is notoriously bad at image understanding and OCR, right? I'd be curious to know how it fares against 4o, Sonnet, Gemini Flash and Pro, etc.

Johannes · 1 year ago

Might be nice to combine with Anthropic's computer use

Thread Reader App · 1 year ago

Your thread is going viral! #TopUnroll 🙏🏼@dl4senses for 🥇unroll

Didier Lacroix · 1 year ago

@threadreaderapp unroll
