Microsoft's new Florence 2 is big for Computer Vision. It merges text and vision: with a single prompt, you can instruct the model to do CV tasks like captioning, object detection, grounding, and segmentation. The best part: it uses a single backbone to handle everything. ...

186,544 views • 2 years ago • via X (Twitter)

Comments: 8

AlphaSignal AI • 2 years ago

@AlphaSignalAI One step closer to AGI..

Dash • 2 years ago

@AlphaSignalAI Holy shit

Lior⚡ • 2 years ago

@Mrosenmer Can't wait for the repo 👀

Ariyan • 2 years ago

@AlphaSignalAI @jxnlco @skalskip92 you've seen this?

zachary austin • 2 years ago

@AlphaSignalAI Look away NSA

ThisAndThat • 2 years ago

@AlphaSignalAI Well at least the demo is not much different than YOLOv8 or similar. We have been combining a few models to achieve what you have described. If this model can do all that together and with even better performance then great. But I don't trust Microsoft. They suck.

Waseem • 2 years ago

@AlphaSignalAI I've attempted to do something like this with images and GPT-4V. Results have been pretty good but working on improving it. Plan to put something like this on a robot with a raspberry pi.

alejandro cartagena • 2 years ago

@AlphaSignalAI Look @elmanmansimov
