Updated my HF Space for vibe testing smol VLMs on object detection, visual grounding, keypoint detection & counting! 👓 🆕Compare Qwen2.5 VL 3B vs Moondream 2B side-by-side with annotated images & text outputs. Try examples or test your own images! 🏃👇

15,717 views • 11 months ago • via X (Twitter)

10 Comments

Sergio Paniego • 11 months ago

📱Space: Models by @Alibaba_Qwen and @moondreamai!

merve • 11 months ago

@skalskip92 @vikhyatk @JustinLin610 @onuralpszr you have to see this ^

vik • 11 months ago

for moondream object detection prompting with just the object name will work better, that's how we train it

Sergio Paniego • 11 months ago

I was unsure whether to use the full prompt or just the object name for the examples. Let me update it to make the comparison fairer 😃
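The point raised in this exchange, that Moondream's detection works better with just the object name while a general-purpose VLM takes a full instruction, can be sketched as a small prompt builder. The function name, the model-family keys, and the instruction wording are illustrative assumptions, not taken from the Space's code.

```python
def build_detection_prompt(model_family: str, object_name: str) -> str:
    """Return the text to send to a model for an object-detection query.

    Hypothetical helper: the family keys and the instruction template
    are assumptions made for illustration only.
    """
    name = object_name.strip()
    if not name:
        raise ValueError("object_name must not be empty")
    if model_family == "moondream":
        # Per the comment above, detection works better with the bare object name.
        return name
    if model_family == "qwen":
        # Assumed instruction-style prompt for a general-purpose VLM.
        return f"Detect every {name} in the image and return bounding boxes as JSON."
    raise ValueError(f"unknown model family: {model_family!r}")
```

Calling `build_detection_prompt("moondream", "candy")` returns only the object name, while the `"qwen"` branch wraps the same name in a full instruction, which keeps each model on the prompt style it handles best.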

Andres Franco • 11 months ago

That’s impressive. Playing around with models like that must be a lot of fun.

Prithiv Sakthi 🌠 • 11 months ago

This is really awesome 🤩

Reza Sayar • 11 months ago

awesome! 👏 very useful work!! 🥳🙏

Linus | web3 mobility network nRide • 11 months ago

@pcuenq Vibe testing VLMs, that's really cool! I'm curious, have you explored any blockchain-based applications for object detection or visual grounding? 🤔

Onuralp S. • 11 months ago

I was experimenting with Qwen and I can see it can detect each individual candy. When I ask a little differently, it always says "colorful candies", and when I put that into the prompt I get somewhat better results, but when I ask it to return "json" it just becomes one bbox.
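One way to check the behavior described here (many boxes in free-form mode, a single box when JSON is requested) is to parse the reply and count the boxes. The sketch below assumes the model replies with a JSON list of objects carrying `bbox_2d` and `label` keys, optionally wrapped in a Markdown code fence; that reply format and the `parse_boxes` name are assumptions, so adjust the keys to whatever the model actually emits.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep this snippet fence-safe


def parse_boxes(text: str) -> list[dict]:
    """Extract a list of {"label", "bbox"} dicts from a model's text reply."""
    # Strip an optional fenced wrapper (with or without a "json" tag).
    pattern = FENCE + r"(?:json)?\s*(.*?)" + FENCE
    match = re.search(pattern, text, flags=re.DOTALL)
    payload = match.group(1) if match else text
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return []  # reply was not valid JSON
    if isinstance(data, dict):
        data = [data]  # a single object instead of a list
    if not isinstance(data, list):
        return []
    boxes = []
    for item in data:
        if not isinstance(item, dict):
            continue
        bbox = item.get("bbox_2d")  # assumed key name
        if (
            isinstance(bbox, list)
            and len(bbox) == 4
            and all(isinstance(v, (int, float)) for v in bbox)
        ):
            boxes.append({"label": item.get("label", ""), "bbox": [float(v) for v in bbox]})
    return boxes
```

Comparing `len(parse_boxes(reply))` across prompt variants makes the "one bbox" collapse easy to spot without inspecting the annotated image.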

Johannes Gilger • 11 months ago

This is awesome, thank you so much for that. Also really helps to show the inference time. Now do all the other small-ish VLMs like Molmo, SmolVLM, InternVL, etc 😅
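Reporting inference time for each model, as praised here, can be done with a small wrapper around the model call. This is a minimal sketch using the standard library; the `timed` helper is hypothetical and is not how the Space is known to measure time.

```python
import time
from typing import Any, Callable


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, float]:
    """Call fn with the given arguments and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Wrapping each model's inference function this way yields a per-model duration that can be shown next to the annotated image and text output.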
