SAMURAI vs. Meta AI's SAM 2! Traditional visual object tracking struggles in crowded, fast-moving, or self-occluded scenes, as does SAM 2. Meet SAMURAI: a completely open-source adaptation of the Segment Anything Model for zero-shot visual tracking! Here's why it's a game-changer: 🚫 No need for retraining or finetuning 🎯 Boosts success...

363,264 views • 1 year ago • via X (Twitter)

9 Comments

Akshay 🚀 • 1 year ago

GitHub repo: _____ Interested in ML/AI Engineering? Sign up for our newsletter for in-depth lessons and get a FREE eBook with 150+ core DS/ML lessons:

BensenHsu • 1 year ago

The paper focuses on adapting the Segment Anything Model 2 (SAM 2) for visual object tracking, which is a challenging task for the original model. SAM 2 has shown strong performance in object segmentation, but it faces difficulties in handling crowded scenes with fast-moving or self-occluding objects. The improvements in tracking accuracy are attributed to the incorporation of motion information and the enhanced memory selection mechanism. These advancements help SAMURAI better handle challenging scenarios, such as crowded scenes and occlusions, where the original SAM 2 model struggles. full paper:

TechPat • 1 year ago

Very cool! Isn’t SAM 2 open source too?

Eswar RB • 1 year ago

Perhaps a combination of different colour models could fetch promising results. It seems this works only on RGB: when smoke covers the subject, SAMURAI fails to capture it.

Rohan gupta • 1 year ago

Accuracy is so crazy

kaiban • 1 year ago

Awesome simulation

Akshay 🚀 • 1 year ago

Great choice of the video to test it! Loved it!

FlameJack • 1 year ago

Now this is what AI should be used for, not generative AI that uses resources for no reason other than a lack of care to learn to make things the human way that gives things meaning.

Brandon Tyler • 1 year ago

I’m curious if you understand how AI tracking works for technologies like Hudl and Veo for basketball?

Similar Videos

Everyone is sleeping on Meta's SAM 3 release. But it's actually a big deal. Here's why:

Companies spend millions paying humans to label images and videos frame by frame. A single autonomous driving dataset? Months of work, hundreds of annotators, millions in cost.

Without labeled data, you can't train custom models. Without custom models, you're stuck with generic solutions. This is why most companies never move past pilots.

SAM 3 breaks this cycle.

First let's look at the evolution:

SAM 1 segmented objects when you clicked on them. Revolutionary, but one object at a time.

SAM 2 added video tracking with memory. Game-changing, but you still manually prompted every object.

SAM 3 changes everything with text prompts. Type "yellow school bus" and it finds ALL of them in your image or video. Not just one. Every instance across thousands of frames.

Now here's where people get confused: "Can't I just use GPT-5 or Gemini for this?" No, and here's why that's a terrible approach.

Large multimodal LLMs are great for reasoning, but they're slow and expensive for production visual tasks. You're paying API costs per image, waiting seconds for responses, getting inconsistent results.

SAM 3 runs in 30 milliseconds on a single GPU for 100+ objects. That's 100x faster, and you own the infrastructure.

More importantly, SAM 3 gives you precise pixel-level masks, not descriptions. Try asking an LLM to segment every defective part on a manufacturing line in real-time. It won't work. SAM 3 does this effortlessly.

The real breakthrough is their data engine. Meta built an AI-human hybrid system that's 5x faster for complex annotations. They trained SAM 3 on 4 million unique visual concepts - 50x more than existing benchmarks like LVIS.

SAM 3 is trained on 4 million unique visual concepts, it handles everything:

- Text-based concept search
- Interactive refinement with clicks
- Video tracking across frames
- Zero-shot detection of new concepts

The model is open source. Weights, code, and benchmarks are on GitHub.

If you're building computer vision applications, this is the foundation model to evaluate. The annotation time savings alone will pay for integration costs within weeks.

Find the relevant links in the next tweet!

Akshay 🚀

46,351 views • 6 months ago