SAMURAI vs. Meta AI's SAM 2! Traditional visual object tracking struggles in crowded, fast-moving, or self-occluded scenes, and so does SAM 2. Meet SAMURAI: a completely open-source adaptation of the Segment Anything Model for zero-shot visual tracking! Here's why it's a game-changer: 🚫 No need for retraining or finetuning 🎯 Boosts success...

363,264 views • 1 year ago • via X (Twitter)

9 comments

Akshay 🚀 • 1 year ago

GitHub repo: _____ Interested in ML/AI Engineering? Sign up for our newsletter for in-depth lessons and get a FREE eBook with 150+ core DS/ML lessons:

BensenHsu • 1 year ago

The paper focuses on adapting the Segment Anything Model 2 (SAM 2) for visual object tracking, which is a challenging task for the original model. SAM 2 has shown strong performance in object segmentation, but it faces difficulties in handling crowded scenes with fast-moving or self-occluding objects. The improvements in tracking accuracy are attributed to the incorporation of motion information and the enhanced memory selection mechanism. These advancements help SAMURAI better handle challenging scenarios, such as crowded scenes and occlusions, where the original SAM 2 model struggles. full paper:

TechPat • 1 year ago

Very cool! Isn’t SAM 2 open source too?

Eswar RB • 1 year ago

Perhaps a combination of different colour models could yield promising results. It seems this works only on RGB: when smoke covers the subject, SAMURAI fails to capture it.

Rohan gupta • 1 year ago

Accuracy is so crazy

kaiban • 1 year ago

Awesome simulation

Akshay 🚀 • 1 year ago

Great choice of the video to test it! Loved it!

FlameJack • 1 year ago

Now this is what AI should be used for, not generative AI that uses resources for no reason other than a lack of care to learn to make things the human way, which gives things meaning.

Brandon Tyler • 1 year ago

I’m curious if you understand how AI tracking works for technologies like Hudl and Veo for basketball?

Related videos

Everyone is sleeping on Meta's SAM 3 release. But it's actually a big deal. Here's why:

Companies spend millions paying humans to label images and videos frame by frame. A single autonomous driving dataset? Months of work, hundreds of annotators, millions in cost.

Without labeled data, you can't train custom models. Without custom models, you're stuck with generic solutions. This is why most companies never move past pilots.

SAM 3 breaks this cycle.

First let's look at the evolution:

SAM 1 segmented objects when you clicked on them. Revolutionary, but one object at a time.

SAM 2 added video tracking with memory. Game-changing, but you still manually prompted every object.

SAM 3 changes everything with text prompts. Type "yellow school bus" and it finds ALL of them in your image or video. Not just one. Every instance across thousands of frames.

Now here's where people get confused: "Can't I just use GPT-5 or Gemini for this?" No, and here's why that's a terrible approach.

Large multimodal LLMs are great for reasoning, but they're slow and expensive for production visual tasks. You're paying API costs per image, waiting seconds for responses, getting inconsistent results.

SAM 3 runs in 30 milliseconds on a single GPU for 100+ objects. That's 100x faster, and you own the infrastructure.

More importantly, SAM 3 gives you precise pixel-level masks, not descriptions. Try asking an LLM to segment every defective part on a manufacturing line in real-time. It won't work. SAM 3 does this effortlessly.

The real breakthrough is their data engine. Meta built an AI-human hybrid system that's 5x faster for complex annotations. They trained SAM 3 on 4 million unique visual concepts - 50x more than existing benchmarks like LVIS.

SAM 3 is trained on 4 million unique visual concepts, it handles everything:
- Text-based concept search
- Interactive refinement with clicks
- Video tracking across frames
- Zero-shot detection of new concepts

The model is open source. Weights, code, and benchmarks are on GitHub.

If you're building computer vision applications, this is the foundation model to evaluate. The annotation time savings alone will pay for integration costs within weeks.

Find the relevant links in the next tweet!

Akshay 🚀

46,351 views • 7 months ago