SAM 2 from Meta FAIR is the first unified model for real-time, promptable object segmentation in images & videos. Using the model in our web-based demo, you can segment, track, and apply effects to objects in video in just a few clicks. Try SAM 2 ➡️

88,918 views • 1 year ago •via X (Twitter)

10 Comments

AshutoshShrivastava • 1 year ago

SAM 2 is really awesome. I was really surprised to see how it tracked a fast-moving object.

Anu Aakash • 1 year ago

sam2 is so cool

Alexandre Devaux • 1 year ago

Super impressive 😀 I have a cool idea if we could run SAM2 in the browser or through an API:

MindBranches • 1 year ago

So cool! Summary of the Meta SAM 2 announcement:

Dale Carman • 1 year ago

It says "access denied" when you click the link.

Tianjian Cai • 1 year ago

Why post again? I think it has already been posted.

𖤓🪄 • 1 year ago

Unfortunately, there are no devices to use all of this on

AI Furry Art (SFW-ish) • 1 year ago

Is there a version of this exact UI that can run locally? I know some not so tech savvy people who would get good use out of it.

hf • 1 year ago

It's very useful.

aitization 𝕏 's profile picture
aitization 𝕏 1 year ago

cool

Related Videos

Everyone is sleeping on Meta's SAM 3 release, but it's actually a big deal. Here's why:

Companies spend millions paying humans to label images and videos frame by frame. A single autonomous driving dataset? Months of work, hundreds of annotators, millions in cost. Without labeled data, you can't train custom models. Without custom models, you're stuck with generic solutions. This is why most companies never move past pilots.

SAM 3 breaks this cycle. First, let's look at the evolution:
- SAM 1 segmented objects when you clicked on them. Revolutionary, but one object at a time.
- SAM 2 added video tracking with memory. Game-changing, but you still had to prompt every object manually.
- SAM 3 changes everything with text prompts. Type "yellow school bus" and it finds ALL of them in your image or video. Not just one: every instance across thousands of frames.

Now here's where people get confused: "Can't I just use GPT-5 or Gemini for this?" No, and here's why that's a terrible approach. Large multimodal LLMs are great for reasoning, but they're slow and expensive for production visual tasks. You pay API costs per image, wait seconds for responses, and get inconsistent results. SAM 3 runs in 30 milliseconds on a single GPU for 100+ objects. That's 100x faster, and you own the infrastructure.

More importantly, SAM 3 gives you precise pixel-level masks, not descriptions. Try asking an LLM to segment every defective part on a manufacturing line in real time. It won't work. SAM 3 does this effortlessly.

The real breakthrough is the data engine. Meta built a hybrid AI-human annotation system that's 5x faster for complex annotations. They trained SAM 3 on 4 million unique visual concepts, 50x more than existing benchmarks like LVIS. As a result, it handles:
- Text-based concept search
- Interactive refinement with clicks
- Video tracking across frames
- Zero-shot detection of new concepts

The model is open source. Weights, code, and benchmarks are on GitHub. If you're building computer vision applications, this is the foundation model to evaluate. The annotation time savings alone will pay for integration costs within weeks.

Find the relevant links in the next tweet!

Akshay 🚀

46,351 views • 6 months ago