SAM 2 from Meta FAIR is the first unified model for real-time, promptable object segmentation in images & videos. Using the model in our web-based demo you can segment, track and apply effects to objects in video in just a few clicks. Try SAM 2 ➡️

AI at Meta

763,495 subscribers

88,918 views • 1 year ago • via X (Twitter)

Science & Technology

10 Comments

AshutoshShrivastava • 1 year ago

SAM 2 is really awesome. I was really surprised to see how well it tracked a fast-moving object.

Anu Aakash • 1 year ago

sam2 is so cool

Alexandre Devaux • 1 year ago

Super impressive 😀 I got a cool idea if we can run SAM2 in the Browser or through an API:

MindBranches • 1 year ago

So cool! Summary of the Meta SAM 2 announcement:

Dale Carman • 1 year ago

it says access denied when you click the link

Tianjian Cai • 1 year ago

Why post again? I think it has already been posted.

𖤓🪄 • 1 year ago

Unfortunately, there are no devices to use all of this on

AI Furry Art (SFW-ish) • 1 year ago

Is there a version of this exact UI that can run locally? I know some not so tech savvy people who would get good use out of it.

hf • 1 year ago

It's very useful.

aitization 𝕏 • 1 year ago

cool

Related Videos

Introducing Meta Segment Anything Model 2 (SAM 2) — the first unified model for real-time, promptable object segmentation in images & videos. SAM 2 is available today under Apache 2.0 so that anyone can use it to build their own experiences Details ➡️

AI at Meta

1,617,603 views • 2 years ago

Segment Anything Model 2 (SAM 2) is a foundation model from Meta FAIR for promptable visual segmentation in images & videos. Available now for anyone to build on for free, open source under an Apache license. Try the demo ➡️

AI at Meta

97,755 views • 1 year ago

The Segment Anything Model (SAM) by Meta AI is a step toward the first foundation model for image segmentation. SAM is capable of one-click segmentation of any object from photos or videos + zero-shot transfer to other segmentation tasks. Try the demo ➡️

AI at Meta

186,348 views • 3 years ago

Meet SAM 3, a unified model that enables detection, segmentation, and tracking of objects across images and videos. SAM 3 introduces some of our most highly requested features like text and exemplar prompts to segment all objects of a target category. Learnings from SAM 3 will help power new features in Instagram Edits and Vibes, bringing advanced segmentation capabilities directly to creators. 🔗 Learn more:

AI at Meta

190,573 views • 8 months ago

Big News! Meta just released Segment Anything, a new AI model that can "cut out" any object, in any image/video, with a single click. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks.

Lior Alexander

290,190 views • 3 years ago

This week, Grounding DINO 1.5 was released. It is a new model that uses text prompts to detect objects in videos and images in real time. Examples & demo to try below:

Allen T.

56,018 views • 2 years ago

Oh my.. how's this even possible? Meta just dropped SAM 3D: you can auto-select any object in a still image and.. turn it into a high-quality 3D model. The model is free to download and test in the playground. Link in comments.

el.cine

794,788 views • 8 months ago

With #ObjectClear, you can now remove any objects, along with their shadows and reflections, from your images in just a few clicks or strokes! 👉Try our demo (click version): Big thanks to AK Adina Yakup!

Shangchen Zhou

12,530 views • 1 year ago

I'm building a web UI around Finetrainers using Gradio, to train an AI video model LoRA in a few clicks (and running inside a HF Space 🤗) Here is a shortened demo (in real-life you would use more training steps etc.. but you get the gist)

Julian Bilcke

30,074 views • 1 year ago

Introducing Meta Locate 3D: a model for accurate object localization in 3D environments. Learn how Meta Locate 3D can help robots accurately understand their surroundings and interact more naturally with humans. You can download the model and dataset, read our research paper, and even try a demo!

AI at Meta

81,406 views • 1 year ago

🚀 Excited to share a new tutorial on how to detect buildings 🏘️using the Segment Anything Model 2 (SAM 2) with Segment-Geospatial! You can choose interactive segmentation or utilize a GeoJSON file with building centroids or bounding boxes to do automated building detection. 📹 Check out the video tutorial: 📓 Explore the notebook: #Geospatial #OpenSource #AI #MachineLearning

Qiusheng Wu

72,345 views • 1 year ago

🚀 The GeoAI QGIS Plugin is here 🔥 You can run Moondream vision-language models, object detection, image segmentation (SAM 3), and even train your own geospatial segmentation model end-to-end. Website: GitHub: Short demo: Full video tutorial: #QGIS #GeoAI #SAM3 #Geospatial #DeepLearning #ComputerVision #OpenSource #Python

Qiusheng Wu

11,738 views • 7 months ago

Microsoft's new Florence 2 is big for Computer Vision. It's a merge between Text and Vision. With a single prompt you can instruct the model to do CV tasks like captioning, object detection, grounding, and segmentation. The best part, it only uses a single backbone to handle everything. ▸ Excels in zero-shot performance ▸ Unified model for detection, captioning, etc. ▸ FLD-5B dataset: 5B+ annotations, 126M images ▸ New benchmarks (>5.5+) on COCO, ADE20K

Lior Alexander

186,560 views • 2 years ago

🆕 Try experimental demos featuring our latest AI research! • Create video cutouts & apply effects in a few clicks. • Hear what you sound like in another language. • Bring drawings to life. • Create a story w/ AI-generated voices & sounds. AI Demos ➡️

AI at Meta

49,294 views • 1 year ago

The first truly open-source audio-video model. LTX-2 is a DiT-based foundation model with all core video generation capabilities in one unified model. Designed to run locally on consumer GPUs. - text-to-video - image-to-video - and video-to-video modes 100% open-source.

Akshay 🚀

66,088 views • 6 months ago

Everyone is sleeping on Meta's SAM 3 release. But it's actually a big deal. Here's why:

Companies spend millions paying humans to label images and videos frame by frame. A single autonomous driving dataset? Months of work, hundreds of annotators, millions in cost. Without labeled data, you can't train custom models. Without custom models, you're stuck with generic solutions. This is why most companies never move past pilots. SAM 3 breaks this cycle.

First, let's look at the evolution: SAM 1 segmented objects when you clicked on them. Revolutionary, but one object at a time. SAM 2 added video tracking with memory. Game-changing, but you still manually prompted every object. SAM 3 changes everything with text prompts. Type "yellow school bus" and it finds ALL of them in your image or video. Not just one. Every instance across thousands of frames.

Now here's where people get confused: "Can't I just use GPT-5 or Gemini for this?" No, and here's why that's a terrible approach. Large multimodal LLMs are great for reasoning, but they're slow and expensive for production visual tasks. You're paying API costs per image, waiting seconds for responses, getting inconsistent results. SAM 3 runs in 30 milliseconds on a single GPU for 100+ objects. That's 100x faster, and you own the infrastructure. More importantly, SAM 3 gives you precise pixel-level masks, not descriptions. Try asking an LLM to segment every defective part on a manufacturing line in real time. It won't work. SAM 3 does this effortlessly.

The real breakthrough is their data engine. Meta built an AI-human hybrid system that's 5x faster for complex annotations. They trained SAM 3 on 4 million unique visual concepts - 50x more than existing benchmarks like LVIS. Because SAM 3 is trained on 4 million unique visual concepts, it handles everything:

- Text-based concept search
- Interactive refinement with clicks
- Video tracking across frames
- Zero-shot detection of new concepts

The model is open source. Weights, code, and benchmarks are on GitHub. If you're building computer vision applications, this is the foundation model to evaluate. The annotation time savings alone will pay for integration costs within weeks. Find the relevant links in the next tweet!

Akshay 🚀

46,404 views • 8 months ago

Starting today you can try our new foundation research model for audio generation. The demo includes Zero shot TTS, Text to sound effects, Infilling and more! Try Audiobox ➡️

AI at Meta

515,618 views • 2 years ago

New release from Meta FAIR — Meta Motivo is a first-of-its-kind behavioral foundation model for controlling virtual physics-based humanoid agents for a wide range of complex whole-body tasks. The model is capable of expressing human-like behaviors and achieves performance competitive with task-specific methods and outperforms state-of-the-art unsupervised RL and model-based baselines. Try the demo ➡️ Get the model and code ➡️ We’re excited about how this research could pave the way for fully embodied agents, leading to more lifelike NPCs, democratization of character animation and new types of immersive experiences.

AI at Meta

129,166 views • 1 year ago

🚀 I am very excited to release the SamGeo QGIS plugin for geospatial image segmentation, powered by Meta’s Segment Anything Model (SAM 3). In this full tutorial, I’ll walk you through how to install, configure, and start segmenting satellite imagery in QGIS without writing a single line of code! 👉 Download the plugin here: 💻 Full video tutorial: #QGIS #SegmentAnything #SAM #GeoAI #RemoteSensing

Qiusheng Wu

29,032 views • 7 months ago

CVPR 2025 papers pt. 2 - SAMWISE. SAMWISE adds language understanding and temporal reasoning to SAM2; you can segment and track objects in videos just by describing them. More papers: ↓

SkalskiP

20,528 views • 1 year ago