✨ CVPR 2025 highlight: "A Distractor-Aware Memory for Visual Object Tracking with SAM2". The authors propose a new distractor-aware memory (DAM) model for SAM2 and an introspection-based update strategy that jointly address segmentation accuracy and tracking robustness 🏡 (1/n)🧵👇

GeekyRakshit (e/mad)

1,642 subscribers

32,669 views • 11 months ago • via X (Twitter)

Science & Technology • Education • Art

7 comments

GeekyRakshit (e/mad) • 11 months ago

The authors redesign SAM2's memory into two complementary parts. Recent-Appearance Memory (RAM) is a small FIFO buffer that stores the most recent frames (time-stamped) to keep segmentation accurate as the target's appearance changes. Distractor-Resolving Memory (DRM) is a second buffer that keeps anchor frames able to disambiguate the target from hard external or internal distractors; these slots are not time-stamped, so their influence does not decay. (2/n)🧵👇
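A rough Python sketch of the two-buffer idea described above. This is not the authors' code: the class name, method names, slot capacities, and the FIFO eviction used for DRM are assumptions made for illustration. The thread does not spell out how anchor frames are chosen, so that decision is left to the caller.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, List, Optional


@dataclass
class MemorySlot:
    features: Any               # stand-in for a frame's memory embedding
    frame_index: Optional[int]  # None means the slot is not time-stamped


class DistractorAwareMemorySketch:
    """Hypothetical two-part memory; names and sizes are illustrative only."""

    def __init__(self, ram_size: int = 3, drm_size: int = 3) -> None:
        # RAM: FIFO of the most recent frames, each tagged with its frame index.
        self.ram: Deque[MemorySlot] = deque(maxlen=ram_size)
        # DRM: anchor frames without a time stamp. Dropping the oldest anchor
        # when full is an assumption; the thread does not state the policy.
        self.drm: Deque[MemorySlot] = deque(maxlen=drm_size)

    def add_recent(self, frame_index: int, features: Any) -> None:
        self.ram.append(MemorySlot(features, frame_index))

    def add_anchor(self, features: Any) -> None:
        # The caller decides which frames count as anchors.
        self.drm.append(MemorySlot(features, None))

    def slots(self) -> List[MemorySlot]:
        # Anchors first, then recent frames in temporal order.
        return list(self.drm) + list(self.ram)


memory = DistractorAwareMemorySketch(ram_size=2, drm_size=2)
for t in range(5):
    memory.add_recent(t, f"feat{t}")
    if t == 1:  # pretend frame 1 was judged a useful anchor
        memory.add_anchor(f"feat{t}")

print([(s.frame_index, s.features) for s in memory.slots()])
# prints [(None, 'feat1'), (3, 'feat3'), (4, 'feat4')]
```

The recent frames 0 to 2 have been pushed out of RAM, while the anchor from frame 1 is still present and carries no frame index, which mirrors the "influence does not decay" point in the post.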

GeekyRakshit (e/mad) • 11 months ago

Plugging DAM and the new update rules into the off-the-shelf SAM 2.1 backbone, without any retraining, yields large practical gains and sets a new SoTA (3/n)🧵👇

GeekyRakshit (e/mad) • 11 months ago

The authors also create DiDi, a distractor-distilled tracking dataset, to address the low distractor presence in current visual object tracking benchmarks 📀 (4/n)🧵👇

GeekyRakshit (e/mad) • 11 months ago

Overall, the paper’s novelty lies in recognising that “one size fits all” memory is insufficient for distractor-heavy tracking and providing a simple, training-free remedy that lifts SAM-based tracking to state-of-the-art levels (5/5)🧵🏁

Ankit • 11 months ago

Good stuff brother

Rahul • 11 months ago

very cool

7racker • 11 months ago

I feel like this example is very edge-casey

Related videos

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware check out this SAM2 vs SAMURAI comparison! - paper: - code: - license: Apache-2.0

SkalskiP @ CVPR2026

124,355 views • 1 year ago

I'm experimenting with using SAM2 for automatic tracking of NBA players; it's mind-blowing how well this model performs out of the box. we might even add a SAM2-based tracker to the trackers package at some point. trackers:

SkalskiP @ CVPR2026

292,965 views • 1 year ago

Track Anything: Segment Anything Meets Videos Track-Anything is a flexible and interactive tool for video object tracking and segmentation suitable for: - Video object tracking and segmentation with shot changes. - Visualized development and data annotation for video object tracking and segmentation. - Object-centric downstream video tasks, such as video inpainting and editing. abs: github:

AK

578,577 views • 3 years ago

MetaAI's SAM 2 struggles when things move fast or when there are crowded, fast-moving objects! Introducing SAMURAI: An adaptation of the Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory. 100% Open Source

Sumanth

324,545 views • 1 year ago

SAMURAI vs. MetaAI's SAM 2! Traditional visual object tracking struggles in crowded, fast-moving, or self-occluded scenes, as does SAM2. Meet SAMURAI: a completely open-source adaptation of the Segment Anything Model for zero-shot visual tracking! Here's why it's a game-changer: 🚫 No need for retraining or finetuning 🎯 Boosts success rate and precision 🤖 Motion-aware memory selection 💪 Zero-shot performance on diverse datasets But that's not all: 🔬 Refines mask selection 🔮 Predicts object motion effectively 📈 Gains: 7.1% AUC on LaSOT, 3.5% AO on GOT-10k 🏆 Competes with fully supervised methods without extra training Link to the GitHub repo in the next tweet! _____ Find me → Akshay 🚀 ✔️ For more insights & tutorials on AI and Machine Learning.

Akshay 🚀

363,264 views • 1 year ago

GauS-SLAM: Dense RGB-D SLAM with Gaussian Surfels • We propose a 2D Gaussian-based incremental reconstruction strategy and a Surface-aware Depth Rendering mechanism. This approach effectively mitigates geometry distortions and improves tracking accuracy. • Our dense SLAM system features a front-end/back-end architecture and incorporates a local map design, ensuring tracking accuracy and efficiency. • We conduct extensive experiments demonstrating the superiority of our approach in both tracking accuracy and reconstruction quality compared to SOTA methods.

MrNeRF

11,004 views • 1 year ago

Super stoked that Jordão Bragantini 's napari plugin for Meta's Segment Anything Model 2: napari-sam2, as well as our cell tracking software ultrack were showcased during Chris Cox's Keynote at Meta's Connect 2024. Jordao and I were in the audience; it was great fun! Chan Zuckerberg Biohub Network

Loïc A. Royer 💻🔬⚗️

41,039 views • 1 year ago

🚀 The Segment Anything Model (SAM) has been upgraded to SAM2, featuring an efficient image encoder for segmenting images and videos. But does SAM2 outperform SAM1 in medical image and video segmentation? We're thrilled to present our paper "Segment Anything in Medical Images and Videos: Benchmark and Deployment"! We comprehensively benchmark SAM2 across 11 medical image modalities and videos. 📄 Paper: 💻 Code: **Highlights:** 1. SAM2 doesn’t always outperform SAM1 in 2D medical images, but excels in video segmentation, making it more accurate and efficient for 3D images, such as CT and MR scans. 2. MedSAM still outperforms SAM2 on most 2D modalities, but SAM2 surpasses MedSAM for 3D image segmentation in a slice-by-slice approach. 3. Segmentation performance varies with model size; sometimes the smallest model outperforms larger ones. 4. Fine-tuning SAM2 significantly boosts its performance for medical image segmentation. While SAM2 may struggle with challenging objects that have unclear boundaries or low contrast, it excels in generating good initial segmentation masks for common medical images and videos. However, the official interface doesn’t support medical data formats and has limitations on video length. To address this, we've developed a 3D Slicer Plugin and Gradio API for efficient 3D medical image and video segmentation. We invite you to try them out and provide feedback! 🔧 Deployment: - 3D Slicer Plugin: - Gradio API: (Note: Due to GPU limitations, the online API is available for only 12 hours and may be slow. We highly recommend deploying the Gradio API with your own computing resources: A big shoutout to Jun Ma (JunMa) who recently joined our UHN AI hub (UHN AI Hub) as Machine Learning Lead, and kudos to all co-authors: Sumin Kim, Feifei Li, Mohammed Baharoon (Mohammed Baharoon), Reza Asakereh, and Hongwei Lyu! This is true teamwork! Looking forward to collaborating with the community to advance 3D medical image and video segmentation foundation models! 
University Health Network U of T Department of Computer Science Department of Laboratory Medicine & Pathobiology Temerty Centre for AI in Medicine (T-CAIREM) Vector Institute #MedTech #AIinHealthcare #DeepLearning #MedicalImaging #SAM2 #MedSAM #AIResearch

Bo Wang

178,419 views • 1 year ago

Can we build a generalist robotic policy that doesn’t just memorize training data and regurgitate it during test time, but instead remembers past actions as memory and conditions its decisions on them?🤖💡 Introducing SAM2Act—a multi-view robotic transformer-based policy that integrates a visual foundation model with a memory architecture for robotic manipulation. Project page: 🧵👇

Jiafei Duan

87,573 views • 1 year ago

Super excited to introduce SAM2 Studio! 🚀🤖 I've been getting a lot of questions lately on supporting AI inference tailored for patient data and sensitive workflows. We optimized SAM2 to run completely on-device in real time for all of your medical segmentation workflows - including surgical video segmentation, radiographs, pathology slides and more!

Cyril Zakka, MD

19,805 views • 1 year ago

Apache 2.0 license🔥 On-device deployment ready Extends SAM2 for tracking objects in videos 🔥 Click-to-segment support EdgeTAM by Meta !

Gradio

23,301 views • 1 year ago

1/N Most Vision-Language-Action models need tons of data for finetuning, and still fail for new objects and instructions. Introducing OTTER, a lightweight, easy-to-train model that uses text-aware visual features to nail unseen tasks out of the box! Here's how it works 👇

Fangchen Liu

68,288 views • 1 year ago

*Why panorama?* Standard video models struggle with object permanence—if a camera pans away and comes back, objects may disappear. With panoramas, the model is forced to generate everything in the scene. This serves as a "working memory" for consistent world generation. (3/N)

Ziyi Wu

21,992 views • 4 months ago

Love these old meets new vfx workflows 1. extract a still of the object you want to augment 2. use image-to-3d to make a 3d model of the object 3. use that 3d geometry for classical object tracking Then you can go wild with complete control

Bilawal Sidhu

33,824 views • 1 year ago

New course: Agent Memory: Building Memory-Aware Agents, built in partnership with Oracle and taught by Richmond Alake and Nacho Martínez. Many agents work well within a single session but their memory resets once the session ends. Consider a research agent working on dozens of papers across multiple days: without memory, it has no way to store and retrieve what it learned across sessions. This short course teaches you to build a memory system that enables agents to persist memory and thereby learn across sessions. You'll design a Memory Manager that handles different memory types, implement semantic tool retrieval that scales without bloating the context, and build write-back pipelines that let your agent autonomously update and refine what it knows over time. Skills you'll gain: - Build persistent memory stores for different agent memory types - Implement a Memory Manager that orchestrates how your agent reads, writes, and retrieves memory - Treat tools as procedural memory and retrieve only relevant ones at inference time using semantic search Join and learn to build agents that remember and improve over time!

Andrew Ng

157,523 views • 2 months ago

Deuzear 1.2 Update is now available! This update includes a reworked face-tracking layer and shape keys, a brand-new hoodie, and some bug fixes here and there. Get the Deuzear here >>>

Abiboi 🔜Alphazear

68,326 views • 3 years ago

#SynthEyes 2025 is out Check out the new features in BorisFX's 3D tracking app, including AI-based roto mask generation, a 3D head mesh for head tracking, and a new Multi-Export manager #matchmoving #VFX #motiongraphics Boris FX

CG Channel

12,665 views • 1 year ago

NEW: Tracery. 25% Off until May 24. Automate color-based object tracking and generate sophisticated, customizable visual overlays. Effortlessly produce complex visual noise, data-informed diagrams, tracking graphics, and stylish motion elements derived directly from your video content. #aftereffects #aescripts

aescripts+aeplugins

37,773 views • 1 year ago

3 new SumerLive features this week: 1. Stat comparisons to league average for added context 2. Success rate as an added metric 3. IMPACT PLAYS: live tracking-data-based analysis that highlights players who performed above a high benchmark for their role and alignment

Shawn Syed

33,713 views • 4 months ago

MAC-VO: Metrics-Aware Covariance for Learning-based Stereo Visual Odometry TL;DR: learning-based stereo; learned metrics-aware matching uncertainty for dual purposes: selecting keypoint and weighing the residual in pose graph optimization.

Alexandre Morgand

17,367 views • 1 year ago