✨ CVPR 2025 highlight: A Distractor-Aware Memory for Visual Object Tracking with SAM2. The authors propose a new distractor-aware memory (DAM) model for SAM2 and an introspection-based update strategy that jointly address segmentation accuracy and tracking robustness 🏡 (1/n)🧵👇

GeekyRakshit (e/mad)

1,642 subscribers

32,669 views • 1 year ago • via X (Twitter)

Science & Technology • Education • Art

7 comments

GeekyRakshit (e/mad) • 1 year ago

the authors redesign SAM2’s memory into two complementary parts:
- Recent-Appearance Memory (RAM): a small FIFO buffer that stores the most recent frames (time-stamped) to keep segmentation accurate as the target’s appearance changes.
- Distractor-Resolving Memory (DRM): a second buffer that keeps anchor frames able to disambiguate the target from hard external or internal distractors; these slots are not time-stamped, so their influence does not decay.
(2/n)🧵👇
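To make the two-buffer idea concrete, here is a minimal Python sketch. It is not the authors' code and does not touch the SAM2 API. The class name `DistractorAwareMemory`, the buffer sizes, the `distractor_detected` flag (standing in for the introspection step) and the FIFO eviction of DRM slots are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Deque, List, Optional, Tuple


@dataclass
class DistractorAwareMemory:
    """Toy two-part memory: time-stamped RAM plus un-stamped DRM (hypothetical)."""

    ram_size: int = 3  # assumed size, not taken from the thread
    drm_size: int = 3  # assumed size, not taken from the thread
    ram: Deque[Tuple[int, Any]] = field(init=False)
    drm: Deque[Any] = field(init=False)

    def __post_init__(self) -> None:
        # RAM is a FIFO: the oldest recent frame drops out when the buffer is full.
        self.ram = deque(maxlen=self.ram_size)
        # DRM eviction policy is an assumption; a FIFO is used only for simplicity.
        self.drm = deque(maxlen=self.drm_size)

    def update(self, frame_idx: int, features: Any,
               distractor_detected: bool = False) -> None:
        # Every accepted frame goes into RAM together with its time stamp.
        self.ram.append((frame_idx, features))
        # Only frames flagged as distractor-resolving become anchors (no time stamp).
        if distractor_detected:
            self.drm.append(features)

    def read(self, current_idx: int) -> List[Tuple[Optional[int], Any]]:
        # RAM entries carry their age, so a consumer could down-weight older frames.
        recent = [(current_idx - t, f) for t, f in self.ram]
        # DRM entries carry no age (None), so their influence cannot decay with time.
        anchors = [(None, f) for f in self.drm]
        return recent + anchors
```

In a real tracker `features` would be per-frame memory embeddings and the flag would come from the model's own analysis of its output masks; here plain strings are enough to see that old RAM entries age out while DRM anchors stay un-stamped.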

GeekyRakshit (e/mad) • 1 year ago

plugging DAM and the new update rules into the off-the-shelf SAM 2.1 backbone, without any retraining, yields large practical gains and sets a new SoTA (3/n)🧵👇

GeekyRakshit (e/mad) • 1 year ago

the authors also create DiDi, a distractor-distilled tracking dataset, to address the low distractor presence in current visual object tracking benchmarks 📀 (4/n)🧵👇

GeekyRakshit (e/mad) • 1 year ago

Overall, the paper’s novelty lies in recognising that “one size fits all” memory is insufficient for distractor-heavy tracking and providing a simple, training-free remedy that lifts SAM-based tracking to state-of-the-art levels (5/5)🧵🏁

Ankit • 1 year ago

Good stuff brother

Rahul • 1 year ago

very cool

7racker • 1 year ago

I feel like this example is very edge-casey

Related videos

I'm experimenting with using SAM2 for automatic tracking of NBA players; it's mind-blowing how well this model performs out of the box. we might even add a SAM2-based tracker to the trackers package at some point. trackers:

SkalskiP

293,191 views • 1 year ago

SAMURAI vs. MetaAI's SAM 2! Traditional visual object tracking struggles in crowded, fast-moving, or self-occluded scenes, as does SAM2. Meet SAMURAI: a completely open-source adaptation of the Segment Anything Model for zero-shot visual tracking! Here's why it's a game-changer: 🚫 No need for retraining or finetuning 🎯 Boosts success rate and precision 🤖 Motion-aware memory selection 💪 Zero-shot performance on diverse datasets But that's not all: 🔬 Refines mask selection 🔮 Predicts object motion effectively 📈 Gains: 7.1% AUC on LaSOT, 3.5% AO on GOT-10k 🏆 Competes with fully supervised methods without extra training Link to the GitHub repo in the next tweet! _____ Find me → Akshay 🚀 ✔️ For more insights & tutorials on AI and Machine Learning.

Akshay 🚀

363,328 views • 1 year ago

MetaAI's SAM 2 struggles when things move fast or when there are crowded, fast-moving objects! Introducing SAMURAI: An adaptation of the Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory. 100% Open Source

Sumanth

324,580 views • 1 year ago

GauS-SLAM: Dense RGB-D SLAM with Gaussian Surfels • We propose a 2D Gaussian-based incremental reconstruction strategy and a Surface-aware Depth Rendering mechanism. This approach effectively mitigates geometry distortions and improves tracking accuracy. • Our dense SLAM system features a front-end/back-end architecture and incorporates a local map design, ensuring tracking accuracy and efficiency. • We conduct extensive experiments demonstrating the superiority of our approach in both tracking accuracy and reconstruction quality compared to SOTA methods.

MrNeRF

11,004 views • 1 year ago

Super stoked that Jordão Bragantini's napari plugin for Meta's Segment Anything Model 2, napari-sam2, as well as our cell tracking software ultrack, were showcased during Chris Cox's Keynote at Meta's Connect 2024. Jordão and I were in the audience; it was great fun! Chan Zuckerberg Biohub Network

Loïc A. Royer 💻🔬⚗️

41,062 views • 1 year ago

🚀 The Segment Anything Model (SAM) has been upgraded to SAM2, featuring an efficient image encoder for segmenting images and videos. But does SAM2 outperform SAM1 in medical image and video segmentation? We're thrilled to present our paper "Segment Anything in Medical Images and Videos: Benchmark and Deployment"! We comprehensively benchmark SAM2 across 11 medical image modalities and videos.
📄 Paper:
💻 Code:
Highlights:
1. SAM2 doesn’t always outperform SAM1 in 2D medical images, but excels in video segmentation, making it more accurate and efficient for 3D images, such as CT and MR scans.
2. MedSAM still outperforms SAM2 on most 2D modalities, but SAM2 surpasses MedSAM for 3D image segmentation in a slice-by-slice approach.
3. Segmentation performance varies with model size; sometimes the smallest model outperforms larger ones.
4. Fine-tuning SAM2 significantly boosts its performance for medical image segmentation.
While SAM2 may struggle with challenging objects that have unclear boundaries or low contrast, it excels in generating good initial segmentation masks for common medical images and videos. However, the official interface doesn’t support medical data formats and has limitations on video length. To address this, we've developed a 3D Slicer Plugin and Gradio API for efficient 3D medical image and video segmentation. We invite you to try them out and provide feedback!
🔧 Deployment:
- 3D Slicer Plugin:
- Gradio API:
Note: Due to GPU limitations, the online API is available for only 12 hours and may be slow. We highly recommend deploying the Gradio API with your own computing resources:
A big shoutout to Jun Ma (JunMa) who recently joined our UHN AI hub (UHN AI Hub) as Machine Learning Lead, and kudos to all co-authors: Sumin Kim, Feifei Li, Mohammed Baharoon (Mohammed Baharoon), Reza Asakereh, and Hongwei Lyu! This is true teamwork! Looking forward to collaborating with the community to advance 3D medical image and video segmentation foundation models!
University Health Network U of T Department of Computer Science Department of Laboratory Medicine & Pathobiology Temerty Centre for AI in Medicine (T-CAIREM) Vector Institute
#MedTech #AIinHealthcare #DeepLearning #MedicalImaging #SAM2 #MedSAM #AIResearch

Bo Wang

178,481 views • 1 year ago

Can we build a generalist robotic policy that doesn’t just memorize training data and regurgitate it during test time, but instead remembers past actions as memory and conditions its decisions on them?🤖💡 Introducing SAM2Act—a multi-view robotic transformer-based policy that integrates a visual foundation model with a memory architecture for robotic manipulation. Project page: 🧵👇

Jiafei Duan

87,573 views • 1 year ago

1/N Most Vision-Language-Action models need tons of data for finetuning, and still fail for new objects and instructions. Introducing OTTER, a lightweight, easy-to-train model that uses text-aware visual features to nail unseen tasks out of the box! Here's how it works 👇

Fangchen Liu

68,353 views • 1 year ago

Why panorama? Standard video models struggle with object permanence: if a camera pans away and comes back, objects may disappear. With panoramas, the model is forced to generate everything in the scene. This serves as a "working memory" for consistent world generation. (3/N)

Ziyi Wu

22,019 views • 5 months ago

New course: Agent Memory: Building Memory-Aware Agents, built in partnership with Oracle and taught by Richmond Alake and Nacho Martínez.
Many agents work well within a single session but their memory resets once the session ends. Consider a research agent working on dozens of papers across multiple days: without memory, it has no way to store and retrieve what it learned across sessions.
This short course teaches you to build a memory system that enables agents to persist memory and thereby learn across sessions. You'll design a Memory Manager that handles different memory types, implement semantic tool retrieval that scales without bloating the context, and build write-back pipelines that let your agent autonomously update and refine what it knows over time.
Skills you'll gain:
- Build persistent memory stores for different agent memory types
- Implement a Memory Manager that orchestrates how your agent reads, writes, and retrieves memory
- Treat tools as procedural memory and retrieve only relevant ones at inference time using semantic search
Join and learn to build agents that remember and improve over time!

Andrew Ng

159,926 views • 4 months ago

3 new SumerLive features this week:
1. Stat comparisons to league average for added context
2. Success rate as an added metric
3. IMPACT PLAYS: live tracking data based analysis that highlights players that performed above a high benchmark for their role and alignment

Shawn Syed

33,843 views • 5 months ago

Tracking Anything with Decoupled Video Segmentation paper page: Training data for video segmentation are expensive to annotate. This impedes extensions of end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary settings. To 'track anything' without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation. Due to this design, we only need an image-level model for the target task (which is cheaper to train) and a universal temporal propagation model which is trained once and generalizes across tasks. To effectively combine these two modules, we use bi-directional propagation for (semi-)online fusion of segmentation hypotheses from different frames to generate a coherent segmentation. We show that this decoupled formulation compares favorably to end-to-end approaches in several data-scarce tasks including large-vocabulary video panoptic segmentation, open-world video segmentation, referring video segmentation, and unsupervised video object segmentation.

AK

305,633 views • 2 years ago

🔱 #Live3D #Live2D #HandTracking 🔱 This is the highest-spec hand-tracking model I’m currently capable of creating. The model was created based on the Live2D-based 2D Pseudo-3D concept, and for the first time, it incorporates techniques such as colored linework, rim lighting, and independent finger outlines. These new techniques allow the model to achieve a pseudo-3D effect while faithfully preserving the artist’s original art style. Please check out the video for more details✨

📐Hephaestus📏Live2D匠人魂

78,222 views • 1 month ago

To Adapt or Not to Adapt? Real-Time Adaptation for Semantic Segmentation paper page: The goal of Online Domain Adaptation for semantic segmentation is to handle unforeseeable domain changes that occur during deployment, like sudden weather events. However, the high computational costs associated with brute-force adaptation make this paradigm unfeasible for real-world applications. In this paper we propose HAMLET, a Hardware-Aware Modular Least Expensive Training framework for real-time domain adaptation. Our approach includes a hardware-aware back-propagation orchestration agent (HAMT) and a dedicated domain-shift detector that enables active control over when and how the model is adapted (LT). Thanks to these advancements, our approach is capable of performing semantic segmentation while simultaneously adapting at more than 29FPS on a single consumer-grade GPU. Our framework's encouraging accuracy and speed trade-off is demonstrated on OnDA and SHIFT benchmarks through experimental results.

AK

18,871 views • 3 years ago

The Pawsome Edition update is out now! 😸😸😸 There are now new and fun ~cooking minigames~ for each recipe! We also did a visual update, along with some brand new UI! We hope you love the new animals as well!

Calico - Magical Girls Running Cat Cafes!

77,031 views • 3 years ago

New short course: Long-Term Agentic Memory with LangGraph. Learn to build an agent with long-term memory in this course developed in collaboration with taught by its Co-Founder and CEO, Harrison Chase!
Personal assistance and productivity tasks have become important use cases for agents. An important feature of an AI assistant, such as a coding or calendar assistant, is its ability to keep improving over time from its experience. Agent memory is the key capability that enables this.
To add memory to an agent, you must first figure out what to store and what to retrieve when it is time to use the information. Additionally, you’ll have to decide when to update the stored information. For example, you might update in each iteration loop of the agent or perform updates in the background, with a helper agent.
In this course, you will learn a mental framework to build agents with long-term memory. You'll create a useful email assistant that can respond, ignore, and notify using writing, scheduling, and memory-management tools. You’ll develop your agent's memory by adding facts to its memory store, provide examples to learn the user's preferences, and optimize system prompts to evolve instructions based on previous responses.
In detail, you’ll:
- Learn how the three types of memory (semantic, episodic, and procedural) and the two update mechanisms (via hot path and in the background) apply to your agents.
- Build an email agent with writing, scheduling, and availability tools, along with a router that triages incoming email and handles it accordingly by ignoring, responding, or notifying the user.
- Add tools to your email agent that allow it to operate on semantic memory by learning facts about the user, storing them in a long-term memory store, and searching over them in future interactions.
- Incorporate episodic memory, in the form of few-shot examples, in the triage step of your agents to help them learn and update user preferences.
- Add procedural memory as system prompts, optimized with feedback to improve the instructions the agent follows.
Learn how to approach memory in agents, and start building agents with long-term memory with LangGraph! Please sign up here:

Andrew Ng

131,779 views • 1 year ago

Final c0mmission of 2025! Got to do the face tracking for camila 🃏 🍥 's new 3D model :D Features full UE face tracking, reactive ears and a tail that wags/goes down if she smiles/frown. Also gave her a clown nose with multiple sound effects :D WE LOVE CLOWNS IN THIS HOUSE!!!!!!!

Threevee

214,067 views • 6 months ago

"Probably my best ever memory playing hurling and definitely with the club as well." 🏆 A season to remember for Jack O’Connor with St Martin's, capped off with an AIB_GAA #GAA Club Hurling Team of the Year award.

The GAA

18,869 views • 4 months ago