CVPR 2025 papers pt. 2 - SAMWISE

SAMWISE adds language understanding and temporal reasoning to SAM2; you can segment and track objects in videos just by describing them.

more papers: ↓

20,528 views • 1 year ago • via X (Twitter)

9 Comments

SkalskiP @ CVPR2025 🇺🇸 • 1 year ago

- paper: - code: - video:

SkalskiP @ CVPR2025 🇺🇸 • 1 year ago

SAM2 supports visual prompts like points and boxes but has no native support for text prompts. I have often shown how combining SAM2 with VLMs enables language-guided image segmentation. SAMWISE allows direct text-driven video object segmentation.

SkalskiP @ CVPR2025 🇺🇸 • 1 year ago

SAM2 can make mistakes that, without human correction, will persist in subsequent frames. SAMWISE can automatically correct its own mistakes.

SkalskiP @ CVPR2025 🇺🇸 • 1 year ago

SAMWISE uses a frozen Segment Anything 2 (SAM2) model and a frozen text encoder. It adds a special module called the Cross-Modal Temporal Adapter (CMT), which helps the model combine information from both the video and the text and follow changes over time.
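
To make the "frozen backbone plus small adapter" idea concrete, here is a minimal pure-Python sketch of a generic cross-attention adapter with a residual connection. It is not the CMT design from the paper: the function name `cross_attend`, the `gate` parameter, and the toy feature vectors are all assumptions for illustration, and the temporal part is left out entirely. The only point is that text features can be mixed into visual features through a small add-on while the backbone outputs stay untouched.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(visual_tokens, text_tokens, gate=0.0):
    """Hypothetical adapter: each visual token attends over the text tokens.

    visual_tokens, text_tokens: lists of equal-length float vectors that
    stand in for outputs of the frozen image and text encoders.
    gate: the single "trainable" scalar in this toy; gate=0.0 leaves the
    frozen visual features unchanged.
    """
    dim = len(visual_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for v in visual_tokens:
        # Scaled dot-product similarity between this visual token and each text token.
        logits = [scale * sum(a * b for a, b in zip(v, t)) for t in text_tokens]
        weights = softmax(logits)
        # Text context vector: attention-weighted average of the text tokens.
        context = [sum(w * t[i] for w, t in zip(weights, text_tokens)) for i in range(dim)]
        # Residual connection: frozen feature plus gated text context.
        fused.append([vi + gate * ci for vi, ci in zip(v, context)])
    return fused


visual = [[1.0, 0.0], [0.0, 1.0]]   # toy per-patch features
text = [[0.5, 0.5]]                 # toy prompt embedding (one token)
print(cross_attend(visual, text, gate=0.0))  # identical to `visual`
print(cross_attend(visual, text, gate=1.0))  # text context added to every token
```

Starting the gate at zero is a common adapter trick: training begins from exactly the frozen model's behavior and only gradually lets the text signal in.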

SkalskiP @ CVPR2025 🇺🇸 • 1 year ago

The Conditional Memory Encoder (CME) helps the model notice when a new object fits your prompt better, so SAMWISE can automatically switch tracking, even if the correct object appears later or is hidden for a while.
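
The switching behavior can be illustrated with a toy decision rule. This is not how the CME works internally; it is a hedged sketch that assumes each frame already comes with a hypothetical text-alignment score per visible object, and the function name `track_with_switching` and the `margin` threshold are made up for the example.

```python
def track_with_switching(frames, margin=0.1):
    """Toy tracker that switches targets when a clearly better match appears.

    frames: list of dicts mapping object id -> hypothetical score for how
    well that object matches the text prompt. Objects missing from a
    frame's dict are treated as hidden in that frame.
    Returns the tracked object id for every frame.
    """
    tracked, tracked_score = None, float("-inf")
    history = []
    for scores in frames:
        if tracked in scores:
            # Refresh the remembered score while the target is visible.
            tracked_score = scores[tracked]
        if scores:
            best = max(scores, key=scores.get)
            # Switch only if a candidate beats the remembered score by a margin,
            # so a hidden target is not dropped for a weaker distractor.
            if tracked is None or scores[best] > tracked_score + margin:
                tracked, tracked_score = best, scores[best]
        history.append(tracked)
    return history


frames = [
    {"cat_a": 0.6},                  # only a distractor is visible
    {"cat_a": 0.55, "cat_b": 0.9},   # a better match appears later
    {"cat_a": 0.6},                  # the better match is hidden
    {"cat_a": 0.5, "cat_b": 0.85},   # it reappears
]
print(track_with_switching(frames))  # ['cat_a', 'cat_b', 'cat_b', 'cat_b']
```

The margin keeps the tracker from flipping between near-equal candidates, and remembering the last score lets it hold on to a target through a short occlusion.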

SkalskiP @ CVPR2025 🇺🇸 • 1 year ago

full poster explaining text understanding, temporal modeling, tracking bias, and much more

Nigam Arora • 1 year ago

In 2025, how much more money can you make in the stock market by following the most accurate analysis?

Team Reagent • 1 year ago

Can it do this in real-time?

Team Reagent • 1 year ago

Oh we are DEFINITELY taking a look at this! Wow!!
