🚀 Introducing T* and LV-Haystack, our latest leap forward in VLMs for long video understanding! 🧩 Lightweight plugin: T* boosts LLaVA-OV-72B (56→62%) and GPT-4o (50→53%)! ⚡ Fast inference: 34.9s → 10.4s latency, 691 → 170 TFLOPs vs. SOTA. 📚 Large-scale dataset: 400 hours of videos + 15,000 samples....
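
For a sense of scale, the efficiency figures quoted in the post work out to roughly a 3.4x latency reduction and a 4.1x compute reduction. Here is a minimal sketch of that arithmetic, using only the numbers given in the post. The helper name `ratio` is purely illustrative.

```python
def ratio(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`."""
    return before / after

# Figures as quoted in the post (prior SOTA vs. T*).
latency_speedup = ratio(34.9, 10.4)    # latency in seconds
compute_reduction = ratio(691, 170)    # compute in TFLOPs

# Accuracy gains in percentage points, as quoted in the post.
llava_gain = 62 - 56
gpt4o_gain = 53 - 50

print(f"latency: {latency_speedup:.2f}x lower")
print(f"compute: {compute_reduction:.2f}x fewer TFLOPs")
print(f"accuracy: +{llava_gain} pts (LLaVA-OV-72B), +{gpt4o_gain} pts (GPT-4o)")
```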

49,615 views • 1 year ago • via X (Twitter)

10 Comments

Zihan Wang - on RAGEN • 1 year ago

Explore more: 📄 paper: 🤗 dataset: 🌐 website: 🤖 demo: 🛠️ github:

Zihan Wang - on RAGEN • 1 year ago

What’s T* ✨? A temporal search framework to locate key frames for questions. Can be plugged into any VLM! T* turns temporal search ⏱️ into spatial search 📍 with lightweight object detectors + VLM visual grounding. Strong performance even w/o training VLMs! 2/
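
To make the idea of searching for key frames more concrete, here is a toy coarse-to-fine sketch. It is not the T* algorithm itself, which the thread does not spell out. It only illustrates the general pattern of sampling frames sparsely, scoring them, and zooming in where the score is highest. The function `search_key_frames`, the `score_frame` callback (a stand-in for something like an object detector's confidence for the queried object), and all parameter names are assumptions made for this example.

```python
from typing import Callable, Dict, List

def search_key_frames(
    num_frames: int,
    score_frame: Callable[[int], float],  # hypothetical: relevance of frame i to the question
    budget: int = 32,      # maximum number of frames we are willing to score
    per_round: int = 8,    # frames sampled per zoom level
    top_k: int = 4,        # number of key frames to return
) -> List[int]:
    """Toy coarse-to-fine search: sample sparsely, then zoom in around the best frame."""
    scores: Dict[int, float] = {}
    lo, hi = 0, num_frames - 1
    while len(scores) < budget:
        # Sample evenly spaced, not-yet-scored frames in the current window.
        step = max(1, (hi - lo) // (per_round - 1))
        candidates = [i for i in range(lo, hi + 1, step) if i not in scores]
        if not candidates:
            break  # the window is fully explored
        for i in candidates[: budget - len(scores)]:
            scores[i] = score_frame(i)
        # Zoom in: shrink the window around the best frame seen so far.
        best = max(scores, key=scores.get)
        lo, hi = max(0, best - step), min(num_frames - 1, best + step)
    # Return the top-scoring frames in temporal order.
    return sorted(sorted(scores, key=scores.get, reverse=True)[:top_k])

if __name__ == "__main__":
    # Fake scorer whose relevance peaks at frame 7000 of a 10,000-frame video.
    print(search_key_frames(10_000, lambda i: -abs(i - 7000)))
```

The point of the sketch is the budget: only a few dozen frames are scored out of thousands, which is the kind of saving a search-based approach aims for compared with processing every frame.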

Zihan Wang - on RAGEN • 1 year ago

What’s LV-Haystack? A large-scale video understanding dataset: 🎞️ 400 hours of video, ❓ 15,000 QA pairs, 🔑 30,000 key frame labels from 45,000,000 frames. We explore disentangled evaluation of temporal search & video understanding with 6 fine-grained search metrics. 3/
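
The "haystack" in the name is easy to quantify from the rounded figures in this post. The short calculation below assumes, for illustration only, that each label marks a distinct frame, which the thread does not state.

```python
# Rounded dataset figures as quoted in the post.
qa_pairs = 15_000
key_frame_labels = 30_000
total_frames = 45_000_000

labels_per_question = key_frame_labels / qa_pairs     # average labels per QA pair
frames_per_label = total_frames / key_frame_labels    # frames searched per labeled key frame
key_frame_fraction = key_frame_labels / total_frames  # assumes one distinct frame per label

print(f"{labels_per_question:.1f} key frame labels per QA pair on average")
print(f"about 1 labeled key frame per {frames_per_label:,.0f} frames")
print(f"key frames are {key_frame_fraction:.4%} of all frames")
```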

Zihan Wang - on RAGEN • 1 year ago

T* and LV-Haystack are the result of a joint effort of @StanfordHAI @StanfordAILab @StanfordSVL @NorthwesternEng @LTIatCMU. Huge shoutout to our incredible team for making this possible! We’d love your feedback! Reply or email us with questions, ideas, or use cases✨ 4/

Zihan Wang - on RAGEN • 1 year ago

h/t to all collaborators: @jinhuiye @wzihanw @Haosen_sun @keshigeyan @DuranteZane @CristbalEyzagu2 @anabellaisaro and our amazing mentors: @ManlingLi_ @jiajunwu_cs @drfeifei @eadeli @jcniebles @ybisk! This is just the beginning—excited for the future of video understanding and what’s next! ✨5/

Electe • 1 year ago

@StanfordAILab @StanfordAILab, exciting advancements in video understanding.

@profitleap • 1 year ago

@StanfordAILab Exciting advancements in VLMs. Looking forward to seeing the impact they will have on video understanding. 🔍

Zihan Wang - on RAGEN • 1 year ago

Great question! The VLMs we are using cannot accept audio input for now, and we think this line of research may be exciting to explore in the near future:)

Hexa Circuit • 1 year ago

It's essential to examine how this new integration will enhance semantic retrieval in lengthy multimedia datasets. Looks promising for advanced analytics.
