
Zihan "Zenus" Wang
@wzenus • 23,015 subscribers
Reasoning agent / RL / efficiency research @NorthwesternU & @nvidia. Ex @Microsoft @yutori_ai @deepseek_ai @uiuc_nlp @RUC1937.
Videos

🚀 Introducing T* and LV-Haystack, our latest leap forward in VLMs for long video understanding! 🧩 Lightweight plugin: T* boosts LLaVA-OV-72B (56→62%) and GPT-4o (50→53%)! ⚡ Fast inference: 34.9s → 10.4s latency, 691 → 170 TFLOPs vs. SOTA. 📚 Large-scale dataset: 400 hours of videos + 15,000 samples. → TLDR: Long video understanding needs fine-grained supervision, so we start by exploring "key frames" as the first step 🪜. 1/
Zihan "Zenus" Wang • 49,615 views • 1 year ago