Sihyun Yu

@sihyun_yu · 1,374 subscribers

phd @kaist_ai | ex @NVIDIAAI @GoogleAI @NYU_Courant

Shorts

Can MLLMs actually track what's happening in a video? Introducing VSTAT 🎯, our new benchmark for visual state tracking. The tasks are simple: count cups, read typed words, count page flips. Humans solve them easily. MLLMs don't. 🧵 [1/11]

85,800 views