Sihyun Yu

@sihyun_yu • 1,458 subscribers

phd @kaist_ai | ex @NVIDIAAI @GoogleAI @NYU_Courant

Shorts

Can MLLMs actually track what's happening in a video? Introducing VSTAT 🎯, our new benchmark for visual state tracking. The tasks are simple: count cups, read typed words, count page flips. Humans solve them easily. MLLMs don't. 🧵 [1/11]

165,811 views