
Joy Hsu
@joycjhsu • 2,624 subscribers
CS PhD-ing @stanford & @knighthennessy. Studying visual reasoning, neuro-symbolic learning, and visual concepts @stanfordailab & @stanfordsvl.
Videos

What makes a maze look like a maze? Humans can reason about infinitely many instantiations of mazes—made of candy canes, sticks, icing, yarn, etc. But VLMs often struggle to make sense of such visual abstractions. We improve VLMs' ability to interpret these abstract concepts.
Joy Hsu • 43,025 views • 1 year ago

What’s left w/ foundation models? We found that they still can't ground modular concepts across domains. We present Logic-Enhanced FMs: 🤝 FMs & neuro-symbolic concept learners. We learn abstractions of concepts like “left” across domains & do domain-independent reasoning w/ LLMs.
Joy Hsu • 48,289 views • 2 years ago

How can we learn 3D visual grounding with natural supervision—only looking at QA pairs, without ground truth bounding boxes or object classification labels? We inject explicit language priors, e.g., the symmetric property that A near B ⇒ B near A, in structured vision models.
Joy Hsu • 14,843 views • 2 years ago