
Joy Hsu
@joycjhsu • 2,624 subscribers
CS PhD-ing @stanford & @knighthennessy. Studying visual reasoning, neuro-symbolic learning, and visual concepts @stanfordailab & @stanfordsvl.
Videos

What makes a maze look like a maze? Humans can reason about infinitely many instantiations of mazes—made of candy canes, sticks, icing, yarn, etc. But VLMs often struggle to make sense of such visual abstractions. We improve VLMs' ability to interpret these abstract concepts.
Joy Hsu • 43,025 views • 1 year ago

What’s left w/ foundation models? We found that they still can't ground modular concepts across domains. We present Logic-Enhanced FMs: 🤝 FMs & neuro-symbolic concept learners. We learn abstractions of concepts like “left” across domains & do domain-independent reasoning w/ LLMs.
Joy Hsu • 48,289 views • 2 years ago

How can we learn 3D visual grounding with natural supervision, only looking at QA pairs, without ground-truth bounding boxes or object classification labels? We inject explicit language priors, e.g., the symmetric property that A near B ⇒ B near A, into structured vision models.
Joy Hsu • 14,843 views • 2 years ago