How can we learn 3D visual grounding with natural supervision, looking only at QA pairs, without ground-truth bounding boxes or object classification labels? We inject explicit language priors, e.g., the symmetric property that A near B ⇒ B near A, into structured vision models.

Our Language-Regularized Concept Learner (LARC) extracts language constraints from LLMs, and uses them as regularization on the interpretable & intermediate representations of neuro-symbolic models. LARC's representations encode a variety of logical language properties.
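
To make the regularization idea concrete, here is a minimal sketch of how a symmetry constraint such as "A near B ⇒ B near A" could be turned into a penalty on pairwise relation scores. This is not LARC's actual code: the function name `symmetry_penalty`, the score matrix `near`, and the mean-squared form of the penalty are all assumptions made for illustration.

```python
def symmetry_penalty(scores):
    """Mean squared gap between scores[i][j] and scores[j][i] over object pairs.

    `scores` is a square list of lists where scores[i][j] is a hypothetical
    model's confidence that object i stands in the relation to object j.
    """
    n = len(scores)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum((scores[i][j] - scores[j][i]) ** 2 for i, j in pairs) / len(pairs)


# Hypothetical "near" scores for three objects (illustrative values only).
near = [
    [0.0, 0.9, 0.2],
    [0.5, 0.0, 0.1],
    [0.2, 0.1, 0.0],
]

# Only the (0, 1) pair disagrees with its mirror, so it alone adds to the penalty.
penalty = symmetry_penalty(near)
```

In a training loop, a term like this would presumably be added to the QA loss with a small weight, so the relation scores are nudged toward symmetry without any extra visual labels.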

LARC shows strong data efficiency and generalization to new concepts & datasets, and is a promising step towards equipping structured visual reasoning frameworks with explicit language-based priors, for learning in settings without dense visual supervision.

Will be presenting LARC at @CVPR next Thursday morning (session #3) with our fantastic intern @chunfeng3364, as well as @Weiyu_Liu_ and @jiajunwu_cs! Paper: Website: Code:

