Загрузка видео...
Не удалось загрузить видео
Do Vision-Language Models represent space, and how? Spatial terms like "left" or "right" may not be enough to match images with spatial descriptions, as we often overlook the different frames of reference (FoR) used by speakers and listeners. See Figure 1 for examples! Introducing the COnsistent Multilingual Frame Of... show more
35,542 просмотров • 1 год назад •via X (Twitter)
Комментарии: 1

Martin Ziqiao Ma1 год назад
Interesting fact: Reasoning across multiple intrinsic frames of reference is quite challenging, even for GPT-4. I adapted Figure 2.5 from Levinson 2003 (Logical Inadequacies of the Intrinsic Frame of Reference) into a yes/no question format, and GPT-4 struggled with both.
