🔥 In Magma, we talked a lot about spatial/temporal intelligence beyond verbal intelligence, as advocated by Dr. Fei-Fei Li. So how to interpret it? Today I am happy to announce a new demo, Magma-Gaming: 👉 Rather than asking LLMs to write game code, we further ask the model to PLAY...

17,940 views • 1 year ago • via X (Twitter)

8 comments

Jianwei Yang • 1 year ago

Magma Project: Magma Code: Magma HF Model: Magma Intro Video:

Jianwei Yang • 1 year ago

@alvarobartt @mervenoyann @arankomatsuzaki @NielsRogge

RedDeer.Games • 1 year ago

We can't spill the beans about the release date of Maki: Paw of Fury, but make no mistake, things are happening! 🫘😎 We remind you that the game is coming to #NintendoSwitch and #PC #Steam and you can play the demo on PC, here ⤵️ >>> Have a great day!

Data & Analytics • 1 year ago

@_akhaliq @drfeifei @_akhaliq, exploring spatial and temporal intelligence opens up so many possibilities! Balancing these skills alongside traditional ones could revolutionize our understanding of intelligence. What case studies showcase this best? 🔍 #InnovativeThinking

zaumai • 1 year ago

@drfeifei Fascinating development! Magma-Gaming's emphasis on spatial and temporal intelligence pushes beyond conventional language-based systems. Any standout scenarios or tests you recommend exploring first?

Oya San • 1 year ago

@drfeifei Absolutely fascinating, @jw2yang4ai! The exploration of spatial and temporal intelligence opens up a universe of possibilities in gaming. I'm excited to see how the Magma-Gaming demo will redefine our interactions and experiences.

Jun (Garvin) Chen • 1 year ago

@drfeifei Brilliant work

scuzzlebot • 1 year ago

@drfeifei Magma-Gaming highlights spatial intelligence brilliantly—great demonstration of advanced capabilities beyond verbal reasoning! Do you foresee integrating this spatial understanding into more intricate gaming contexts soon?

Related videos

Dr. Fei-Fei Li just called out the biggest blind spot in the entire AI industry. We have been building half of human intelligence. And calling it the finish line. Li: “If you look at human intelligence, it pretty much boils down to two buckets.” The first bucket is language. Symbolic reasoning. Communication. The ability to think in words and abstractions. That’s what every major AI lab has spent the last decade building. The second bucket is the one the industry has almost entirely ignored. Li: “We call that in AI spatial intelligence.” How humans and animals perceive, navigate, and interact with the three-dimensional physical world. How we reach for objects. How we move through space. How we build and manipulate physical reality. From painting masterpieces to constructing the pyramids, non-verbal spatial intelligence is what actually shapes the world. Language describes reality. Spatial intelligence acts on it. And the gap between those two things is the gap between a chatbot and a robot. Li: “When this technology is ready, the robotic revolution is gonna start. We’re already seeing that trend.” Every robot is a moving agent. Every moving agent requires spatial intelligence to function in the real world. The humanoid robots being deployed in factories right now are hitting the ceiling of what language models alone can power. Spatial intelligence is the unlock. But Li didn’t stop at robotics. Li: “From a geopolitics point of view, this is part of the technology that goes straight into weapons.” Autonomous drone swarms. Battlefield navigation. Physical target acquisition without human oversight. Every military application of AI that operates in the real world runs on spatial intelligence. The nation that masters the transition from static text to dynamic three-dimensional perception doesn’t just win the software race. It commands the physical battlefield. The AI arms race just broke out of the data center. It’s operating in three dimensions now.

Dustin

122,604 views • 3 months ago

Do Vision-Language Models represent space, and how? Spatial terms like "left" or "right" may not be enough to match images with spatial descriptions, as we often overlook the different frames of reference (FoR) used by speakers and listeners. See Figure 1 for examples! Introducing the COnsistent Multilingual Frame Of Reference Test (COMFORT), an evaluation protocol to assess the spatial reasoning capabilities of VLMs. COMFORT includes systematically designed datasets and metrics that evaluate model performance and the models' deeper linguistic competence, specifically the spatial knowledge encoded in their internal representations. Find out more in the video teaser! Almost all VLMs prefer the egocentric relative FoR with reflected transform, similar to English. Yet, we reveal significant shortcomings of VLMs: notably, the models (1) exhibit poor robustness and consistency, (2) lack the flexibility to accommodate multiple FoRs, and (3) fail to adhere to language-specific or culture-specific conventions in cross-lingual tests, as English tends to dominate other languages. A shortened version will appear in the Pluralistic Alignment Workshop #NeurIPS2024. It seems that the arXiv moderators put it on hold and are eager to give it a thorough read first🤣! So here is the Paper/Code/Data: This collaboration turns out to be amazing, jointly led by Brian Zheyuan Zhang, @Hu_FY_ Jayjun Lee, with so many contributions and insights from Freda Shi, Parisa Kordjamshidi Michigan SLED Lab. With a growing effort to align vision-language models with human cognitive intuitions, we call for more attention to the ambiguous nature and cross-cultural diversity of spatial reasoning!

Martin Ziqiao Ma

35,542 views • 1 year ago