Ilya Sutskever has a bold take: LLMs are doing much more than predicting the next word. They are learning our world model. Text is a projection of the world.

326,380 views • 2 years ago • via X (Twitter)

Comments: 10

Lior⚡ • 2 years ago

Source:

SciStone • 2 years ago

How is this a bold take? It's literally what it is. Text isn't a projection of the world, information is. Everything is information, and language is a means to distill information into a text form.

David Ondrej • 2 years ago

I don't think it's a bold take. I think anyone who spends a bit of time thinking about how LLMs actually work has to come to that conclusion. In order for LLMs to predict the next word really, really well, they have to develop some sort of understanding of the world, including some reasoning capabilities.

Nathanaël Goujon • 2 years ago

@AlphaSignalAI Looks more like a bald take if you ask me

Cristian Garcia • 2 years ago

@AlphaSignalAI That's not bold, this is 𝗯𝗼𝗹𝗱.

Marcus D. R. Klarqvist • 2 years ago

Bold how? This is literally how all of ML/AI works. It's simply hyperbole for marketing impact to claim it's "the world". Given all the text ever written by humans, and a large enough model, it's obvious that next-token prediction would be proficient in generating text. It's also obvious that such a model would have no real logic and reasoning capabilities required to be intelligent and hence pose no threat whatsoever to humanity.

philip tao • 2 years ago

@AlphaSignalAI Really distracting hairdo...

Omar Kamel • 2 years ago

@AlphaSignalAI 1. Not an original take. 2. He uses ‘learning’ as though it implies understanding. A wax imprint of a song has ‘learned’ the song, but that doesn’t entail any understanding of it, any more than a memory foam pillow ‘understands’ your head.

AlphaSignal AI • 2 years ago

@AlphaSignalAI Would love to hear @ylecun's and @geoffreyhinton's thoughts on this.

LAI • 2 years ago

Ilya Sutskever's perspective on Large Language Models (LLMs) like GPT-3 and its successors is indeed thought-provoking. His view suggests that these models are not just simple tools for predicting the next word in a sentence but are, in fact, developing a deeper understanding or "model" of the world. The idea that "text is a projection of the world" implies that the vast amount of text data fed into these models encapsulates a representation of human knowledge and experience. Through processing and learning from this text, LLMs are thought to acquire a form of understanding or representation of the world, albeit in a manner that's fundamentally different from human cognition. However, it's important to note that while LLMs can mimic certain aspects of understanding and can generate coherent and contextually appropriate responses, their "knowledge" is limited to patterns found in the data they were trained on. They lack true understanding or consciousness and do not have experiences or awareness. Their "world model" is a statistical representation based on language patterns, rather than a conscious or intentional understanding of reality. This perspective opens up fascinating discussions about the capabilities and limitations of AI, and the nature of understanding and intelligence.

Related videos

Yann LeCun explains, in a Bloomberg interview, why LLMs are limited in real-world intelligence: "Language is a very approximate, reduced, quantized, and simplified description of the world, and LLMs can only deal with discrete sequences of symbols. The world is much more complicated than language. The biggest LLMs are pre-trained on the totality of all the publicly available text on the internet. That’s about 20 trillion words, or 30 trillion tokens. A token is about 3 bytes. So total 10¹⁴ bytes of text. This is the amount of data a four-year-old has seen through vision during four years. Now, the text, though, would take 400,000 years to read. So, there is enormously more data from sensory input, like vision, touch, and everything else, than there could ever be through language."

A child does not need 400,000 years of reading to understand cups, doors, balance, faces, falls, or heat, because the body is already collecting dense feedback from vision, touch, motion, and consequence. Text strips most of that away. It turns a living scene into symbols, then asks the model to infer the missing world from traces left by people describing it. That is why an LLM can sound fluent about physics and still have no native sense of how fragile glass feels in a hand. Moravec’s paradox names this reversal: the things humans find intellectual can be easier for machines than the things toddlers do without applause. The hard part is not producing an answer, but building a model of the world that survives contact with weight, friction, surprise, and failure.

The full video is on Bloomberg's site; link in comment.

Rohan Paul

46,658 views • 12 days ago
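As a rough sanity check on the arithmetic in the LeCun quote, here is a small Python sketch. The word count, token count, and bytes-per-token figures come from the quote. The reading speed (250 words per minute), daily reading time (8 hours), and a four-year-old's waking hours (16,000) are assumptions, not figures from the interview, and changing them moves the results considerably.

```python
# Back-of-envelope check of the figures quoted by Yann LeCun.
# Quoted: ~20 trillion words, ~30 trillion tokens, ~3 bytes per token.
# ASSUMPTIONS (not from the quote): reading speed, hours read per day,
# and a four-year-old's total waking hours.

WORDS = 20e12                # quoted: ~20 trillion words
TOKENS = 30e12               # quoted: ~30 trillion tokens
BYTES_PER_TOKEN = 3          # quoted: ~3 bytes per token

WORDS_PER_MINUTE = 250       # assumption: typical adult reading speed
HOURS_PER_DAY = 8            # assumption: reading 8 hours every day
WAKING_HOURS_AGE_4 = 16_000  # assumption: waking hours by age four


def text_bytes(tokens: float, bytes_per_token: float) -> float:
    """Total size of the training text in bytes."""
    return tokens * bytes_per_token


def reading_years(words: float, wpm: float, hours_per_day: float) -> float:
    """Years needed to read `words` at `wpm`, reading `hours_per_day` each day."""
    minutes = words / wpm
    days = minutes / 60 / hours_per_day
    return days / 365


def implied_visual_rate(total_bytes: float, waking_hours: float) -> float:
    """Bytes per second a child would need to see to take in `total_bytes`
    over `waking_hours` of waking time."""
    return total_bytes / (waking_hours * 3600)


if __name__ == "__main__":
    total = text_bytes(TOKENS, BYTES_PER_TOKEN)
    years = reading_years(WORDS, WORDS_PER_MINUTE, HOURS_PER_DAY)
    rate = implied_visual_rate(total, WAKING_HOURS_AGE_4)
    print(f"text size:            {total:.1e} bytes")
    print(f"reading time:         {years:,.0f} years")
    print(f"implied visual input: {rate / 1e6:.1f} MB/s")
```

Under these assumptions the text comes to about 9×10¹³ bytes (the quote rounds this to 10¹⁴), reading it takes roughly 457,000 years, the same order as the quoted 400,000, and matching that byte count over 16,000 waking hours implies visual input of about 1.6 MB/s.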