Ilya Sutskever has a bold take: LLMs are doing much more than predicting the next word. They are learning our world model. Text is a projection of the world.

Lior Alexander

115,932 subscribers

326,410 views • 2 years ago • via X (Twitter)

Education • News & Politics • Science & Technology

10 Comments

Lior⚡ • 2 years ago

Source:

SciStone • 2 years ago

How is this a bold take? It's literally what it is. Text isn't a projection of the world, information is. Everything is information, and language is a means to distill information into a text form.

David Ondrej • 2 years ago

I don't think it's a bold take. I think anyone who spends a bit of time thinking about how LLMs actually work has to come to that conclusion. In order for LLMs to predict the next word really, really well, they have to develop some sort of understanding of the world, including some reasoning capabilities.

Nathanaël Goujon • 2 years ago

@AlphaSignalAI Looks more like a bald take if you ask me

Cristian Garcia • 2 years ago

@AlphaSignalAI That's not bold, this is 𝗯𝗼𝗹𝗱.

Marcus D. R. Klarqvist • 2 years ago

Bold how? This is literally how all of ML/AI works. It's simply hyperbole for marketing impact to claim it's "the world". Given all the text ever written by humans, and a large enough model, it's obvious that next-token prediction would be proficient in generating text. It's also obvious that such a model would have no real logic and reasoning capabilities required to be intelligent and hence pose no threat whatsoever to humanity.

philip tao • 2 years ago

@AlphaSignalAI Really distracting hairdo...

Omar Kamel • 2 years ago

@AlphaSignalAI 1. Not an original take. 2. He uses ‘learning’ as though it implies understanding. A wax imprint of a song has ‘learned’ the song, but it doesn’t entail any understanding of it. No more than a memory foam pillow ‘understands’ your head.

AlphaSignal AI • 2 years ago

@AlphaSignalAI Would love to hear @ylecun's and @geoffreyhinton's thoughts on this.

LAI • 2 years ago

Ilya Sutskever's perspective on Large Language Models (LLMs) like GPT-3 and its successors is indeed thought-provoking. His view suggests that these models are not just simple tools for predicting the next word in a sentence but are, in fact, developing a deeper understanding or "model" of the world. The idea that "text is a projection of the world" implies that the vast amount of text data fed into these models encapsulates a representation of human knowledge and experience. Through processing and learning from this text, LLMs are thought to acquire a form of understanding or representation of the world, albeit in a manner that's fundamentally different from human cognition. However, it's important to note that while LLMs can mimic certain aspects of understanding and can generate coherent and contextually appropriate responses, their "knowledge" is limited to patterns found in the data they were trained on. They lack true understanding or consciousness and do not have experiences or awareness. Their "world model" is a statistical representation based on language patterns, rather than a conscious or intentional understanding of reality. This perspective opens up fascinating discussions about the capabilities and limitations of AI, and the nature of understanding and intelligence.

Related Videos

Does GPT understand the world? Here is what Ilya Sutskever, co-founder of OpenAI, says during a discussion with Jensen Huang, CEO of Nvidia: (1) When we train a large neural network to accurately predict the next word in lots of different texts from the internet, the AI is learning a world model. (2) On the surface, it may look like learning correlations in text, but it turns out that to 'just learn' statistical correlations in text, to compress information really well, what the neural network learns is some representation of the process that produced the text. (3) This text is a projection of the world...what the neural network is learning is aspects of the world, of people, of the human conditions, their hopes, dreams, motivations, their interactions...the situations we are in. The neural network learns a compressed, abstract, usable representation. Do you think learning representations = understanding? Are large language models simply stochastic parrots, or are they much more?

Alex Ker 🔭

1,367,023 views • 2 years ago

Ilya on LLMs understanding the world: "predicting the next token well means that you understand the underlying reality that led to the creation of that token." Seems like the opposite view of Yann.

Lior Alexander

1,222,086 views • 2 years ago

Geoffrey Hinton says LLMs are moving beyond imitation toward self-consistent reasoning. Instead of just predicting the next word, new models are beginning to identify contradictions in their own logic. This unbounded self-improvement will "end up making it much smarter than us."

Haider.

119,900 views • 5 months ago

📁 Yann LeCun, Chief AI Scientist at Meta, says language is not the peak of intelligence, it is the easy part. Predicting the next word is simple because language is made of finite symbols. The real world is continuous, noisy and chaotic, and even a cat navigates it better than our best models. True intelligence begins where text ends.

Jon Hernandez

56,151 views • 4 months ago

Yann LeCun says language isn’t intelligence. Predicting text doesn’t mean understanding reality. The real world is messy, physical, and causal and today’s LLMs barely touch that. The next leap is Physical AI: world models, cause and effect, real planning. Do you think LLMs can evolve into this, or do we need a completely new architecture?

VraserX e/acc

76,154 views • 5 months ago

"My kids do not go to school, and have never been to school." "Not because I don't value learning, I value learning very much. I just disagree with the belief that school is a necessary component for learning." "I believe children are always learning, and when allowed to pursue something they are actually interested in, something they have a need to learn, something that is relevant to their lives and has real world context, I believe that learning is exponentially more effective than the coercive curriculums of the conventional educational model." Agree or disagree? 🤔

Wide Awake Media

40,908 views • 1 year ago

#WATCH | Delhi: At Arctic Circle India Forum 2025, EAM Dr S Jaishankar says "We have now reached a size and a stage where almost anything consequential that happens in any corner of the world, matters to us...The United States is much more self-sufficient today than it has been in a long time...Europe is today under pressure to change. The realities of multipolarity are dawning on it. I think it has still not adjusted and absorbed it fully. The US has dramatically changed positions. The Chinese are doing what they were doing...We are going to see an arena of contestation which is not going to be easy to recall...We are looking at a much more contested world, much sharper competition..."

ANI

113,530 views • 1 year ago

The World Cup is at our shores, and all these people are doing us a great service. They are reminding Americans that this place is kind of awesome.

Bill Maher

2,361,548 views • 4 days ago

Ilya Sutskever says accurately predicting the next word leads to real understanding.

vitrupo

414,944 views • 2 months ago

Yann LeCun says LLMs are intrinsically unsafe because they cannot be made fully reliable. They can still hallucinate or take agentic actions without properly predicting the consequences: "coding works because you can verify the output, but real-world tasks are harder to verify."

Haider.

33,279 views • 1 month ago

Ilya Sutskever's conversation with Jensen Huang is from last year, but it still made me rethink the idea of 'Predicting the Next Word'. Listen to what Ilya said on this.

Haider.

1,251,861 views • 1 year ago

Our bikes are much more than just a means of transportation, they are our lifeline. For some of us, they quite literally replace the limbs we’ve lost 🚲

gazasunbirds

30,563 views • 2 years ago

One of the most positive things Europe has going for it is it's a beacon. A bastion of free speech. We saw our strength during Covid. Europe responded more effectively than anybody else in the world. That effective partnership shows what we are capable of doing.

Micheál Martin

241,105 views • 5 months ago

Yann LeCun explains why LLMs are so limited in terms of real-world intelligence. He says the biggest LLM is trained on about 30 trillion words, which is roughly 10 to the power 14 bytes of text. That sounds huge, but a 4 year old who has been awake about 16,000 hours has also taken in about 10 to the power 14 bytes through the eyes alone. So a small child has already seen as much raw data as the largest LLM has read. But the child’s data is visual, continuous, noisy, and tied to actions: gravity, objects falling, hands grabbing, people moving, cause and effect. From this, the child builds an internal “world model” and intuitive physics, and can learn new tasks like loading a dishwasher from a handful of demonstrations. LLMs only see disconnected text and are trained just to predict the next token. So they get very good at symbol patterns, exams, and code, but they lack grounded physical understanding, real common sense, and efficient learning from a few messy real-world experiences. --- From 'Pioneer Works' YT channel (link in comment)

Rohan Paul

639,376 views • 3 months ago

OMG....The damage these monsters are doing to the reputation of the United States....They would be better having no press briefings than standing up there telling bold faced lies to the world...

Kerry Burgess

386,281 views • 1 year ago

Yann LeCun explains why LLMs are limited in terms of real-world intelligence during a Bloomberg interview. "Language is a very approximate, reduced, quantized, and simplified description of the world, and LLMs can only deal with discrete sequences of symbols. The world is much more complicated than language. The biggest LLMs are pre-trained on the totality of all the publicly available text on the internet. That’s about 20 trillion words, or 30 trillion tokens. A token is about 3 bytes. So total 10¹⁴ bytes of text. This is the amount of data a four-year-old has seen through vision during four years. Now, the text, though, would take 400,000 years to read. So, there is enormously more data from sensory input, like vision, touch, and everything else, than there could ever be through language." A child does not need 400,000 years of reading to understand cups, doors, balance, faces, falls, or heat, because the body is already collecting dense feedback from vision, touch, motion, and consequence. Text strips most of that away. It turns a living scene into symbols, then asks the model to infer the missing world from traces left by people describing it. That is why an LLM can sound fluent about physics and still have no native sense of how fragile glass feels in a hand. Moravec’s paradox names this reversal: the things humans find intellectual can be easier for machines than the things toddlers do without applause. The hard part is not producing an answer, but building a model of the world that survives contact with weight, friction, surprise, and failure. ---- Link to the full video on Bloomberg's site. Link in comment.

Rohan Paul

46,658 views • 13 days ago

They are happier than 99% of the world

Mike Sonko

56,094 views • 5 months ago

Yann LeCun says you cannot build a reliable agentic system without a world model. LLMs don't have world models. They can't predict the consequences of their actions before taking them: "they just act, and whatever happens next is someone else's problem." Without that, it's not intelligence.

Haider.

331,292 views • 1 month ago

Today, we expose Old World Maps and you don't want to miss the Old World UNDERGROUND Cities that we are going to expose today. A place that will change the way that we all think about the ground that we stand on. We find out that these Tunnel Systems are much more than we were ever told. They are not just hallways, and we expose this to the world today. We are told that these are found by a chicken… Let’s show the world that this place that we live, is so much more than we are taught.

MY LUNCH BREAK

27,321 views • 11 months ago
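
The byte counts quoted in the Yann LeCun posts above can be checked with a few lines of arithmetic. This is a rough sketch, not LeCun's own calculation. The token count, bytes per token, word count, and waking hours come from the quoted posts. The reading speed (250 words per minute), reading time per day (8 hours), and visual data rate (about 2 MB per second) are assumptions added here for illustration.

```python
# Back-of-envelope check of the figures quoted from Yann LeCun above.

# Figures taken from the quoted posts.
TOKENS = 30e12            # "30 trillion tokens"
BYTES_PER_TOKEN = 3       # "a token is about 3 bytes"
WORDS = 20e12             # "about 20 trillion words"
WAKING_HOURS = 16_000     # hours a four-year-old has been awake

# Assumptions NOT stated in the posts (illustrative only).
WORDS_PER_MINUTE = 250          # assumed reading speed
READING_HOURS_PER_DAY = 8       # assumed daily reading time
VISION_BYTES_PER_SECOND = 2e6   # assumed visual data rate, about 2 MB/s


def text_bytes() -> float:
    """Total bytes of training text: tokens times bytes per token."""
    return TOKENS * BYTES_PER_TOKEN


def reading_years() -> float:
    """Years needed to read all the words at the assumed pace."""
    hours = WORDS / WORDS_PER_MINUTE / 60
    days = hours / READING_HOURS_PER_DAY
    return days / 365


def vision_bytes() -> float:
    """Bytes taken in through vision over the child's waking hours."""
    return WAKING_HOURS * 3600 * VISION_BYTES_PER_SECOND


if __name__ == "__main__":
    print(f"text:    {text_bytes():.2e} bytes")
    print(f"reading: {reading_years():,.0f} years")
    print(f"vision:  {vision_bytes():.2e} bytes")
```

Under these assumptions both the text and the vision totals land near 10¹⁴ bytes, and the reading time comes out in the low hundreds of thousands of years, the same order as the quoted 400,000. Changing the assumed reading speed or data rate shifts the numbers but not the order of magnitude.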