Yann LeCun explains that large language models are trained on about 30 trillion words, representing nearly all public internet text. He says it would take a human over 500,000 years to read that much. But a 4-year-old child sees just as much visual data in their first few years...

Wes Roth

35,768 subscribers

294,374 views • 6 months ago • via X (Twitter)

Education Health & Wellness Science & Technology

Related Videos

Yann LeCun, Chief AI Scientist at Meta, says you cannot understand the world by reading about it. All the text on the internet is around 30 trillion words; it would take a human half a million years to read it. A four-year-old absorbs more raw information just by looking at the world. Intelligence is not trained by text alone; it is shaped by experience.

Jon Hernandez

15,342 views • 4 months ago

Yann LeCun just said something that every AI-in-healthcare researcher should sit with. He basically said: If language were enough to understand the world, you could learn medicine by reading books. But you can’t. You need residency. You need to see thousands of normal cases before you recognize the abnormal one. He also points out something wild — all the public text on the internet is on the order of 10¹⁴ bytes. A 4-year-old processes about that much through vision alone. The world is just… higher bandwidth than text. I think this shift — from language models to world models — is going to matter a lot in healthcare. 🫀

Bo Wang

418,580 views • 4 months ago

Yann LeCun explains why LLMs are so limited in terms of real-world intelligence. He says the biggest LLM is trained on about 30 trillion words, which is roughly 10 to the power 14 bytes of text. That sounds huge, but a 4-year-old who has been awake about 16,000 hours has also taken in about 10 to the power 14 bytes through the eyes alone. So a small child has already seen as much raw data as the largest LLM has read. But the child’s data is visual, continuous, noisy, and tied to actions: gravity, objects falling, hands grabbing, people moving, cause and effect. From this, the child builds an internal “world model” and intuitive physics, and can learn new tasks like loading a dishwasher from a handful of demonstrations. LLMs only see disconnected text and are trained just to predict the next token. So they get very good at symbol patterns, exams, and code, but they lack grounded physical understanding, real common sense, and efficient learning from a few messy real-world experiences. --- From 'Pioneer Works' YT channel (link in comment)

Rohan Paul

639,376 views • 3 months ago

Yann LeCun explains why LLMs are limited in terms of real-world intelligence during a Bloomberg interview. "Language is a very approximate, reduced, quantized, and simplified description of the world, and LLMs can only deal with discrete sequences of symbols. The world is much more complicated than language. The biggest LLMs are pre-trained on the totality of all the publicly available text on the internet. That’s about 20 trillion words, or 30 trillion tokens. A token is about 3 bytes. So total 10¹⁴ bytes of text. This is the amount of data a four-year-old has seen through vision during four years. Now, the text, though, would take 400,000 years to read. So, there is enormously more data from sensory input, like vision, touch, and everything else, than there could ever be through language." A child does not need 400,000 years of reading to understand cups, doors, balance, faces, falls, or heat, because the body is already collecting dense feedback from vision, touch, motion, and consequence. Text strips most of that away. It turns a living scene into symbols, then asks the model to infer the missing world from traces left by people describing it. That is why an LLM can sound fluent about physics and still have no native sense of how fragile glass feels in a hand. Moravec’s paradox names this reversal: the things humans find intellectual can be easier for machines than the things toddlers do without applause. The hard part is not producing an answer, but building a model of the world that survives contact with weight, friction, surprise, and failure. ---- Link to the full video on Bloomberg's site. Link in comment.

Rohan Paul

46,658 views • 10 days ago
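
The text-side figures in the Bloomberg quote above can be checked with a few lines of arithmetic. This is only a rough sketch: the token count, word count, and 3 bytes per token come from the quote, while the reading speed (250 words per minute) and reading time (8 hours per day) are assumptions added here, since the quote does not say how the 400,000-year figure was derived.

```python
def text_corpus_bytes(tokens: float, bytes_per_token: float = 3.0) -> float:
    """Size of a text corpus in bytes, given its token count."""
    return tokens * bytes_per_token


def years_to_read(words: float,
                  words_per_minute: float = 250.0,  # assumed reading speed
                  hours_per_day: float = 8.0) -> float:  # assumed daily reading time
    """Years needed to read `words` words at a steady pace."""
    minutes = words / words_per_minute
    days = minutes / 60.0 / hours_per_day
    return days / 365.0


corpus = text_corpus_bytes(30e12)     # 30 trillion tokens at about 3 bytes each
reading_years = years_to_read(20e12)  # 20 trillion words

print(f"corpus size:  {corpus:.1e} bytes")
print(f"reading time: {reading_years:,.0f} years")
```

Under these assumptions the corpus comes to about 9 × 10¹³ bytes, which rounds to the quoted 10¹⁴, and the reading time lands between 400,000 and 500,000 years. A slower or faster assumed reader shifts the second number proportionally.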

A 4-year-old child has seen 50x more information than the biggest LLMs. Yann LeCun is the Chief AI Scientist at Meta. He recently spoke on “The Expanding Universe of Generative Models” panel at the World Economic Forum in Davos. Yann highlighted the idea that a 4-year-old child is way smarter than current cutting-edge large language models (LLMs). “Think about what a child sees through vision. Put a number on how much information a 4-year-old child has seen during their life. It’s 20 MB/s going through the optical nerve for 16,000 wake hours in the first 4 years of life. At 3,600 seconds per hour, that is 10^15 bytes. This is 50x more information than the biggest LLMs we have. A 4-year-old child is way smarter than these models having acquired an enormous amount of knowledge about how the world works.” The real constraint right now is the ability of LLMs to think. Today, LLMs are only capable of System 1 thinking. System 1 vs System 2 thinking was popularised in the book 'Thinking, Fast and Slow' by Daniel Kahneman. System 1 tasks involve quick, instinctive, automatic responses. LLMs struggle with discontinuous tasks that require a creative leap in progress as they imitate human responses. It's hard to go above human response accuracy if LLMs are only trained on humans. Models are building the track in front of them with each word being generated. What could it mean to give language models System 2 thinking? This remains a future development I'm excited about.

Alex Banks

22,958 views • 2 years ago
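
The visual-side estimate in the Davos quote above is a single multiplication, sketched below. The 16,000 waking hours and 3,600 seconds per hour come from the quote. The arithmetic reaches 10^15 bytes only with a rate of 20 megabytes per second; at 20 megabits per second it gives about 1.4 × 10^14. Both readings are computed here so the difference is visible. The function name is hypothetical.

```python
SECONDS_PER_HOUR = 3_600


def visual_bytes(bytes_per_second: float, waking_hours: float = 16_000) -> float:
    """Total bytes received at a constant rate over the given waking hours."""
    return bytes_per_second * waking_hours * SECONDS_PER_HOUR


as_megabytes = visual_bytes(20e6)     # rate read as 20 megabytes per second
as_megabits = visual_bytes(20e6 / 8)  # rate read as 20 megabits per second

# Text corpus size implied by the "50x" claim, taking the megabyte reading.
implied_text_bytes = as_megabytes / 50

print(f"20 MB/s:   {as_megabytes:.2e} bytes")
print(f"20 Mbit/s: {as_megabits:.2e} bytes")
print(f"text corpus implied by 50x: {implied_text_bytes:.2e} bytes")
```

The 50x ratio implies a text corpus of roughly 2 × 10^13 bytes, smaller than the 10¹⁴ bytes cited in other posts on this page, so the posts are evidently working from different corpus estimates.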

Does GPT understand the world? Here is what Ilya Sutskever, co-founder of OpenAI, says during a discussion with Jensen Huang, CEO of Nvidia: "(1) When we train a large neural network to accurately predict the next word in lots of different texts from the internet, the AI is learning a world model. (2) On the surface, it may look like learning correlations in text, but it turns out that to 'just learn' statistical correlations in text, to compress information really well, what the neural network learns is some representation of the process that produced the text. (3) This text is a projection of the world...what the neural network is learning is aspects of the world, of people, of the human conditions, their hopes, dreams, motivations, their interactions...the situations we are in. The neural network learns a compressed, abstract, usable representation." Do you think learning representations = understanding? Are large language models simply stochastic parrots, or are they much more?

Alex Ker 🔭

1,367,023 views • 2 years ago

FRIEDBERG: VIDEO DATA WILL POWER THE NEXT GENERATION OF AI Friedberg broke down the scale shift coming to artificial intelligence, arguing that text based models like GPT are just the beginning, and that the real revolution will come from video-trained systems: “The internet and all these LLMs are language models trained on text from the internet, around 50 billion words total, maybe one to five terabytes of data in their training sets. But if you look at the video data out there, there are hundreds of billions of hours, much of it on YouTube. By some estimates, there’s a thousand exabytes of video data on the internet, about a billion times more than text data. I think we just saw that play out with the new video model that launched yesterday. Google has all this YouTube data, whether or not they’re using it to train, I don’t know. I’ve heard from insiders they’re not allowed to yet and would have to redo the terms of service.” Source: AIFinInsights david friedberg

Mario Nawfal

15,855 views • 6 months ago
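
Friedberg's "about a billion times more" comparison above can be sanity-checked from the two sizes he gives: a thousand exabytes of video against one to five terabytes of text. The sketch below assumes decimal SI units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes), and the function name is hypothetical.

```python
EXABYTE = 1e18   # bytes, decimal SI (assumed)
TERABYTE = 1e12  # bytes, decimal SI (assumed)


def size_ratio(video_exabytes: float, text_terabytes: float) -> float:
    """How many times larger the video data is than the text data."""
    return (video_exabytes * EXABYTE) / (text_terabytes * TERABYTE)


ratio_small_text = size_ratio(1_000, 1)  # text at the 1 TB end of the range
ratio_large_text = size_ratio(1_000, 5)  # text at the 5 TB end of the range

print(f"vs 1 TB of text: {ratio_small_text:.0e}x")
print(f"vs 5 TB of text: {ratio_large_text:.0e}x")
```

The billion-fold figure holds at the 1 TB end of the range; at 5 TB the ratio is closer to 200 million.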

Yann LeCun says we're never gonna get to human-level intelligence by just training on text. AI must learn from high-bandwidth sensory data like video to build true world models. Current models look PhD-smart but mostly regurgitate, with no real understanding: "even a cat understands the physical world better."

Haider.

138,265 views • 8 months ago

Chris Manning says Yann LeCun sees language as a low bandwidth communication channel compared to vision. But the gap between a chimp and a human wasn’t produced by superior eyes. What took off for humans was language. Not just for communication, but as a cognitive tool.

vitrupo

118,046 views • 2 months ago

Standard Intelligence's devansh responds to roon's tweet that "text is the universal interface," and explains why their new foundation model is trained on video: "At some point in the arbitrarily long future, if we only use text models, we could force most things to be text. But I think there are just a lot of things that are much more native when done from a computer-use [perspective]." "GUIs are designed for humans to use. We have this massive long tail of things on the internet that are entirely undoable by LLMs." "For example, when I do ML engineering most of my time is spent doing the grunt work of engineering. It's a lot of looking at graphs, analyzing, and comparing loss curves. You can do this in text, but it's a much larger pain than doing it in the native interface." "There's a reason humans don't interact with a computer purely through text, it would kind of suck."

TBPN

64,140 views • 4 months ago

Demis Hassabis says world models are his longest standing passion and explains benefits vs. language models: ▫️ “I think language models are able to understand a lot about the world. More than we expected because language is actually probably richer than we thought. But there's still a lot about the spatial dynamics of the world, spatial awareness and the physical context we're in — and how that works mechanically — that is hard to describe in words and isn't generally described in corpuses of words. A lot of this is allied to learning from experience. There's a lot of things which you can't really describe something. You have to just experience it. Maybe the senses and so on are very hard to put into words. Whether that's motor angles and smell and these kinds of senses, it's very difficult to describe that in any kind of language.”▫️ This is what Demis and Google DeepMind are trying to solve with Genie. He also says that the video models (Veo) will play a part in training the world models and this is all key for AI robotics.

Bearly AI

123,666 views • 6 months ago

AI models are trained on roughly 20 trillion tokens, or every public text on the internet, and it's still not enough. The next frontier isn't more text; it's the physical world, and 375ai is already capturing it.

375ai

10,725 views • 2 months ago

Yann LeCun: "We're never going to get to human level AI by just training on text"

Lior Alexander

66,864 views • 1 year ago

Demis Hassabis on the limit in today’s AI: language can describe the world, but it cannot contain it - and why "World Models" are his "longest standing passion". Language models absorbed far more structure about reality from text than many researchers expected, because human language quietly carries physics, psychology, culture, tools, plans, and cause-and-effect. But text is still a compressed residue of experience, not experience itself. A sentence can say a cup falls from a table, yet it does not fully encode weight, grip, balance, friction, timing, sound, surprise, or the tiny motor corrections a body makes before it even notices them. The world is not only made of facts that can be named; it is made of constraints that have to be lived through, touched, predicted, violated, and repaired. That is why world models matter. They aim to learn the hidden grammar of physical reality: how objects persist, how forces unfold, how space changes when an agent moves, and how action creates feedback. Language models can often reason about the world because people have written so much about it. World models try to learn what the world is like before it becomes words. The difference is exactly what matters because intelligence is not just answering well; it is knowing what would happen next if you moved, reached, pushed, smelled, slipped, or failed. A mind trained only on descriptions may become brilliant at explanation. A mind trained on experience may become better at consequence. --- Full video from "Google DeepMind" and "Hannah Fry" YT channel (link in comment)

Rohan Paul

49,449 views • 1 month ago

Demis on why world models are his longest standing passion and explains benefits vs. language models: ▫️ “I think language models are able to understand a lot about the world. More than we expected because language is actually probably richer than we thought. But there's still a lot about the spatial dynamics of the world, spatial awareness and the physical context we're in — and how that works mechanically — that is hard to describe in words and isn't generally described in corpuses of words. A lot of this is allied to learning from experience. There's a lot of things which you can't really describe something. You have to just experience it. Maybe the senses and so on are very hard to put into words. Whether that's motor angles and smell and these kinds of senses, it's very difficult to describe that in any kind of language.”▫️

Bearly AI

78,545 views • 1 month ago

Is this a sick joke? But listen to her opening words- “the one who is most interested in it is my husband he just wanted to know how much money it is and if he has to pay tax” That says it all.

Earth Hippy 🌎🕊️💚

103,225 views • 8 months ago

One of the themes of this account is that reading intervention is about so much more than just phonics. In this clip, Erin shows us how she helps her intervention students build prosody with vocabulary and sentences from a text they recently read.

Science of Reading Classroom

10,908 views • 1 year ago

Andrej Karpathy explains why he thinks of LLMs as "people spirits." He describes them as simulations of people. The technical machinery is a transformer neural network that processes text in chunks, applying equal computing power to each piece. The system learns by reading billions of web pages and books. After training, it becomes a machine that can mimic how humans write and communicate. The key insight is that training on human content gives these models an emergent human-like psychology. This explains why LLMs often respond in surprisingly human ways. This framework opens up both their strengths and limitations. They simulate human patterns because that's their training data, but they remain statistical simulations rather than conscious beings. Understanding LLMs as "people spirits" helps set proper expectations for these tools and explains their uncannily human behavior.

Aish

16,309 views • 1 year ago

We are in times when human beings are empowered with machines smarter than us. It is possible that in the next 15-20 years’ time, people will not have to work for a living. In that context, Bharat can become a beacon for the world, because the fundamental ethos of this civilization is that it is not about how much wealth one possesses, but how profound is one’s experience of life that determines how rich one is. -Sg #RepublicDay

Sadhguru

52,422 views • 2 years ago

Here is a great example of how Steven Furtick casually twists scripture. He speaks about not having bad days and the importance of remaining positive- quickly referencing Isaiah 27 as a proof text for it. But what we read is much, much different.

Protestia

57,260 views • 1 year ago