Yann LeCun explains that large language models are trained on about 30 trillion words, representing nearly all public internet text. He says it would take a human over 500,000 years to read that much. But a 4-year-old child sees just as much visual data in their first few years...

294,374 views • 6 months ago • via X (Twitter)

Related Videos

Yann LeCun explains why LLMs are limited in terms of real-world intelligence during a Bloomberg interview.

"Language is a very approximate, reduced, quantized, and simplified description of the world, and LLMs can only deal with discrete sequences of symbols. The world is much more complicated than language. The biggest LLMs are pre-trained on the totality of all the publicly available text on the internet. That’s about 20 trillion words, or 30 trillion tokens. A token is about 3 bytes. So total 10¹⁴ bytes of text. This is the amount of data a four-year-old has seen through vision during four years. Now, the text, though, would take 400,000 years to read. So, there is enormously more data from sensory input, like vision, touch, and everything else, than there could ever be through language."

A child does not need 400,000 years of reading to understand cups, doors, balance, faces, falls, or heat, because the body is already collecting dense feedback from vision, touch, motion, and consequence. Text strips most of that away. It turns a living scene into symbols, then asks the model to infer the missing world from traces left by people describing it. That is why an LLM can sound fluent about physics and still have no native sense of how fragile glass feels in a hand.

Moravec’s paradox names this reversal: the things humans find intellectual can be easier for machines than the things toddlers do without applause. The hard part is not producing an answer, but building a model of the world that survives contact with weight, friction, surprise, and failure.

----

Link to the full video on Bloomberg's site in the comments.

Rohan Paul

46,658 views • 10 days ago
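The figures in the Bloomberg quote above can be checked with a little arithmetic. The sketch below is only a back-of-envelope illustration: the token count, bytes per token, and word count come from the quote, while the reading speed (250 words per minute), reading time (8 hours a day), and all function names are assumptions introduced here, not figures stated in the caption.

```python
# Back-of-envelope check of the text-corpus figures in the quote.
TOKENS = 30e12           # from the quote: about 30 trillion tokens
BYTES_PER_TOKEN = 3      # from the quote: a token is about 3 bytes
WORDS = 20e12            # from the quote: about 20 trillion words

# Assumptions (not stated in the quote): a steady human reading pace.
WORDS_PER_MINUTE = 250
HOURS_PER_DAY = 8
DAYS_PER_YEAR = 365


def corpus_bytes(tokens=TOKENS, bytes_per_token=BYTES_PER_TOKEN):
    """Total size of the training text in bytes."""
    return tokens * bytes_per_token


def reading_years(words=WORDS, wpm=WORDS_PER_MINUTE,
                  hours_per_day=HOURS_PER_DAY, days_per_year=DAYS_PER_YEAR):
    """Years needed to read `words` words at the assumed pace."""
    words_per_year = wpm * 60 * hours_per_day * days_per_year
    return words / words_per_year


if __name__ == "__main__":
    print(f"text corpus: {corpus_bytes():.1e} bytes")
    print(f"reading time: {reading_years():,.0f} years")
```

Under these assumptions the corpus comes to 9 × 10¹³ bytes, which rounds to the quoted 10¹⁴, and the reading time lands between 400,000 and 500,000 years. A different assumed reading pace shifts the second number proportionally.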

A 4-year-old child has seen 50x more information than the biggest LLMs.

Yann LeCun is the Chief AI Scientist at Meta. He recently spoke on “The Expanding Universe of Generative Models” panel at the World Economic Forum in Davos. Yann highlighted the idea that a 4-year-old child is way smarter than current cutting-edge large language models (LLMs).

“Think about what a child sees through vision. Put a number on how much information a 4-year-old child has seen during their life. It’s 20 Mbps going through the optical nerve for 16,000 wake hours in the first 4 years of life. At 3,600 seconds per hour, that is 10^15 bytes. This is 50x more information than the biggest LLMs we have. A 4-year-old child is way smarter than these models having acquired an enormous amount of knowledge about how the world works.”

The real constraint right now is the ability of LLMs to think. Today, LLMs are only capable of System 1 thinking. The System 1 vs System 2 distinction was popularised in the book 'Thinking, Fast and Slow' by Daniel Kahneman. System 1 tasks involve quick, instinctive, automatic responses. System 2 tasks involve slower, deliberate, effortful reasoning.

LLMs struggle with discontinuous tasks that require a creative leap, because they imitate human responses. It's hard to exceed human accuracy if LLMs are trained only on human output. Models are building the track in front of them with each word they generate.

What could it mean to give language models System 2 thinking? This remains a future development I'm excited about.

Alex Banks

22,958 views • 2 years ago
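The visual-data estimate in the Davos quote above is a single multiplication, sketched below. One assumption is needed to reproduce it: the quoted rate is read here as 20 megabytes per second, since that is the reading under which the product reaches roughly 10^15 bytes. The function names are illustrative only.

```python
# Back-of-envelope check of the visual-data estimate in the quote.
BYTES_PER_SECOND = 20e6    # assumption: the quoted rate taken as 20 megabytes/s
WAKE_HOURS = 16_000        # from the quote: wake hours in the first 4 years
SECONDS_PER_HOUR = 3_600   # from the quote
QUOTED_RATIO = 50          # from the quote: "50x more information"


def visual_bytes(rate=BYTES_PER_SECOND, hours=WAKE_HOURS,
                 seconds_per_hour=SECONDS_PER_HOUR):
    """Bytes carried by the optic nerve over the given waking hours."""
    return rate * hours * seconds_per_hour


def implied_llm_bytes(ratio=QUOTED_RATIO):
    """Training-text size implied by the quoted ratio."""
    return visual_bytes() / ratio


if __name__ == "__main__":
    print(f"visual input: {visual_bytes():.2e} bytes")
    print(f"implied LLM corpus: {implied_llm_bytes():.2e} bytes")
```

With these inputs the visual total is about 1.15 × 10^15 bytes, and the 50x ratio implies a text corpus of roughly 2 × 10^13 bytes. Reading the rate as megabits per second instead would cut the visual total by a factor of eight.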

Demis Hassabis on the limit in today’s AI: language can describe the world, but it cannot contain it - and why "World Models" are his "longest standing passion".

Language models absorbed far more structure about reality from text than many researchers expected, because human language quietly carries physics, psychology, culture, tools, plans, and cause-and-effect. But text is still a compressed residue of experience, not experience itself. A sentence can say a cup falls from a table, yet it does not fully encode weight, grip, balance, friction, timing, sound, surprise, or the tiny motor corrections a body makes before it even notices them. The world is not only made of facts that can be named; it is made of constraints that have to be lived through, touched, predicted, violated, and repaired.

That is why world models matter. They aim to learn the hidden grammar of physical reality: how objects persist, how forces unfold, how space changes when an agent moves, and how action creates feedback. Language models can often reason about the world because people have written so much about it. World models try to learn what the world is like before it becomes words.

That difference matters because intelligence is not just answering well; it is knowing what would happen next if you moved, reached, pushed, smelled, slipped, or failed. A mind trained only on descriptions may become brilliant at explanation. A mind trained on experience may become better at consequence.

---

Full video from the "Google DeepMind" and "Hannah Fry" YouTube channel (link in comment)

Rohan Paul

49,449 views • 1 month ago