Flapping Airplanes co-founder Asher Spector explains why data efficiency is the greatest bottleneck to AI adoption: "To the extent that AI has been hard to integrate into the economy, I really think it's because models are much less data-efficient than humans. If you want it to learn a new...

89,394 views • 4 months ago • via X (Twitter)

Related videos

David Friedberg: Michael Burry’s Datacenter Math is Wrong

“I actually think Michael Burry's got this wrong.”

“What Michael Burry is saying is that all of these hyperscalers have extended their depreciation schedule or the useful life of their data centers by roughly 2x, which cuts the operating costs in half when they report it in earnings. And so it's making their earnings inflate.”

“So he's claiming they're cooking the books. Google first made this change in Q1 of 2021, where they said the servers are now going from 3 to 4 years. Separately in 2021, Google took networking equipment from 3 to 5 years. And then in 2023, they took it from 5 to 6 years.”

“And so this is a result of this effort where they went in and did an analysis. So what happened?”

“What happened in the data centers is that the data centers transitioned from being primarily data storage and data transfer systems, where you would use hard drives and RAM and memory to store data and then transmit it back out, to being data processing centers because of the AI boom.”

“So as AI became more important in the data center, more of the dollars that are going into data centers were allocated towards chips from data storage, which initially was hard drives.”

“And then suddenly, when you put these processors in to process the data to do AI, the majority of the spend and the majority of the energy is going towards the processors.”

“I made some calls and I checked around with some other friends, and everyone says the same thing: that these 7-8 year old TPUs and GPUs that are sitting in the data centers are still being used and they're being used at 100% utilization.”

“So that actually justifies and validates the depreciation schedule being much longer versus shorter.”

The All-In Podcast

304,259 views • 6 months ago
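The accounting claim in the clip above (a longer useful life means a smaller reported annual expense) can be illustrated with a small sketch. It assumes straight-line depreciation with no salvage value, which the clip does not state, and the purchase cost is a hypothetical figure chosen only to make the arithmetic easy to follow.

```python
def annual_straight_line_depreciation(cost: float, useful_life_years: float) -> float:
    """Annual expense under straight-line depreciation, assuming zero salvage value."""
    if useful_life_years <= 0:
        raise ValueError("useful_life_years must be positive")
    return cost / useful_life_years


# Hypothetical purchase cost, not a figure from the clip.
server_cost = 12_000.0

expense_3y = annual_straight_line_depreciation(server_cost, 3)
expense_4y = annual_straight_line_depreciation(server_cost, 4)
expense_6y = annual_straight_line_depreciation(server_cost, 6)

print(f"3-year life: {expense_3y:.0f} per year")
print(f"4-year life: {expense_4y:.0f} per year")
print(f"6-year life: {expense_6y:.0f} per year")
```

Under these assumptions, doubling the useful life from 3 to 6 years halves the annual expense, while the 3-to-4-year change mentioned for servers reduces it by a quarter.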

The most interesting part for me is where Andrej Karpathy describes why LLMs aren't able to learn like humans. As you would expect, he comes up with a wonderfully evocative phrase to describe RL: “sucking supervision bits through a straw.” A single end reward gets broadcast across every token in a successful trajectory, upweighting even wrong or irrelevant turns that lead to the right answer.

> “Humans don't use reinforcement learning, as I've said before. I think they do something different. Reinforcement learning is a lot worse than the average person thinks. Reinforcement learning is terrible. It just so happens that everything that we had before is much worse.”

So what do humans do instead?

> “The book I’m reading is a set of prompts for me to do synthetic data generation. It's by manipulating that information that you actually gain that knowledge. We have no equivalent of that with LLMs; they don't really do that.”

> “I'd love to see during pretraining some kind of a stage where the model thinks through the material and tries to reconcile it with what it already knows. There's no equivalent of any of this. This is all research.”

Why can’t we just add this training to LLMs today?

> “There are very subtle, hard to understand reasons why it's not trivial. If I just give synthetic generation of the model thinking about a book, you look at it and you're like, 'This looks great. Why can't I train on it?' You could try, but the model will actually get much worse if you continue trying.”

> “Say we have a chapter of a book and I ask an LLM to think about it. It will give you something that looks very reasonable. But if I ask it 10 times, you'll notice that all of them are the same.”

> “You're not getting the richness and the diversity and the entropy from these models as you would get from humans. How do you get synthetic data generation to work despite the collapse and while maintaining the entropy? It is a research problem.”

How do humans get around model collapse?

> “These analogies are surprisingly good. Humans collapse during the course of their lives. Children haven't overfit yet. They will say stuff that will shock you. Because they're not yet collapsed. But we [adults] are collapsed. We end up revisiting the same thoughts, we end up saying more and more of the same stuff, the learning rates go down, the collapse continues to get worse, and then everything deteriorates.”

In fact, there’s an interesting paper arguing that dreaming evolved to assist generalization, and resist overfitting to daily learning - look up The Overfitted Brain by Erik Hoel.

I asked Karpathy: Isn’t it interesting that humans learn best at a part of their lives (childhood) whose actual details they completely forget, adults still learn really well but have terrible memory about the particulars of the things they read or watch, and LLMs can memorize arbitrary details about text that no human could but are currently pretty bad at generalization?

> “[Fallible human memory] is a feature, not a bug, because it forces you to only learn the generalizable components. LLMs are distracted by all the memory that they have of the pre-trained documents. That's why when I talk about the cognitive core, I actually want to remove the memory. I'd love to have them have less memory so that they have to look things up and they only maintain the algorithms for thought, and the idea of an experiment, and all this cognitive glue for acting.”

Dwarkesh Patel

1,049,820 views • 7 months ago
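The "supervision bits through a straw" point in the post above can be made concrete with a toy sketch. It assumes the simplest outcome-only setup, where one scalar reward at the end of a trajectory becomes the training weight for every step. The function name and the example steps are hypothetical, and this is not a description of any particular training system.

```python
def broadcast_outcome_reward(num_steps: int, final_reward: float) -> list[float]:
    """Assign the single end-of-trajectory reward to every step as its weight."""
    return [final_reward] * num_steps


# Hypothetical trajectory that reaches the right answer after a wrong detour.
trajectory = [
    "set up the problem",
    "wrong detour",
    "backtrack",
    "correct answer",
]

weights = broadcast_outcome_reward(len(trajectory), final_reward=1.0)
credit = dict(zip(trajectory, weights))

for step, weight in credit.items():
    print(f"{step}: weight {weight}")
```

In this sketch the wrong detour receives exactly the same weight as the step that produced the answer, because the only signal available is the final outcome.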