We often hear that machine learning models "learn patterns in data". But what does that actually look like in geometry? If you dropped a little elastic mesh into a cloud of points and let it learn, how would it fold itself to match the shape of the data? In this scene we watch a self-organizing map, a simple unsupervised neural model, learn the shape of a two-dimensional dataset arranged in a spiral arm. On top of this, we lay down a square grid of neurons whose weights live in the same plane. At the start, this grid is just a flat net floating across the cloud; it knows nothing about the structure underneath. Learning is a repeated game: pick a random data point, find the neuron whose weight is closest, and then nudge that neuron and its neighbours toward the point. Do this again and again, while slowly shrinking how far the neighbourhood influence spreads. #MachineLearning #ManifoldLearning #UnsupervisedLearning #NeuralMaps #GeometricML
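The training game described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code behind the video: the spiral generator, the grid size, the decay schedules for the learning rate and neighbourhood width, and the names `make_spiral`, `train_som` and `quantization_error` are all assumptions made for the example.

```python
import numpy as np

def make_spiral(n_points, rng, turns=2.0, noise=0.03):
    """Sample noisy 2D points along one spiral arm (illustrative data only)."""
    t = rng.uniform(0.0, 1.0, n_points)
    angle = 2.0 * np.pi * turns * t
    radius = 0.1 + 0.9 * t
    pts = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
    return pts + rng.normal(0.0, noise, pts.shape)

def train_som(data, grid=10, steps=3000, lr0=0.5, sigma0=3.0, sigma_end=0.5, seed=0):
    """Train a grid x grid map on 2D data; returns weights of shape (grid, grid, 2)."""
    rng = np.random.default_rng(seed)
    # Integer grid coordinates of every neuron, used for neighbourhood distances.
    rows, cols = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
    coords = np.stack([rows, cols], axis=-1).astype(float)
    # Start as a flat net stretched over the bounding box of the data.
    lo, hi = data.min(axis=0), data.max(axis=0)
    weights = lo + (coords / (grid - 1)) * (hi - lo)
    for step in range(steps):
        frac = step / steps
        lr = lr0 * (1.0 - frac)                          # assumed linear decay
        sigma = sigma0 * (sigma_end / sigma0) ** frac    # assumed exponential shrink
        x = data[rng.integers(len(data))]                # pick a random data point
        dist2 = np.sum((weights - x) ** 2, axis=-1)
        winner = np.unravel_index(np.argmin(dist2), dist2.shape)  # closest neuron
        grid_d2 = np.sum((coords - coords[winner]) ** 2, axis=-1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))        # Gaussian neighbourhood
        weights += lr * h[..., None] * (x - weights)     # nudge winner and neighbours
    return weights

def quantization_error(data, weights):
    """Mean distance from each data point to its closest neuron."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return d.min(axis=1).mean()
```

Calling `train_som(data, steps=0)` returns the untrained flat net, which makes it easy to compare `quantization_error` before and after training.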

Mathelirium

34,381 subscribers

139,562 views • 6 months ago • via X (Twitter)

Education · Science & Technology

Related videos

We often hear that machine learning models learn patterns in data. But what does that actually look like in geometry? If you dropped a little elastic mesh into a cloud of points and let it learn, how would it fold itself to match the shape of the data? In this scene we watch a Self-Organizing Map (SOM), a simple unsupervised neural model, learn the shape of two 3D datasets, one static and the other dynamic. On top of this, we lay down a square grid of neurons whose weights live in the same plane. At the start, this grid is just a flat net floating across the cloud. It knows nothing about the structure underneath. Learning is a repeated game: pick a random data point, find the neuron whose weight is closest, and then nudge that neuron and its neighbours toward the point. Do this again and again, while slowly shrinking how far the neighbourhood influence spreads. Python code is available for subscribers. #MachineLearning #ManifoldLearning #UnsupervisedLearning #NeuralMaps #GeometricML

Mathelirium

76,011 views • 3 months ago

A neural network can begin as a flat sheet and learn the shape of hidden data. A self-organizing map turns learning into geometry. Each data point pulls one winning neuron toward it, but nearby neurons move too, and so the whole lattice bends without losing its neighborhood structure. The strange part is that the network is not given the roll shape. It discovers the shape through competition and local cooperation. Paper: Self-Organized Formation of Topologically Correct Feature Maps. Author: Teuvo Kohonen. Year: 1982.

Mathelirium

129,144 views • 1 month ago

Individual Neuron: Neural Network. A neuron in a neural network computes a weighted sum of its inputs, adds a bias, and applies an activation function such as sigmoid, ReLU, or tanh, which introduces non-linearity. This output is what lets the network learn and represent patterns in the data.
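As a minimal sketch of that description, a single neuron is a dot product, a bias, and an activation. The function names and the example numbers below are illustrative assumptions, not taken from the post.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified linear unit: zero for negative z, identity otherwise."""
    return np.maximum(0.0, z)

def neuron(x, w, b, activation=np.tanh):
    """Weighted sum of inputs plus bias, passed through a non-linear activation."""
    z = np.dot(w, x) + b
    return activation(z)

# Example inputs (arbitrary values chosen for illustration).
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
b = 1.0  # pre-activation is 0.5*1 - 0.25*2 + 1 = 1.0

for name, act in [("sigmoid", sigmoid), ("relu", relu), ("tanh", np.tanh)]:
    print(name, neuron(x, w, b, act))
```

The same pre-activation value gives a different output under each activation, which is the non-linearity the post refers to.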

ₕₐₘₚₜₒₙ — e/acc

47,467 views • 1 year ago

The Machine That Learns The Law Behind The Data. A very interesting US patent, US10963540B2 (Physics Informed Learning Machine), describes a learning system that does not begin with data alone. It begins with a physical model, usually written as a differential equation (ODE or PDE) such as dx/dt = f(x,t). A normal machine learning model sees scattered data and tries to fit it. A physics-informed learning machine starts with a law. Then it treats the data as evidence that updates what the model believes about the physical system. For this application, I use the patent idea on NASA C-MAPSS turbofan engine data. The machine watches multivariate telemetry from a degrading engine and infers a hidden health state that is not measured directly. From that posterior belief, it estimates the engine’s remaining useful life (RUL). In the main 3D scene, the engine lifetime is turned into a tunnel. The spiral ribbons are real sensor channels evolving over cycle-time. The glowing core is the inferred health state. The surrounding cloud is uncertainty. The orange wall ahead is the predicted failure horizon. So the big picture is: sensor evidence comes in, posterior belief tightens, and the machine moves from uncertainty toward a concrete failure prediction. The inset posteriors make that explicit. The health posterior shows where the model believes the hidden engine condition sits at the current moment, and how sharply it believes it. The RUL posterior shows the same idea for remaining life: early on it is broad, later it shifts left and narrows as the machine becomes more certain about how close failure is. This idea is not limited to engines. The same idea can apply to data centers, CPUs, GPUs, cooling systems, power grids, robotics, batteries, and any machine that produces telemetry while obeying physical constraints.
In an age where machine learning runs on massive hardware infrastructure, this kind of model matters: it can turn noisy sensor streams into early warnings before expensive systems fail.
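The "law first, data as evidence" loop can be illustrated with a much simpler stand-in than the patent's method: a scalar Kalman filter. In this sketch the assumed law is a constant degradation rate, dh/dt = -rate, and each noisy sensor reading tightens a Gaussian belief over the hidden health state h. The function name `kalman_health`, the linear dynamics, and every numeric value are hypothetical, chosen only to show the posterior narrowing; none of this is taken from the patent or from the C-MAPSS experiment.

```python
import numpy as np

def kalman_health(observations, rate=0.01, q=1e-4, r=0.05 ** 2, h0=1.0, p0=1.0):
    """Track a hidden health state with a scalar Kalman filter.

    rate: assumed health lost per cycle (the "physical law")
    q:    process-noise variance (how much the law is trusted)
    r:    sensor-noise variance
    h0, p0: prior mean and variance of the health state
    """
    mean, var = h0, p0
    means, variances = [], []
    for y in observations:
        # Predict: push the belief through the assumed law dh/dt = -rate.
        mean = mean - rate
        var = var + q
        # Update: treat the sensor reading as evidence about the hidden state.
        gain = var / (var + r)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var
        means.append(mean)
        variances.append(var)
    return np.array(means), np.array(variances)

def remaining_cycles(mean, rate=0.01):
    """Point estimate of cycles left until health reaches zero under the assumed law."""
    return max(mean, 0.0) / rate

# Synthetic telemetry: true health decays linearly, sensors add Gaussian noise.
rng = np.random.default_rng(0)
cycles = np.arange(1, 81)
true_health = 1.0 - 0.01 * cycles
readings = true_health + rng.normal(0.0, 0.05, cycles.shape)
means, variances = kalman_health(readings)
```

Plotting `means` with a band of `np.sqrt(variances)` shows the belief starting broad and narrowing as evidence accumulates, which is the qualitative behaviour the post describes for the health posterior.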

Mathelirium

17,289 views • 1 month ago

Do you know how we simulate electromagnetic radiation around real systems like antennas, aircraft, and waveguides? We break the geometry into a mesh, a collection of small elements that lets the solver compute Maxwell’s equations locally instead of attacking the whole shape at once. This makes the simulations practical, fast, and accurate enough to use in real engineering.

Mathelirium

73,166 views • 1 month ago

Geometry of Machine Learning Models - Gaussian Process Kernel. In 1948 Norbert Wiener framed prediction as a correlation problem, and in the 1970s Grace Wahba clarified that picking a smoothness preference is the same as picking a kernel. The motivation is that whenever data is sparse and noisy, you don't just want a fit, you want a fit with calibrated uncertainty.
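To make "a fit with uncertainty" concrete, here is a minimal Gaussian process regression sketch in NumPy. It uses the standard textbook posterior equations with a squared-exponential (RBF) kernel; the kernel choice, the `length_scale`, `variance` and `noise` values, and the function names are assumptions for illustration, not the setup used in the video.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between 1D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(x_train, y_train, x_test, noise=0.1, **kernel_args):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise ** 2 * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, **kernel_args)
    K_ss = rbf_kernel(x_test, x_test, **kernel_args)
    # Cholesky factorisation gives a stable way to apply K^{-1}.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std

# Sparse training data from a smooth function (illustrative values).
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.sin(x_train)
x_test = np.array([0.0, 10.0])   # one point among the data, one far away
mean, std = gp_posterior(x_train, y_train, x_test)
```

Near the training points the posterior standard deviation is small; far from them it returns to the prior level set by `variance`. Changing `length_scale` changes how smooth the fitted functions are assumed to be, which is the sense in which a smoothness preference and a kernel are the same choice.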

Mathelirium

15,625 views • 3 months ago

New PNAS paper. Historical GDP per capita data is scarce, but data on the places of birth, death, and occupations of famous individuals is abundant. In this paper we estimate the historical GDP per capita of hundreds of regions in Europe and North America using a machine learning model that leveraged data on about 500k famous biographies. Our estimates more or less quadruple the availability of historical GDP per capita estimates for the last 700 years. So why use biographies to augment historical GDP per capita data? Biographical data contains information about people who might have contributed directly to economic growth, like James Watt, or who were attracted to wealthy places looking for patrons, like Michelangelo. So we, mainly Philipp (Philipp Koch), used this data to construct hundreds of features describing each European region. Then, we trained a machine learning model to find the features that explained most of the variance in a cross-validation test, where we split regions multiple times into a training set and a test set. On average, the model explained about 90% of the variance in GDP per capita of the regions it had not seen during training. But we wanted to go further, and Philipp really went to town by looking at different ways to validate our estimates. We found our estimates correlate positively with historical measures of wellbeing, church building activity, urbanization, and body height. We also used these measures to reproduce the basic Atlantic trade result of Acemoglu, Johnson, and Robinson and to explore the economic consequences of the famous Lisbon earthquake of 1755. But what I personally loved most about this project, other than working with Philipp Koch and V, is that it shows that we can use machine learning methods not only to explore the future, but also the past. There is a bright and growing future in the use of machine learning for economic history. Hope you enjoy the paper and the data.
You can find links to the paper and a data exploration tool in the first comment.

César A. Hidalgo

54,324 views • 1 year ago

“But I saved the history on CD, DVD, Magnetics.” That data will all decay with an MTBF of ±50 years. The cloud is deleting as much data as is being produced daily. At some point in the future we will have lost a majority of the 2000s-2040s. We are the amnesia generation.

Brian Roemmele

410,355 views • 2 years ago

A Neural Network Can Grow New Neurons Where It Is Confused? In 1994, Bernd Fritzke published A Growing Neural Gas Network Learns Topologies. He introduced a network that starts small, follows incoming data, and inserts new neurons where its error is highest. In the animation, the fog is the drifting data. The glowing nodes are neurons. The fibers are learned connections. The network grows into a living skeleton of the manifold.

Mathelirium

38,112 views • 28 days ago

I don’t know if we live in a Matrix, but I know for sure that robots will spend most of their lives in simulation. Let machines train machines. I’m excited to introduce DexMimicGen, a massive-scale synthetic data generator that enables a humanoid robot to learn complex skills from only a handful of human demonstrations. Yes, as few as 5! DexMimicGen addresses the biggest pain point in robotics: where do we get data? Unlike with LLMs, where vast amounts of text are readily available, you cannot simply download motor control signals from the internet. So researchers teleoperate the robots to collect motion data via XR headsets. They have to repeat the same skill over and over and over again, because neural nets are data hungry. This is a very slow and uncomfortable process. At NVIDIA, we believe the majority of high-quality tokens for robot foundation models will come from simulation. What DexMimicGen does is trade GPU compute time for human time. It takes one motion trajectory from a human and multiplies it into 1000s of new trajectories. A robot brain trained on this augmented dataset will generalize far better in the real world. Think of DexMimicGen as a learning signal amplifier. It maps a small dataset to a large (de facto infinite) dataset, using physics simulation in the loop. In this way, we free humans from babysitting the bots all day. The future of robot data is generative. The future of the entire robot learning pipeline will also be generative. 🧵

Jim Fan

165,215 views • 1 year ago

Does GPT understand the world? Here is what Ilya Sutskever, co-founder of OpenAI, says during a discussion with Jensen Huang, CEO of Nvidia: "(1) When we train a large neural network to accurately predict the next word in lots of different texts from the internet, the AI is learning a world model. (2) On the surface, it may look like learning correlations in text, but it turns out that to 'just learn' statistical correlations in text, to compress information really well, what the neural network learns is some representation of the process that produced the text. (3) This text is a projection of the world...what the neural network is learning is aspects of the world, of people, of the human conditions, their hopes, dreams, motivations, their interactions...the situations we are in. The neural network learns a compressed, abstract, usable representation." Do you think learning representations = understanding? Are large language models simply stochastic parrots, or are they much more?

Alex Ker 🔭

1,367,008 views • 2 years ago

In 1998, Japanese mathematician, engineer and machine learning pioneer Shun-ichi Amari published "Natural Gradient Works Efficiently in Learning" (722 citations in papers and 9 in patents) and made a point that still feels fresh today. If your loss is L(θ), then the usual update θ̇ = −∇L(θ) quietly assumes your parameter space is flat. Amari asked what happens when the model itself has geometry, described by G(θ), the Fisher information. Then the more natural direction becomes θ̇ = −G(θ)⁻¹∇L(θ). This is the same loss, but with a different notion of distance. As the animation shows, ordinary gradient descent follows the raw slope and gets dragged around by distortion, while the natural gradient moves in a way that respects the geometry of the statistical model itself.
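The two update rules can be compared on a small example. The sketch below fits the mean and standard deviation of a Gaussian by minimising the average negative log-likelihood, once with the plain gradient and once with the gradient preconditioned by the Fisher information, which for the parametrisation (μ, σ) is diag(1/σ², 2/σ²). The choice of model, the Euler discretisation of θ̇ with step `lr`, and all names and numbers are assumptions for illustration, not Amari's experiments.

```python
import numpy as np

def nll_grad(theta, x):
    """Gradient of the average Gaussian negative log-likelihood w.r.t. (mu, sigma)."""
    mu, sigma = theta
    s2 = np.mean((x - mu) ** 2)
    return np.array([-(np.mean(x) - mu) / sigma ** 2,
                     1.0 / sigma - s2 / sigma ** 3])

def fisher(theta):
    """Fisher information matrix G(theta) of N(mu, sigma^2) in (mu, sigma) coordinates."""
    _, sigma = theta
    return np.diag([1.0 / sigma ** 2, 2.0 / sigma ** 2])

def descend(x, theta0, lr=0.5, steps=50, natural=False):
    """Euler steps of theta' = -grad L (plain) or theta' = -G^{-1} grad L (natural)."""
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        g = nll_grad(theta, x)
        if natural:
            g = np.linalg.solve(fisher(theta), g)  # apply G(theta)^{-1}
        theta = theta - lr * g
    return theta

# Synthetic data; the optimum is the sample mean and (population) standard deviation.
rng = np.random.default_rng(0)
x = rng.normal(3.0, 2.0, 2000)
target = np.array([x.mean(), x.std()])

theta_plain = descend(x, (0.0, 5.0), natural=False)
theta_natural = descend(x, (0.0, 5.0), natural=True)
```

With the same step size and number of steps, the natural-gradient run reaches the optimum while the plain run is still on its way, because the plain gradient's step length depends on how the parameters happen to be scaled.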

Mathelirium

45,688 views • 2 months ago

We spoke with Michael Mignano & Nir Zicherman about Oboe, a new AI-powered learning app that lets you build personalized courses on any topic with just a prompt. As Nir put it: "People do not learn in one path, and they don't learn with one format. So a core philosophy of ours is if you're going to build a learning platform, you have to learn it the way that people actually learn. And the way that people actually learn is multimodal."

TBPN

17,903 views • 9 months ago

I like how every time a version of Supporting Me appears in a mainline Sonic game, they put more emphasis on the Live and Learn melody in the background. It slowly goes from something you barely notice to "yeah this is what this part of the song is".

Gibus

104,364 views • 1 year ago

I got a smart meter recently and saw you can download an HDF file with the data, so I had the idea of writing a script that could parse that and show the data in a useful manner. However, I discovered that someone has already done this and done a really good job on it. The video explains how it works. In simple terms, it uses the ESB smart meter data and shows a breakdown of how much data you are using and when, and also recommends plans and estimates what each one would cost based on the data. The tool is available at [link]; it is easier to read on a laptop or tablet, or turn your phone to landscape.

Carlow Weather

298,504 views • 2 years ago

🚨David Friedberg: AI is starting to identify and solve problems on its own “I'll give you a science corner example: there's this Evo 2 model that they publish at the Arc Institute, which Patrick Collison, you know, is the main funder and chairman.” “So that Evo 2 model, they just ingested all the DNA data they could find in the world.” “Trillions and trillions of base paired data that they ingested and then they looked at patterns in DNA. And that's it.” “They had no context for what the DNA represented, they had no context for the concept of genes, none of the structured understanding of what that DNA does, what it is, and you know what it did?” “They fed in the BRCA gene variant and the thing output a warning saying, ‘I think that this is a pathogenic variant to DNA,’ without having any context.” “This is the breast cancer allele.” “And it didn't have any knowledge and it wasn't trained on that at all.” “It had no knowledge that there are pathogenic variants for cancer, and it identified that this was a genetic variant that can cause some sort of pathogenic outcome in the organism.” “That's a great example where there's a lack of understanding at the human level on what really drives some of the patterns in nature, the patterns in society, the patterns in behavior that are kind of emergent phenomena perhaps, that these AI models are starting to identify.”

The All-In Podcast

79,691 views • 10 months ago

The most interesting part for me is where Andrej Karpathy describes why LLMs aren't able to learn like humans. As you would expect, he comes up with a wonderfully evocative phrase to describe RL: “sucking supervision bits through a straw.” A single end reward gets broadcast across every token in a successful trajectory, upweighting even wrong or irrelevant turns that lead to the right answer. > “Humans don't use reinforcement learning, as I've said before. I think they do something different. Reinforcement learning is a lot worse than the average person thinks. Reinforcement learning is terrible. It just so happens that everything that we had before is much worse.” So what do humans do instead? > “The book I’m reading is a set of prompts for me to do synthetic data generation. It's by manipulating that information that you actually gain that knowledge. We have no equivalent of that with LLMs; they don't really do that.” > “I'd love to see during pretraining some kind of a stage where the model thinks through the material and tries to reconcile it with what it already knows. There's no equivalent of any of this. This is all research.” Why can’t we just add this training to LLMs today? > “There are very subtle, hard to understand reasons why it's not trivial. If I just give synthetic generation of the model thinking about a book, you look at it and you're like, 'This looks great. Why can't I train on it?' You could try, but the model will actually get much worse if you continue trying.” > “Say we have a chapter of a book and I ask an LLM to think about it. It will give you something that looks very reasonable. But if I ask it 10 times, you'll notice that all of them are the same.” > “You're not getting the richness and the diversity and the entropy from these models as you would get from humans. How do you get synthetic data generation to work despite the collapse and while maintaining the entropy? It is a research problem.” How do humans get around model collapse? 
> “These analogies are surprisingly good. Humans collapse during the course of their lives. Children haven't overfit yet. They will say stuff that will shock you. Because they're not yet collapsed. But we [adults] are collapsed. We end up revisiting the same thoughts, we end up saying more and more of the same stuff, the learning rates go down, the collapse continues to get worse, and then everything deteriorates.” In fact, there’s an interesting paper arguing that dreaming evolved to assist generalization, and resist overfitting to daily learning - look up The Overfitted Brain by Erik Hoel. I asked Karpathy: Isn’t it interesting that humans learn best at a part of their lives (childhood) whose actual details they completely forget, adults still learn really well but have terrible memory about the particulars of the things they read or watch, and LLMs can memorize arbitrary details about text that no human could but are currently pretty bad at generalization? > “[Fallible human memory] is a feature, not a bug, because it forces you to only learn the generalizable components. LLMs are distracted by all the memory that they have of the pre-trained documents. That's why when I talk about the cognitive core, I actually want to remove the memory. I'd love to have them have less memory so that they have to look things up and they only maintain the algorithms for thought, and the idea of an experiment, and all this cognitive glue for acting.”

The most interesting part for me is where Andrej Karpathy describes why LLMs aren't able to learn like humans. As you would expect, he comes up with a wonderfully evocative phrase to describe RL: “sucking supervision bits through a straw.” A single end reward gets broadcast across every token in a successful trajectory, upweighting even wrong or irrelevant turns that lead to the right answer.

> “Humans don't use reinforcement learning, as I've said before. I think they do something different. Reinforcement learning is a lot worse than the average person thinks. Reinforcement learning is terrible. It just so happens that everything that we had before is much worse.”

So what do humans do instead?

> “The book I’m reading is a set of prompts for me to do synthetic data generation. It's by manipulating that information that you actually gain that knowledge. We have no equivalent of that with LLMs; they don't really do that.”

> “I'd love to see during pretraining some kind of a stage where the model thinks through the material and tries to reconcile it with what it already knows. There's no equivalent of any of this. This is all research.”

Why can’t we just add this training to LLMs today?

> “There are very subtle, hard to understand reasons why it's not trivial. If I just give synthetic generation of the model thinking about a book, you look at it and you're like, 'This looks great. Why can't I train on it?' You could try, but the model will actually get much worse if you continue trying.”

> “Say we have a chapter of a book and I ask an LLM to think about it. It will give you something that looks very reasonable. But if I ask it 10 times, you'll notice that all of them are the same.”

> “You're not getting the richness and the diversity and the entropy from these models as you would get from humans. How do you get synthetic data generation to work despite the collapse and while maintaining the entropy? It is a research problem.”

How do humans get around model collapse?

> “These analogies are surprisingly good. Humans collapse during the course of their lives. Children haven't overfit yet. They will say stuff that will shock you. Because they're not yet collapsed. But we [adults] are collapsed. We end up revisiting the same thoughts, we end up saying more and more of the same stuff, the learning rates go down, the collapse continues to get worse, and then everything deteriorates.”

In fact, there’s an interesting paper arguing that dreaming evolved to assist generalization, and resist overfitting to daily learning - look up The Overfitted Brain by Erik Hoel.

I asked Karpathy: Isn’t it interesting that humans learn best at a part of their lives (childhood) whose actual details they completely forget, adults still learn really well but have terrible memory about the particulars of the things they read or watch, and LLMs can memorize arbitrary details about text that no human could but are currently pretty bad at generalization?

> “[Fallible human memory] is a feature, not a bug, because it forces you to only learn the generalizable components. LLMs are distracted by all the memory that they have of the pre-trained documents. That's why when I talk about the cognitive core, I actually want to remove the memory. I'd love to have them have less memory so that they have to look things up and they only maintain the algorithms for thought, and the idea of an experiment, and all this cognitive glue for acting.”
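
To make the "single end reward gets broadcast across every token" point concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not anything from the interview: the `Step` class, the `useful` flag, and `broadcast_outcome_reward` are hypothetical names, and real RL pipelines add baselines, discounting, and gradient computation that are omitted here.

```python
from dataclasses import dataclass


@dataclass
class Step:
    text: str
    useful: bool  # hypothetical ground-truth label; the learner never sees it


def broadcast_outcome_reward(steps, final_reward):
    """Give every step the same weight: the single end-of-trajectory reward."""
    return [final_reward for _ in steps]


# A toy trajectory that reaches the right answer despite one irrelevant detour.
trajectory = [
    Step("set up the equation", True),
    Step("wander down a dead end", False),
    Step("recover and solve", True),
]

weights = broadcast_outcome_reward(trajectory, final_reward=1.0)

for step, weight in zip(trajectory, weights):
    print(f"{step.text!r}: weight={weight}, useful={step.useful}")
```

In this toy setup the dead-end step is upweighted exactly as much as the useful ones, which is the weakness the "straw" phrase points at.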

Dwarkesh Patel

1,049,820 views • 7 months ago

We just launched a major new Data Engineering Professional Certificate on Coursera! Data underlies all modern AI systems, and engineers who know how to build systems to store and serve it are in high demand. If you're interested in learning this skill, please check out this 4-course sequence, which is designed to make you job-ready to be a Data Engineer.

This is a new specialization taught by Joe Reis, the co-author of the best-selling book “Fundamentals of Data Engineering,” in collaboration with AWS. (Disclosure, I serve on Amazon's board.)

For many AI systems, data engineering is 80% of the work, and modeling is 20%. But people’s attention on these two topics is often flipped. This makes the job of the data engineer particularly important.

In this professional certificate, you'll learn foundational data engineering skills while implementing modern data architectures using open-source tools:
- Learn the key steps of the data lifecycle, to generate, ingest, store, transform, and serve data.
- Learn to align with organizational goals to design the data pipeline right for your business' needs.
- Understand how to make necessary trade-offs between speed, scalability, security, and cost.

Joe has distilled into this specialization decades of experience helping startups and large companies with data infrastructure. He is also joined by 17 other industry leaders in the data field, who will help you learn in-demand skills for the growing field of data engineering.

Please sign up here:

Andrew Ng

118,937 views • 1 year ago

CEO OF GOOGLE’S DEEPMIND: IT LEARNS LIKE A HUMAN BEING WOULD LEARN

Demis Hassabis: “Theories about what kinds of capabilities these systems will have, that’s obviously what we try to build into the architectures. But at the end of the day, how it learns, what it picks up from the data, is part of the training of these systems. We don’t program that in. It learns like a human being would learn. So new capabilities or properties can emerge from that training situation.”

Source: CBSNews

Mario Nawfal

73,604 views • 1 year ago

"When I last saw my friend Charlie Kirk a couple of weeks ago, we were recounting how the Constitutional Convention had ended in 1787. Benjamin Franklin was walking out of the hall when a woman, Elizabeth Powell, the wife of the Mayor of Philadelphia, asked him, “Well, Doctor, what do we have? A republic or a monarchy?” Franklin turned to her and said, “A republic, if you can keep it.” If you can keep it. Charlie and I thought about every single one of those words. This is a turning point—if we can keep it. This is a turning point—if we wake up every single day and praise our Lord and Savior. This is a turning point—if we remember that in Charlie Kirk’s America, it is not left versus right. It is right versus wrong. Good versus evil. And we must stand up to that evil. It is the evil that celebrates assassination. It is the evil that glorifies wickedness. It is the evil that props up forces that seek to divide us. That is the evil we must stand against. The internet is powerful, but at times—and in this time especially, in the wake of Charlie’s murder—it feels like some corners of the internet are acting like Radio Rwanda: trying to divide us, spreading lies, tearing apart our national fabric. We must not let them win. And by the size of this crowd tonight—we will win."

James Fishback

208,541 views • 8 months ago