The boundary between trainable and untrainable neural network hyperparameter configurations is fractal! And beautiful! Here is a grid search over a different pair of hyperparameters -- this time learning rate and the mean of the parameter initialization distribution.

Jascha Sohl-Dickstein

30,571 subscribers

250,667 views • 2 years ago • via X (Twitter)



10 comments

Jascha Sohl-Dickstein • 2 years ago

Have you ever done a dense grid search over neural network hyperparameters? Like a *really dense* grid search? It looks like this (!!). Bluish colors correspond to hyperparameters for which training converges, reddish colors to hyperparameters for which training diverges.
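To make the idea of a dense grid search concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration and is not the setup used for the video: the model is a two-weight deep linear network `y = w2 * w1 * x` fit to the single point `(1, 1)`, the two hyperparameters are hypothetical per-weight learning rates `lr1` and `lr2`, and "diverges" simply means a weight exceeds `limit` or becomes non-finite. A grid this small will not show fine structure; it only shows the mechanics of classifying each grid cell.

```python
import math


def stays_bounded(lr1, lr2, steps=200, limit=1e6):
    """Run gradient descent on 0.5 * (w1 * w2 - 1)**2 and report
    whether both weights stay finite and below `limit`."""
    w1, w2 = 0.5, 0.5  # hypothetical initialization
    for _ in range(steps):
        residual = w1 * w2 - 1.0
        g1, g2 = residual * w2, residual * w1  # dL/dw1, dL/dw2
        w1, w2 = w1 - lr1 * g1, w2 - lr2 * g2
        if not (math.isfinite(w1) and math.isfinite(w2)):
            return False
        if max(abs(w1), abs(w2)) > limit:
            return False
    return True


def log_grid(low_exp, high_exp, n):
    """Return n values spaced evenly in log10 from 10**low_exp to 10**high_exp."""
    step = (high_exp - low_exp) / (n - 1)
    return [10.0 ** (low_exp + i * step) for i in range(n)]


def grid_search(lr1_values, lr2_values):
    """Classify every (lr1, lr2) pair: True = bounded, False = diverged."""
    return [[stays_bounded(a, b) for b in lr2_values] for a in lr1_values]


if __name__ == "__main__":
    lrs = log_grid(-2, 2, 40)
    for row in grid_search(lrs, lrs):
        # '.' marks runs that stayed bounded, '#' marks runs that diverged
        print("".join("." if ok else "#" for ok in row))
```

Increasing the grid resolution and zooming in on the region where `.` and `#` meet is the kind of experiment the post describes, though a real study would use a real network and a loss-based convergence criterion.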

Jascha Sohl-Dickstein • 2 years ago

There are similarities between the way in which many fractals are generated, and the way in which we train neural networks. Both involve repeatedly applying a function to its own output. In both cases, that function has hyperparameters that control its behavior.

Jascha Sohl-Dickstein • 2 years ago

In both cases the function iteration can produce outputs that either diverge to infinity or remain happily bounded depending on those hyperparameters. Fractals are often defined by the boundary between hyperparameters where function iteration diverges or remains bounded.
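The parallel between fractal generation and training can be sketched side by side. The snippet below is an illustration under stated assumptions, not the author's code: `mandelbrot_bounded` iterates the standard Mandelbrot map `z -> z*z + c`, and `gd_bounded` iterates gradient descent on the toy loss `f(x) = x**2 / 2`, where the update is `x -> x - lr * x`. In both functions a single parameter (`c` or `lr`) decides whether repeated application stays bounded or blows up. The escape thresholds and step counts are arbitrary choices.

```python
def mandelbrot_bounded(c, steps=100, limit=2.0):
    """Iterate z -> z*z + c from z = 0; True if |z| never exceeds `limit`."""
    z = 0j
    for _ in range(steps):
        z = z * z + c
        if abs(z) > limit:
            return False
    return True


def gd_bounded(lr, steps=100, limit=1e6, x0=1.0):
    """Iterate gradient descent on f(x) = x**2 / 2 (gradient is x);
    True if |x| never exceeds `limit`."""
    x = x0
    for _ in range(steps):
        x = x - lr * x  # same shape as above: apply a map to its own output
        if abs(x) > limit:
            return False
    return True


if __name__ == "__main__":
    print(mandelbrot_bounded(-1 + 0j), mandelbrot_bounded(1 + 0j))
    print(gd_bounded(0.5), gd_bounded(2.5))
```

For this one-dimensional quadratic the boundary is a single threshold (the iteration is bounded when `|1 - lr| <= 1`), so nothing fractal appears here. The example shows only the shared structure of iterating a parameterized map, which the thread argues is what produces fractal boundaries in real networks.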

Jascha Sohl-Dickstein • 2 years ago

So it shouldn't (post-hoc) be a surprise that hyperparameter landscapes are fractal. This is a general phenomenon: in these panes we see fractal hyperparameter landscapes for every neural network configuration I tried, including deep linear networks.

Jascha Sohl-Dickstein • 2 years ago

The best performing hyperparameters are typically at the edge of stability -- so when you optimize neural network hyperparameters, you are contending with hyperparameter landscapes that look like this.

Jascha Sohl-Dickstein • 2 years ago

Want to learn more? Blog post: 3-page paper:

Jascha Sohl-Dickstein • 2 years ago

I don't have a SoundCloud, but I did join Anthropic last week, and so far it has exceeded my (high) expectations. I would strongly recommend working there (and using Claude). *This project was not done at Anthropic; it was recreational machine learning on my own time.

Kosta Derpanis • 2 years ago

Just in time to make the cut for my lecture today. At the 45-second mark. Thanks for sharing!

Mihoda • 2 years ago

I'm not sure what I'm looking at, but my guess at interpretation would be instability.

Kenneth Shinozuka • 2 years ago

beautiful result
