The boundary between trainable and untrainable neural network hyperparameter configurations is fractal! And beautiful! Here is a grid search over a different pair of hyperparameters -- this time learning rate and the mean of the parameter initialization distribution.

Jascha Sohl-Dickstein

30,571 subscribers

250,667 views • 2 years ago • via X (Twitter)

Comments: 10

Jascha Sohl-Dickstein, 2 years ago

Have you ever done a dense grid search over neural network hyperparameters? Like a *really dense* grid search? It looks like this (!!). Bluish colors correspond to hyperparameters for which training converges, reddish colors to hyperparameters for which training diverges.

Jascha Sohl-Dickstein, 2 years ago

There are similarities between the way in which many fractals are generated, and the way in which we train neural networks. Both involve repeatedly applying a function to its own output. In both cases, that function has hyperparameters that control its behavior.

Jascha Sohl-Dickstein, 2 years ago

In both cases the function iteration can produce outputs that either diverge to infinity or remain happily bounded depending on those hyperparameters. Fractals are often defined by the boundary between hyperparameters where function iteration diverges or remains bounded.
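As a rough illustration of the fractal side of this analogy, here is a minimal sketch using the standard Mandelbrot iteration z -> z*z + c. It is not taken from the thread; the function names `escapes` and `render`, the iteration cap, and the plotting window are all assumptions chosen for illustration. The parameter c plays the role of the "hyperparameter", and each grid cell is labeled by whether the iteration stays bounded.

```python
def escapes(c, max_iter=100, bound=2.0):
    """Return True if iterating z -> z*z + c from z = 0 exceeds `bound`.

    `max_iter` and `bound` are illustrative choices, not values from the thread.
    """
    z = 0j
    for _ in range(max_iter):
        z = z * z + c  # apply the function to its own output
        if abs(z) > bound:
            return True  # diverged
    return False  # stayed bounded for max_iter steps


def render(width=60, height=24):
    """Build a coarse text picture: '#' where the iteration stays bounded."""
    rows = []
    for j in range(height):
        im = 1.2 - 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            re = -2.1 + 3.0 * i / (width - 1)
            row += " " if escapes(complex(re, im)) else "#"
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("\n".join(render()))
```

The interesting structure sits on the border between the `#` region and the blank region; a denser grid resolves more of it.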

Jascha Sohl-Dickstein, 2 years ago

So it shouldn't (post-hoc) be a surprise that hyperparameter landscapes are fractal. This is a general phenomenon: in these panes we see fractal hyperparameter landscapes for every neural network configuration I tried, including deep linear networks.
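To make the training side concrete, here is a toy sketch of the same converge/diverge labeling for gradient descent. It is an assumption-laden stand-in, not the setup behind the video: the "network" is a two-parameter scalar deep linear model with loss 0.5 * (w1 * w2 - 1)^2, both weights start at the same value `init` (a zero-variance stand-in for the initialization mean), and `train_diverges`, `grid`, `steps`, and `limit` are hypothetical names and settings. It only shows how each grid cell gets its label and does not reproduce the figures.

```python
import math


def train_diverges(lr, init, steps=200, limit=1e6):
    """Run gradient descent on 0.5 * (w1 * w2 - 1)**2 and report divergence.

    Toy model and thresholds are illustrative assumptions.
    """
    w1 = w2 = init
    for _ in range(steps):
        err = w1 * w2 - 1.0
        g1, g2 = err * w2, err * w1  # gradients w.r.t. w1 and w2
        w1, w2 = w1 - lr * g1, w2 - lr * g2
        finite = math.isfinite(w1) and math.isfinite(w2)
        if not finite or max(abs(w1), abs(w2)) > limit:
            return True  # training blew up
    return False  # weights stayed bounded


def grid(lrs, inits):
    """Label every (init, lr) cell: True means training diverged."""
    return [[train_diverges(lr, init) for lr in lrs] for init in inits]


if __name__ == "__main__":
    lrs = [0.05 * k for k in range(1, 41)]
    inits = [0.1 * k for k in range(1, 31)]
    for row in grid(lrs, inits):
        print("".join("x" if diverged else "." for diverged in row))
```

Increasing the grid resolution near the border between `.` and `x` cells is the "really dense grid search" idea in miniature.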

Jascha Sohl-Dickstein, 2 years ago

The best performing hyperparameters are typically at the edge of stability -- so when you optimize neural network hyperparameters, you are contending with hyperparameter landscapes that look like this.

Jascha Sohl-Dickstein, 2 years ago

Want to learn more? Blog post: 3-page paper:

Jascha Sohl-Dickstein, 2 years ago

I don't have a SoundCloud, but I did join Anthropic last week, and so far it has exceeded my (high) expectations. I would strongly recommend working there (and using Claude). *this project not done at Anthropic -- this was recreational machine learning on my own time.

Kosta Derpanis, 2 years ago

Just in time to make the cut for my lecture today. At 45 sec mark. Thanks for sharing!

Mihoda, 2 years ago

I'm not sure what I'm looking at, but my guess at interpretation would be instability.

Kenneth Shinozuka, 2 years ago

beautiful result
