The boundary between trainable and untrainable neural network hyperparameter configurations is fractal! And beautiful! Here is a grid search over a different pair of hyperparameters -- this time learning rate and the mean of the parameter initialization distribution.

Jascha Sohl-Dickstein

30,571 subscribers

250,667 views • 2 years ago • via X (Twitter)

10 comments

Jascha Sohl-Dickstein, 2 years ago

Have you ever done a dense grid search over neural network hyperparameters? Like a *really dense* grid search? It looks like this (!!). Bluish colors correspond to hyperparameters for which training converges, reddish colors to hyperparameters for which training diverges.
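A minimal sketch of what such a grid search could look like, using only the Python standard library. Everything here is an illustrative assumption rather than the author's setup: the two-parameter tanh model, the four data points, the `limit` divergence threshold, and the helper names `training_diverges`, `log_space`, and `grid_search` are all hypothetical. The grid axes (learning rate and initialization mean) follow the pair named in the video description.

```python
import math

# Toy regression data (assumed for illustration only).
XS = [-1.0, -0.5, 0.5, 1.0]
YS = [-0.8, -0.5, 0.5, 0.8]


def training_diverges(lr, init_mean, n_steps=200, limit=1e6):
    """Train pred = w2 * tanh(w1 * x) with full-batch gradient descent.

    Both weights start at `init_mean` (zero-variance init, an assumption).
    Returns True if the mean squared error exceeds `limit` or stops being
    finite at any step, and False if it stays bounded for `n_steps` steps.
    """
    w1 = w2 = init_mean
    n = len(XS)
    for _ in range(n_steps):
        loss = g1 = g2 = 0.0
        for x, y in zip(XS, YS):
            h = math.tanh(w1 * x)
            err = w2 * h - y
            loss += err * err / n
            g2 += 2.0 * err * h / n                       # dLoss/dw2
            g1 += 2.0 * err * w2 * (1.0 - h * h) * x / n  # dLoss/dw1
        if not math.isfinite(loss) or loss > limit:
            return True
        w1 -= lr * g1
        w2 -= lr * g2
    return False


def log_space(lo_exp, hi_exp, n):
    """Return n values spaced evenly in log10 between 10**lo_exp and 10**hi_exp."""
    return [10.0 ** (lo_exp + (hi_exp - lo_exp) * i / (n - 1)) for i in range(n)]


def grid_search(lrs, init_means):
    """Rows index init_means, columns index lrs; True marks a diverged run."""
    return [[training_diverges(lr, m) for lr in lrs] for m in init_means]


if __name__ == "__main__":
    lrs = log_space(-2, 2, 40)
    means = [-2.0 + 4.0 * i / 39 for i in range(40)]
    for row in grid_search(lrs, means):
        print("".join("#" if diverged else "." for diverged in row))
```

Making the grid "really dense" means raising the resolution and zooming in on the region where `#` and `.` meet. Whether this particular toy model shows fine structure there is not something the sketch guarantees; it only shows the mechanics of labeling each grid cell as converged or diverged.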

Jascha Sohl-Dickstein, 2 years ago

There are similarities between the way in which many fractals are generated, and the way in which we train neural networks. Both involve repeatedly applying a function to its own output. In both cases, that function has hyperparameters that control its behavior.

Jascha Sohl-Dickstein, 2 years ago

In both cases the function iteration can produce outputs that either diverge to infinity or remain happily bounded depending on those hyperparameters. Fractals are often defined by the boundary between hyperparameters where function iteration diverges or remains bounded.
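The parallel between fractal generation and training can be sketched with one shared loop: repeatedly apply a function to its own output and check whether the result stays bounded. The helper names (`iterate_bounded`, `mandelbrot_step`, `gd_step`), the escape `limit`, and the choice of a one-dimensional quadratic loss are assumptions made for illustration; they are not taken from the thread.

```python
def iterate_bounded(step, x0, n_steps=200, limit=1e6):
    """Apply `step` to its own output; True if |x| never exceeds `limit`."""
    x = x0
    for _ in range(n_steps):
        x = step(x)
        if abs(x) > limit:
            return False
    return True


def mandelbrot_step(c):
    """Mandelbrot iteration z <- z**2 + c; the hyperparameter is c."""
    return lambda z: z * z + c


def gd_step(lr, curvature=1.0):
    """Gradient descent on f(w) = 0.5 * curvature * w**2.

    The update is w <- w - lr * curvature * w; the hyperparameter is lr.
    """
    return lambda w: w - lr * curvature * w


if __name__ == "__main__":
    for c in (0j, -1 + 0j, 1 + 0j):
        print("mandelbrot c =", c, "bounded:", iterate_bounded(mandelbrot_step(c), 0j))
    for lr in (0.5, 1.9, 2.5):
        print("gradient descent lr =", lr, "bounded:", iterate_bounded(gd_step(lr), 1.0))
```

For this quadratic the boundary is a single threshold at `lr = 2 / curvature`, so nothing fractal appears. The sketch is only meant to show that both procedures are the same kind of object: an iterated map whose long-run behavior depends on its hyperparameters.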

Jascha Sohl-Dickstein, 2 years ago

So it shouldn't (post-hoc) be a surprise that hyperparameter landscapes are fractal. This is a general phenomenon: in these panes we see fractal hyperparameter landscapes for every neural network configuration I tried, including deep linear networks.

Jascha Sohl-Dickstein, 2 years ago

The best performing hyperparameters are typically at the edge of stability -- so when you optimize neural network hyperparameters, you are contending with hyperparameter landscapes that look like this.

Jascha Sohl-Dickstein, 2 years ago

Want to learn more? Blog post: 3-page paper:

Jascha Sohl-Dickstein, 2 years ago

I don't have a SoundCloud, but I did join Anthropic last week, and so far it has exceeded my (high) expectations. I would strongly recommend working there (and using Claude). *This project was not done at Anthropic -- this was recreational machine learning on my own time.

Kosta Derpanis, 2 years ago

Just in time to make the cut for my lecture today. At 45 sec mark. Thanks for sharing!

Mihoda, 2 years ago

I'm not sure what I'm looking at, but my guess at interpretation would be instability.

Kenneth Shinozuka, 2 years ago

beautiful result
