The boundary between trainable and untrainable neural network hyperparameter configurations is *fractal*! And beautiful! Here is a grid search over a different pair of hyperparameters -- this time learning rate and the mean of the parameter initialization distribution.

250,458 views • 2 years ago • via X (Twitter)

10 comments

Jascha Sohl-Dickstein • 2 years ago

Have you ever done a dense grid search over neural network hyperparameters? Like a *really dense* grid search? It looks like this (!!). Bluish colors correspond to hyperparameters for which training converges, reddish colors to hyperparameters for which training diverges.

Jascha Sohl-Dickstein • 2 years ago

There are similarities between the way in which many fractals are generated, and the way in which we train neural networks. Both involve repeatedly applying a function to its own output. In both cases, that function has hyperparameters that control its behavior.

Jascha Sohl-Dickstein • 2 years ago

In both cases the function iteration can produce outputs that either diverge to infinity or remain happily bounded depending on those hyperparameters. Fractals are often defined by the boundary between hyperparameters where function iteration diverges or remains bounded.
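For readers who want to see the fractal side of this analogy concretely, here is a minimal sketch using the classic Mandelbrot iteration z -> z^2 + c, where c plays the role of the "hyperparameter". This is a standard textbook example, not code from the thread; the function name `bounded_mask`, the step count, and the grid ranges are illustrative choices.

```python
import numpy as np

def bounded_mask(c_grid, steps=50, escape_radius=2.0):
    """Iterate z -> z**2 + c from z = 0 and report which c stay bounded.

    Returns a boolean array: True where |z| never exceeded escape_radius
    within `steps` iterations, False where the iteration escaped.
    """
    z = np.zeros_like(c_grid)
    alive = np.ones(c_grid.shape, dtype=bool)
    for _ in range(steps):
        # Only update points that have not escaped yet (avoids overflow).
        z[alive] = z[alive] ** 2 + c_grid[alive]
        alive &= np.abs(z) <= escape_radius
    return alive

# A small grid over the complex "hyperparameter" c (ranges are illustrative).
re = np.linspace(-2.0, 0.5, 60)
im = np.linspace(-1.25, 1.25, 60)
c_grid = re[None, :] + 1j * im[:, None]
mask = bounded_mask(c_grid)
```

The boundary between the True and False regions of `mask` is what becomes fractal as the grid is refined and the number of steps is increased.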

Jascha Sohl-Dickstein • 2 years ago

So it shouldn't (post-hoc) be a surprise that hyperparameter landscapes are fractal. This is a general phenomenon: in these panes we see fractal hyperparameter landscapes for every neural network configuration I tried, including deep linear networks.
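As a rough illustration of what such a grid search involves, here is a toy sketch, not the setup used for the videos. It assumes a two-weight "deep linear network" f(x) = w2 * w1 * x trained by full-batch gradient descent on the loss 0.5 * (w1 * w2 - 1)^2, with both weights initialized deterministically at the initialization mean rather than sampled. The names `train_diverges`, `lrs`, and `means`, the blow-up threshold, and the grid ranges are all hypothetical, and this sketch makes no claim about how intricate the resulting boundary is.

```python
import numpy as np

def train_diverges(lr, init_mean, steps=200, blowup=1e6):
    """Run gradient descent on 0.5 * (w1 * w2 - 1)**2 and flag divergence.

    Assumption: both weights start exactly at init_mean (no sampling), and
    "diverged" means the product w1 * w2 became non-finite or exceeded
    `blowup` in magnitude within `steps` updates.
    """
    w1 = w2 = float(init_mean)
    for _ in range(steps):
        err = w1 * w2 - 1.0
        # Simultaneous update: d(loss)/d(w1) = err * w2, d(loss)/d(w2) = err * w1.
        w1, w2 = w1 - lr * err * w2, w2 - lr * err * w1
        prod = w1 * w2
        if not np.isfinite(prod) or abs(prod) > blowup:
            return True
    return False

# Grid over learning rate (columns) and initialization mean (rows).
lrs = np.linspace(0.05, 3.0, 40)
means = np.linspace(-2.0, 2.0, 40)
grid = np.array([[train_diverges(lr, m) for lr in lrs] for m in means])
```

Plotting `grid` as an image (for example with a diverging colormap) gives a picture in the spirit of the one described, with one color for runs that stay bounded and another for runs that blow up; a much denser grid would be needed to inspect the boundary in detail.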

Jascha Sohl-Dickstein • 2 years ago

The best performing hyperparameters are typically at the edge of stability -- so when you optimize neural network hyperparameters, you are contending with hyperparameter landscapes that look like this.

Jascha Sohl-Dickstein • 2 years ago

Want to learn more? Blog post: 3-page paper:

Jascha Sohl-Dickstein • 2 years ago

I don't have a SoundCloud, but I did join Anthropic last week, and so far it has exceeded my (high) expectations. I would strongly recommend working there (and using Claude). *this project not done at Anthropic -- this was recreational machine learning on my own time.

Kosta Derpanis • 2 years ago

Just in time to make the cut for my lecture today. At the 45 sec mark. Thanks for sharing!

Mihoda • 2 years ago

I'm not sure what I'm looking at, but my guess at interpretation would be instability.

Kenneth Shinozuka • 2 years ago

beautiful result
