Have you ever done a dense grid search over neural network hyperparameters? Like a *really dense* grid search? It looks like this (!!). Bluish colors correspond to hyperparameters for which training converges, reddish colors to hyperparameters for which training diverges.
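
A minimal sketch of the procedure in pure Python, for illustration only. It assumes the two swept hyperparameters are separate learning rates for the hidden and output layers of a tiny one-hidden-layer tanh network trained with full-batch gradient descent. The width, synthetic data, step count, and the loss threshold of 1e6 used to call a run "diverged" are all illustrative assumptions, not the original experiment's settings. Every grid cell trains from the same data and initialization. `.` marks a bounded run and `#` marks a diverged one, standing in for the blue and red colors.

```python
import math
import random


def train_diverges(lr_in, lr_out, width=4, n_data=8, steps=200, seed=0):
    """Full-batch gradient descent on a tiny one-hidden-layer tanh network.

    Returns True if the loss becomes non-finite or exceeds 1e6 (treated as
    divergence), False if it stays bounded for `steps` iterations.
    All sizes and the threshold are illustrative assumptions.
    """
    rng = random.Random(seed)  # same data and init for every grid cell
    d = 2  # input dimension (assumption)
    xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_data)]
    ys = [rng.gauss(0, 1) for _ in range(n_data)]
    W = [[rng.gauss(0, 1) / math.sqrt(d) for _ in range(d)] for _ in range(width)]
    v = [rng.gauss(0, 1) / math.sqrt(width) for _ in range(width)]
    for _ in range(steps):
        gW = [[0.0] * d for _ in range(width)]
        gv = [0.0] * width
        loss = 0.0
        for x, y in zip(xs, ys):
            h = [math.tanh(sum(W[j][k] * x[k] for k in range(d))) for j in range(width)]
            err = sum(v[j] * h[j] for j in range(width)) - y
            loss += 0.5 * err * err / n_data
            for j in range(width):
                gv[j] += err * h[j] / n_data  # dL/dv_j
                back = err * v[j] * (1.0 - h[j] * h[j]) / n_data
                for k in range(d):
                    gW[j][k] += back * x[k]  # dL/dW_jk
        if not math.isfinite(loss) or loss > 1e6:
            return True
        # Separate learning rates for the output and hidden layers.
        for j in range(width):
            v[j] -= lr_out * gv[j]
            for k in range(d):
                W[j][k] -= lr_in * gW[j][k]
    return False


def logspace(lo, hi, n):
    """n log-spaced values from 10**lo to 10**hi."""
    return [10 ** (lo + (hi - lo) * i / (n - 1)) for i in range(n)]


def grid_search(lrs_in, lrs_out):
    """One row per hidden-layer learning rate; True marks a diverged run."""
    return [[train_diverges(a, b) for b in lrs_out] for a in lrs_in]


if __name__ == "__main__":
    lrs = logspace(-2, 2, 8)
    # Rows: hidden-layer learning rate; columns: output-layer learning rate.
    for row in grid_search(lrs, lrs):
        print("".join("#" if diverged else "." for diverged in row))
```

An 8 x 8 grid is far too coarse to show fine structure. Resolving the boundary takes very dense grids and a vectorized implementation, which this loop does not attempt. `width` can be raised (the thread mentions width 16) at proportional cost.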

1,768,010 views • 2 years ago • via X (Twitter)

11 Comments

Jascha Sohl-Dickstein • 2 years ago

The boundary between trainable and untrainable neural network hyperparameter configurations is *fractal*! And beautiful! Here is a grid search over a different pair of hyperparameters -- this time the learning rate and the mean of the parameter initialization distribution.
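
One common way to put a number on "the boundary is fractal" is a box-counting estimate of the boundary's dimension. Below is a minimal sketch, assuming the converge/diverge result is stored as a square grid of booleans (`True` = diverged). It counts the boxes that contain both outcomes at several box sizes, then fits the slope of log(count) against log(1 / box size). A smooth boundary gives a slope close to 1. A slope clearly above 1 over a wide range of scales indicates a boundary that stays rough as you zoom in. The function names are hypothetical.

```python
import math


def box_count(grid, s):
    """Count s-by-s boxes (in grid cells) that contain both True and False."""
    n = len(grid)
    count = 0
    for bi in range(0, n, s):
        for bj in range(0, n, s):
            cells = {grid[i][j]
                     for i in range(bi, min(bi + s, n))
                     for j in range(bj, min(bj + s, n))}
            if len(cells) > 1:  # box straddles the converge/diverge boundary
                count += 1
    return count


def boundary_dimension(grid, sizes=(2, 4, 8, 16)):
    """Least-squares slope of log(box count) versus log(1 / box size)."""
    xs, ys = [], []
    for s in sizes:
        c = box_count(grid, s)
        if c > 0:  # log(0) is undefined; skip scales with no boundary boxes
            xs.append(math.log(1.0 / s))
            ys.append(math.log(c))
    if len(xs) < 2:
        raise ValueError("need boundary boxes at two or more scales")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

The estimate is only as good as the range of scales it is fit over. A coarse grid or too few box sizes gives a rough number.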

Jascha Sohl-Dickstein • 2 years ago

There are similarities between the way in which many fractals are generated, and the way in which we train neural networks. Both involve repeatedly applying a function to its own output. In both cases, that function has hyperparameters that control its behavior.
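
To make the parallel concrete, here is a minimal sketch (all names are hypothetical) that treats a fractal-generating map and a gradient-descent step the same way. Each is a function applied repeatedly to its own output, with a hyperparameter fixed in advance: `c` for the quadratic map z → z² + c, and the learning rate `lr` for gradient descent. A scalar parameter and the toy loss L(θ) = θ²/2 (gradient θ) stand in for a real network's parameter vector and training loss.

```python
def orbit(f, x0, n):
    """Return [x0, f(x0), f(f(x0)), ...] with n applications of f."""
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]))
    return xs


def quadratic_map(c):
    """Mandelbrot/Julia-style map z -> z*z + c; c is its hyperparameter."""
    return lambda z: z * z + c


def gd_map(grad, lr):
    """One gradient-descent step theta -> theta - lr * grad(theta)."""
    return lambda theta: theta - lr * grad(theta)


# Both are "iterate a function with a hyperparameter":
print(orbit(quadratic_map(0), 0.5, 2))           # fractal-style iteration
print(orbit(gd_map(lambda t: t, 0.5), 1.0, 2))   # training-style iteration
```

With `lr=3.0` the same gradient-descent map gives 1, -2, 4, -8, and so on. Each step multiplies θ by (1 - 3) = -2, so the orbit grows without bound instead of shrinking toward the minimum.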

Jascha Sohl-Dickstein • 2 years ago

In both cases the function iteration can produce outputs that either diverge to infinity or remain happily bounded depending on those hyperparameters. Fractals are often defined by the boundary between hyperparameters where function iteration diverges or remains bounded.
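
The Mandelbrot set is the textbook case. It is the set of complex hyperparameters c for which iterating z → z² + c from z = 0 stays bounded, and its boundary is fractal. Below is a minimal escape-time sketch that labels each c as diverging or bounded, the same two-way split as the converge/diverge coloring of a hyperparameter grid. It relies on the standard fact that once |z| > 2 the orbit must diverge. Orbits still inside after `max_iter` steps are treated as bounded, which is an approximation near the boundary. The function names are hypothetical.

```python
def escapes(c, max_iter=200, radius=2.0):
    """True if iterating z -> z*z + c from z = 0 leaves the disk |z| <= radius."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > radius:  # past radius 2 the orbit is guaranteed to diverge
            return True
    return False  # still bounded after max_iter steps (approximation)


def render(re_min, re_max, im_min, im_max, cols=60, rows=24):
    """ASCII map of the c-plane: '#' = diverges, '.' = stays bounded."""
    lines = []
    for r in range(rows):
        im = im_max - (im_max - im_min) * r / (rows - 1)
        line = ""
        for k in range(cols):
            re = re_min + (re_max - re_min) * k / (cols - 1)
            line += "#" if escapes(complex(re, im)) else "."
        lines.append(line)
    return "\n".join(lines)
```

`render(-2.0, 0.5, -1.25, 1.25)` returns a coarse ASCII picture of the set. Increasing `cols` and `rows` sharpens the boundary, at growing cost.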

Jascha Sohl-Dickstein • 2 years ago

So it shouldn't (post-hoc) be a surprise that hyperparameter landscapes are fractal. This is a general phenomenon: in these panes we see fractal hyperparameter landscapes for every neural network configuration I tried, including deep linear networks.
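
Nonlinearity is not required for training to split into converging and diverging runs. Below is a minimal sketch of the deep-linear case, assuming the simplest possible version: a width-1 (scalar) linear network f(x) = w_D ⋯ w_1 x fit to the target map f(x) = x. The loss then reduces to ½(w_1⋯w_D - 1)². The depth, initialization noise, step count, and divergence threshold are illustrative assumptions. `init_mean` plays the role of the mean of the parameter initialization distribution swept earlier in the thread. The function name is hypothetical.

```python
import math
import random


def deep_linear_diverges(lr, init_mean, depth=3, init_std=0.1, steps=500, seed=0):
    """Gradient descent on L(w) = 0.5 * (w_1 * ... * w_depth - 1)**2.

    Returns True if the loss becomes non-finite or exceeds 1e6 (treated as
    divergence), False if it stays bounded for `steps` iterations.
    """
    rng = random.Random(seed)
    w = [init_mean + init_std * rng.gauss(0, 1) for _ in range(depth)]
    for _ in range(steps):
        err = math.prod(w) - 1.0
        loss = 0.5 * err * err
        if not math.isfinite(loss) or loss > 1e6:
            return True
        # dL/dw_i = err * (product of all weights except w_i)
        grads = [err * math.prod(w[:i] + w[i + 1:]) for i in range(depth)]
        w = [wi - lr * g for wi, g in zip(w, grads)]
    return False
```

Calling it over a grid of `lr` and `init_mean` values produces the same kind of converge/diverge map as the grid searches in the thread, for a model with no nonlinearity at all.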

Jascha Sohl-Dickstein • 2 years ago

The best performing hyperparameters are typically at the edge of stability -- so when you optimize neural network hyperparameters, you are contending with hyperparameter landscapes that look like this.
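
A quadratic toy model shows why good settings can crowd the stability edge. Assume a convex quadratic loss with curvatures (Hessian eigenvalues) between λ_min and λ_max. Gradient descent with learning rate η then scales each eigen-direction by (1 - ηλ) per step, so it diverges once η > 2/λ_max. The rate that minimizes the worst-case factor is 2/(λ_min + λ_max). For badly conditioned problems (λ_max much larger than λ_min) that optimum sits just below the edge. This sketch covers quadratics only, and neural network losses are not quadratic, so it illustrates the tendency rather than explaining the fractal structure. The function names are hypothetical.

```python
def gd_quadratic_rates(eigenvalues):
    """For L(theta) = 0.5 * sum_i lam_i * theta_i**2 with all lam_i > 0,
    return (edge, best): GD diverges for lr > edge, and lr = best minimizes
    the worst-case per-step contraction factor."""
    if any(lam <= 0 for lam in eigenvalues):
        raise ValueError("sketch assumes a convex quadratic (positive eigenvalues)")
    lam_min, lam_max = min(eigenvalues), max(eigenvalues)
    edge = 2.0 / lam_max
    best = 2.0 / (lam_min + lam_max)
    return edge, best


def contraction(lr, eigenvalues):
    """Worst-case |1 - lr * lam|: below 1 converges, above 1 diverges."""
    return max(abs(1.0 - lr * lam) for lam in eigenvalues)
```

For eigenvalues `[1.0, 100.0]` the stability edge is 0.02 and the best rate is 2/101 ≈ 0.0198, about 99% of the way to the edge. A rate of 0.021 already diverges (worst-case factor 1.1).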

Jascha Sohl-Dickstein • 2 years ago

Want to learn more? Blog post: 3-page paper:

Jascha Sohl-Dickstein • 2 years ago

I don't have a SoundCloud, but I did join Anthropic last week, and so far it has exceeded my (high) expectations. I would strongly recommend working there (and using Claude). *This project was not done at Anthropic; it was recreational machine learning on my own time.

Hattie Zhou • 2 years ago

So cool, Jascha!! I wonder what this would look like for generalization performance on symbolic tasks, e.g. the grokking phase diagram. Would you expect a similar fractal pattern?

Jascha Sohl-Dickstein • 2 years ago

That is a great question!! I hadn't thought of that. My guess would be that we would see a similar fractal structure at the boundaries in the phase diagram you pasted ... but I'm not sure. It would be a fascinating experiment, though probably a lot more expensive than the one I ran: I used a width-16, one-hidden-layer network, and generating a video took overnight on an A100. For grokking experiments, though, you would probably need to train something significantly bigger?

Daniel Dugas • 2 years ago

I'm amazed! If it's possible to make high-res prints of some of these patterns, I'd hang one in my house (maybe mixed in with Earth satellite imagery for maximum confusion).

Jascha Sohl-Dickstein • 2 years ago

Go for it! All the raw images are here:
