When doing machine learning and AI research (or writing books), making the code reproducible is usually desirable. Often, that's easier said than done! So, I recorded a video that illustrates, and shows how to deal with, 6 sources of randomness that occur when training deep neural networks and LLMs: 1. Model weight initialization...
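
As a rough illustration of the first source listed above, weight initialization, here is a minimal, framework-free sketch of how fixing a seed makes the initial weights repeatable. The function name `init_weights`, the layer shape, and the standard deviation of 0.02 are hypothetical choices for this example and are not taken from the video.

```python
import random

def init_weights(n_in, n_out, seed):
    """Draw an n_out x n_in weight matrix from a zero-mean Gaussian.

    Hypothetical helper for illustration; the 0.02 standard deviation
    is an arbitrary assumption, not a value from the video.
    """
    rng = random.Random(seed)  # private generator, leaves the global random state untouched
    return [[rng.gauss(0.0, 0.02) for _ in range(n_in)] for _ in range(n_out)]

w_a = init_weights(4, 3, seed=123)
w_b = init_weights(4, 3, seed=123)
w_c = init_weights(4, 3, seed=124)

print(w_a == w_b)  # compares two runs that used the same seed
print(w_a == w_c)  # compares two runs that used different seeds
```

Using a private generator per call, rather than reseeding a global one, keeps the result independent of whatever other code drew random numbers beforehand.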
6 comments

If you prefer, here is a YouTube version that includes chapter marks:

Maybe a real solution is to consider a probability distribution over the model state (weights) instead of the actual numerical values of the state. The optimization process is random by definition, e.g., with Langevin-dynamics-based methods. But when the learning rate decays, the final state is the same with very high probability.

I like this idea, and it's kind of what Bayesian neural networks do? But for those who prefer certain huge models for their good predictive performance, e.g., vision and language transformers, this would probably not be feasible, I'd say.
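
The point about learning rate decay in this exchange can be made concrete with a toy sketch. It runs plain noisy gradient descent on a one-dimensional quadratic with its minimum at 3.0, using different seeds. This is an assumption-laden stand-in, not a Langevin sampler and not anything from the video: the gradient noise is multiplied by the learning rate, so it shrinks as the rate decays. The function name `noisy_descent`, the objective, and all hyperparameters are hypothetical.

```python
import random

def noisy_descent(seed, steps=5000, lr0=0.1, decay=50.0, noise_std=1.0):
    """Minimize 0.5 * (x - 3)^2 with noisy gradients and a decaying learning rate.

    Toy stand-in for stochastic training; every hyperparameter here is an
    arbitrary assumption chosen for illustration.
    """
    rng = random.Random(seed)  # each seed gives a different noise sequence
    x = 0.0
    for t in range(steps):
        lr = lr0 / (1.0 + t / decay)                 # learning rate shrinks over time
        grad = (x - 3.0) + rng.gauss(0.0, noise_std)  # true gradient plus Gaussian noise
        x -= lr * grad
    return x

# Final states for five different seeds
finals = [noisy_descent(seed) for seed in range(5)]
print(finals)
```

In this toy setting the runs take different paths but finish close to the same point, because the noise contribution scales with the decaying learning rate. Whether that carries over to large networks with many minima is a separate question that this sketch does not address.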

Great video. Nice explanations. Thanks for that!

Thank you for sharing this video! I'm equally excited to read your new book, too.

Fun fact: it's basically based on chapter 10 :)
