When doing machine learning and AI research (or writing books), making the code reproducible is usually desirable. Often, that's easier said than done! So, I recorded a video illustrating and dealing with 6 sources of randomness that occur when training deep neural networks and LLMs: 1. Model weight initialization...
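As a minimal sketch of the first source listed above (model weight initialization), and not the code from the video: fixing the seed of the random number generator makes the initial weights identical across runs. The example uses only Python's standard library; the helper name `init_weights`, the standard deviation of 0.02, and the seed values are illustrative assumptions. In PyTorch, the analogous step would be calling `torch.manual_seed` before constructing the model.

```python
import random


def init_weights(n_in, n_out, seed):
    """Draw an n_out x n_in weight matrix from a small Gaussian.

    A dedicated, seeded generator is used so the result does not
    depend on (or disturb) the global random state.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(n_in)] for _ in range(n_out)]


w_a = init_weights(4, 3, seed=123)
w_b = init_weights(4, 3, seed=123)
w_c = init_weights(4, 3, seed=456)

print(w_a == w_b)  # same seed: the two initializations match exactly
print(w_a == w_c)  # different seed: the initializations differ
```

Using a local generator instead of the module-level `random.seed` keeps the initialization reproducible even if other code draws random numbers in between.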

81,670 views • 2 years ago • via X (Twitter)

6 comments

Sebastian Raschka • 2 years ago

If you prefer, here is a YouTube version that includes chapter marks:

michael tetelman • 2 years ago

Maybe a real solution is to consider the probability of a model state (the weights) instead of the actual numerical values of the state. The optimization process is random by definition, e.g. with Langevin-dynamics-based methods. But when the learning rate decays, the final state is the same with very high probability.

Sebastian Raschka • 2 years ago

I like this idea, and it kind of is what Bayesian neural networks do? But for those who prefer certain huge models due to their good predictive performance, e.g. vision and language transformers, this would probably not be feasible, I'd say.

Prechtl Chris • 2 years ago

Great video. Nice explanations. Thanks for that!

Sudhansu Bhushan Mishra • 2 years ago

Thank you for sharing this video! I'm equally excited to read your new book too.

Sebastian Raschka • 2 years ago

Fun fact: it's basically based on chapter 10 :)
