When doing machine learning and AI research (or writing books), making the code reproducible is usually desirable. Often, that's easier said than done! So, I recorded a video illustrating and dealing with 6 sources of randomness that occur when training deep neural networks and LLMs: 1. Model weight initialization...
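To make the first source of randomness concrete, here is a minimal sketch of seeded weight initialization. It is not code from the video: the function name `init_weights`, the layer sizes, and the standard deviation of 0.02 are assumptions chosen for illustration, and it uses only Python's standard library so it runs without a deep learning framework. In PyTorch, the analogous step would be calling `torch.manual_seed` before creating the model.

```python
import random


def init_weights(n_in, n_out, seed):
    """Draw an n_out x n_in weight matrix from a normal distribution.

    Hypothetical helper for illustration: a dedicated, seeded generator
    is used so that unrelated code cannot disturb its state.
    """
    rng = random.Random(seed)
    # Standard deviation 0.02 is an arbitrary choice for this sketch.
    return [[rng.gauss(0.0, 0.02) for _ in range(n_in)] for _ in range(n_out)]


weights_a = init_weights(4, 3, seed=123)
weights_b = init_weights(4, 3, seed=123)
weights_c = init_weights(4, 3, seed=456)

# Same seed: the initial weights match exactly.
print(weights_a == weights_b)
# Different seed: the initial weights differ.
print(weights_a == weights_c)
```

Passing the seed explicitly, rather than relying on a global generator, keeps the initialization reproducible even if other parts of the program draw random numbers first.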

Sebastian Raschka

460,519 subscribers

81,670 views • 2 years ago • via X (Twitter)

6 Comments

Sebastian Raschka • 2 years ago

If you prefer, here is a YouTube version that includes chapter marks:

michael tetelman • 2 years ago

Maybe a real solution is to consider the probability of a model state (the weights) instead of the actual numerical values of the state. The optimization process is random by definition, e.g., with Langevin-dynamics-based methods. But when the learning rate decays, the final state is the same with very high probability.

Sebastian Raschka • 2 years ago

I like this idea, and it kind of is what Bayesian neural networks do? But for those who prefer certain huge models due to their good predictive performance, e.g., vision and language transformers, this would probably not be feasible, I'd say.

Prechtl Chris • 2 years ago

Great Video. Nice explanations. Thanks for that!

Sudhansu Bhushan Mishra • 2 years ago

Thank you for sharing this video! I'm equally excited to read your new book, too.

Sebastian Raschka • 2 years ago

Fun fact: it's basically based on chapter 10 :)

Related Videos

Foundation models are powerful machine learning algorithms that sit at the core of many generative AI tools today. How are they built and deployed and how are they changing society? Here’s a quick intro for non-experts:

Stanford HAI

20,769 views • 3 years ago

OPENAI WHISTLEBLOWER'S MOTHER: SUCHIR WAS A GENIUS Suchir Balaji's mother, Poornima Ramaro: "When he was in high school, for 2 years, every weekend, day and night, he was practicing algorithms. He downloaded data related to machine learning from many different universities. Way before anyone else knew about AI, he was already a champion in machine learning and algorithms. At the age of 11, he started programming. Within 6 months, he used to ask me questions that I could not answer on algorithms. He was very advanced." Source:

Mario Nawfal

588,952 views • 1 year ago

“When AI Discovers the Next Transformer” Robert Lange (Sakana AI) joins Tim Scarfe (Machine Learning Street Talk) to discuss Shinka Evolve, a framework that combines LLMs with evolutionary algorithms to do open-ended program search. Full Video:

Sakana AI

124,932 views • 3 months ago

The OFFICIAL 5 minute guide to TrenchBuddy and the $TRENCH ecosystem. How to find and research a token using our world-class onchain detection algorithms and AI deep research agents. / / Enjoy!

TrenchBuddy

15,386 views • 1 year ago

Introducing Qwen 3 + Deep Research! We built a deep research agent with Qwen 3 and it performed better than most open source models that we've tried. the code is completely open source and free! built with 's Langgraph Together AI and uses Perplexity Tavily for search here's the code link:

Soham

34,566 views • 1 year ago

Introducing Adjoint Sampling, a new learning algorithm that trains generative models based on scalar rewards. Based on theoretical foundations developed by FAIR, Adjoint Sampling leads to a highly scalable practical algorithm, and can become the foundation for further research into highly scalable sampling methods. Read our research paper on Adjoint Sampling and download the model, code, and benchmark ➡️

AI at Meta

36,987 views • 1 year ago

Mental map of Markov Chain Monte Carlo (MCMC) algorithms, and analogous machine learning (ML) algorithms [dashed = especially loose analogy]. Grey boxes are basic tools, and each arrow is annotated with the "delta" between algorithms.

Keenan Crane

100,602 views • 1 year ago

New video series: Physics Informed Machine Learning! Physics may be embedded into AI/ML in 5 stages: 1 choose what to Model 2 curate training Data 3 design an Architecture 4 craft a Loss Function, and 5 implement Optimization Algorithm to train the model

Steven Brunton

78,438 views • 2 years ago

Security is a must for the development of AI and machine learning — and vice versa Dawn Song, Founder of Oasis Labs, explains how AI models can benefit security research View the full interview — where Dawn shares her journey into AI, Machine Learning, and web3 — below:

Oasis Labs

33,121 views • 3 years ago

if you're struggling on where to start learning ML, here’s a playlist of 30 youtube videos to learn machine learning fundamentals from scratch "Machine Learning: Teach by Doing" is a solid choice to learn both theory and code. (1) Introduction to Machine Learning Teach by Doing: (2) What is Machine Learning? History of Machine Learning: (3) Types of ML Models: (4) 6 steps of any ML project: (5) Install Python and VSCode and run your first code: (6) Linear Classifiers Part 1: (7) Linear Classifiers Part 2: (8) Jupyter Notebook, Numpy and Scikit-Learn: (9) Running the Random Linear Classifier Algorithm in Python: (10) The oldest ML model - Perceptron: (11) Coding the Perceptron: (12) Perceptron Convergence Theorem: (13) Magic of features in Machine Learning: (14) One hot encoding: (15) Logistic Regression Part 1: (16) Cross Entropy Loss: (17) How gradient descent works: (18) Logistic Regression from scratch in Python: (19) Introduction to Regularization: (20) Implementing Regularization in Python: (21) Linear Regression Introduction: (22) Ordinary Least Squares step by step implementation: (23) Ridge regression fundamentals and intuition: (24) Regression recap for interviews: (25) Neural network architecture in 30 minutes: (26) Backpropagation intuition: (27) Neural network activation functions: (28) Momentum in gradient descent: (29) Hands on neural network training in Python: (30) Introduction to Convolutional Neural Networks (CNNs):

ℏεsam

108,861 views • 1 year ago

a playlist of 30 youtube videos to learn machine learning fundamentals from scratch if you're struggling on where to start learning ML, this list goes this "Machine Learning: Teach by Doing" is a solid choice to learn both theory and code. (1) Introduction to Machine Learning Teach by Doing: (2) What is Machine Learning? History of Machine Learning: (3) Types of ML Models: (4) 6 steps of any ML project: (5) Install Python and VSCode and run your first code: (6) Linear Classifiers Part 1: (7) Linear Classifiers Part 2: (8) Jupyter Notebook, Numpy and Scikit-Learn: (9) Running the Random Linear Classifier Algorithm in Python: (10) The oldest ML model - Perceptron: (11) Coding the Perceptron: (12) Perceptron Convergence Theorem: (13) Magic of features in Machine Learning: (14) One hot encoding: (15) Logistic Regression Part 1: (16) Cross Entropy Loss: (17) How gradient descent works: (18) Logistic Regression from scratch in Python: (19) Introduction to Regularization: (20) Implementing Regularization in Python: (21) Linear Regression Introduction: (22) Ordinary Least Squares step by step implementation: (23) Ridge regression fundamentals and intuition: (24) Regression recap for interviews: (25) Neural network architecture in 30 minutes: (26) Backpropagation intuition: (27) Neural network activation functions: (28) Momentum in gradient descent: (29) Hands on neural network training in Python: (30) Introduction to Convolutional Neural Networks (CNNs):

ℏεsam

117,570 views • 1 year ago

Foundation models, machine learning, and #AI use cases. In this episode of Smart Talks with IBM hosted by Malcolm Gladwell, IBM Research VP of AI Models explores the future of AI and data platforms like #watsonx with Jacob Goldstein:

IBM

27,078 views • 2 years ago

Can LLMs invent better ways to train LLMs? At Sakana AI, we’re pioneering AI-driven methods to automate AI research and discovery. We’re excited to release DiscoPOP: a new SOTA preference optimization algorithm that was discovered and written by an LLM! Our method leverages LLMs to propose and implement new preference optimization algorithms. We then train models with those algorithms and evaluate their performance, providing feedback to the LLM. By repeating this process for multiple generations in an evolutionary loop, the LLM discovers many highly-performant and novel preference optimization objectives! Paper: GitHub: Model: We proudly collaborated with the University of Oxford (Foerster Lab for AI Research) and Cambridge University (Mihaela van der Schaar) on this groundbreaking project. Looking ahead, we envision a future where AI-driven research reduces the need for extensive human intervention and computational resources. This will accelerate scientific discoveries and innovation, pushing the boundaries of what AI can achieve.

Sakana AI

555,782 views • 2 years ago

AI Coding Agent for Hardware Optimized Code Diana AI hardware is still constrained by software. However, with reasoning models like Deepseek R1 or OpenAI o1 and o3, AI could generate hardware-optimized code that rivals—or surpasses—human CUDA code.

Y Combinator

60,089 views • 1 year ago

you tend to hear this a lot from people outside or new to ML, and I often point to a talk Ilya gave a few years back: 1) think of any decent deep neural net that has enough memory and sequential ops as just a big parallel computer 2) training this neural net is doing search over computer programs that maximize your objective 3)unless you have some large bottleneck (and given you can successfully optimize this system) you’ll find that these parallel computers are highly robust to architectural changes. 4) this is because computers are great at simulating each other. your new architecture can usually be straightforwardly simulated ‘inside’ your old architecture. 5) it’s not that architecture doesn’t matter, but it mostly matters with respect to (1) fundamental bottlenecks in this parallel computer (2) modifications that make models easier to optimize, since this argument only holds if your optimization is good (3) compute efficiency/system efficiency wins that make learning easier or faster. 6) it’s quite possible that new architectures will lead to breakthroughs in machine learning, but we should first start with bottlenecks, not naturalist intuitions about the ‘form’ of AI should take. until you understand this it seems surprising that small models trained longer are better than undertrained big models, that depth and width are surprisingly interchangeable, that talking to a model with an MoE or sparse attention or linear attention is approximately the same iso evals.

will depue

214,639 views • 6 months ago

Watch "Endless experimentation: Building AI models in the wild." This is a 50-minute, MIT lecture about problems when deploying models and LLMs in the real world and how to prepare to solve them. This is a great lecture for those building production machine learning.

Santiago

117,204 views • 2 years ago

Neural networks and the art of human-like thinking | Full interview with Dan Shipper Dan Shipper 📧, CEO of Every 📧 0:00 Neural networks and human intuition 1:13 The limits of rationalism, from Socrates to neural networks 1:23 Rationalism 2:42 Socrates, the father of Rationalism 5:47 The Age of Enlightenment 7:36 The structure of social sciences 8:51 Defining AI 9:47 The origins of AI 10:39 The General Problem Solver 15:09 Neural networks 18:22 Metaphors for the mind 23:00 Seeing the world like a large language model 30:25 Should we stop looking for general theories? 32:22 Training neural networks 39:32 Will AI steal our humanity? 43:43 AI and rational explanation 47:17 Could LLMs be dangerous? 51:12 Knowledge economies and allocation economies

Big Think

11,741 views • 7 months ago

Super excited to share 🧠MLGym 🦾 – the first Gym environment for AI Research Agents 🤖🔬 We introduce MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks. The key contributions of our work are: 🕹️ Enables the exploration of different training algorithms for AI Research Agents such as RL 🛠️ Provides a flexible evaluation framework that can accommodate different artifacts such as models, algorithms, or predictions 🤖 Allows researchers to evaluate any model without the need to develop a custom agentic harness 🎯 Introduces 13 diverse open-ended AI Research tasks for evaluating AI Research Agents on a wide range of domains such as computer vision, natural language processing, reinforcement learning, game theory, and logical reasoning. 📈 Proposes a new evaluation metric for AI Research Agents MLGym makes it easy to: 1) Add new tasks 2) Evaluate new models 3) Integrate new agents Check out a video of the MLGym Agent to see how it performs the full pipeline of idea generation💡, implementation 👩‍💻, experimentation 👩‍🔬, and iteration 🔄 to improve on ML tasks. Huge thanks to the exceptionally talented Deepak Nathani who led this work and to all the other amazing collaborators who made this possible 🙏🫶🚀

Roberta Raileanu

104,964 views • 1 year ago

Stable Virtual Camera: Generative View Synthesis with Diffusion Models Hint: Check the project website. It is awesome! Contributions: 1. A training strategy for jointly modeling large viewpoint changes and temporal smoothness. 2. A two-pass procedural sampling method for smooth video generation along arbitrarily long camera trajectories. 3. A comprehensive benchmark that evaluates NVS methods across different datasets and settings. 4. An open-source release of model weights to support future research.

MrNeRF

38,619 views • 1 year ago

Google is offering a Generative AI Learning Path with 10 courses for FREE! - Intro to Generative AI - Intro to LLMs - Intro to Image Generation - Encoder-Decoder Architecture - Transformer Models and more A Thread 🧵👇

Afiz ⚡️

249,552 views • 3 years ago