Andrej Karpathy summarizes how neural networks work: a loss function scores the network during training, and backpropagation supplies the gradients that gradient descent uses to optimize the network parameters.

tetsuo

225,858 subscribers

141,964 views • 9 months ago • via X (Twitter)

Science & Technology, Education
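The loop named in the summary above (loss, backpropagation, gradient descent) can be sketched in a few lines. This is a minimal illustration, not code from the video: the toy data, the one-hidden-unit network, the learning rate, and all function names are assumptions made up for the example.

```python
import math

# Toy data (made up for illustration): five (x, y) pairs on an S-shaped curve.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-1.0, -0.8, 0.0, 0.8, 1.0]

def forward(params, x):
    """Tiny network: one tanh hidden unit followed by a linear output."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)
    return h, w2 * h + b2

def loss_and_grads(params):
    """Mean squared error and its gradient, obtained with the chain rule."""
    w1, b1, w2, b2 = params
    n = len(xs)
    loss = 0.0
    grads = [0.0, 0.0, 0.0, 0.0]
    for x, y in zip(xs, ys):
        h, pred = forward(params, x)
        err = pred - y
        loss += err * err / n
        d_pred = 2.0 * err / n        # dL/dpred
        d_h = d_pred * w2             # back through the output layer
        d_z = d_h * (1.0 - h * h)     # back through tanh
        grads[0] += d_z * x           # dL/dw1
        grads[1] += d_z               # dL/db1
        grads[2] += d_pred * h        # dL/dw2
        grads[3] += d_pred            # dL/db2
    return loss, grads

def train(params, lr=0.1, steps=500):
    """Plain gradient descent: step each parameter against its gradient."""
    for _ in range(steps):
        _, grads = loss_and_grads(params)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params

initial = [0.5, 0.0, 0.5, 0.0]
initial_loss, _ = loss_and_grads(initial)
trained = train(initial)
final_loss, _ = loss_and_grads(trained)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The backward pass is written out by hand here so each chain-rule step is visible; frameworks with automatic differentiation compute the same gradients without the manual bookkeeping.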

Related videos

Trained a single-layer neural network in C from scratch to predict the AND gate, using gradient descent with batch training of the dataset over almost 300K epochs

AADITYANSHA

80,920 views • 3 months ago
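For readers who want to try the idea from the post above, here is a rough sketch of a single sigmoid neuron learning the AND gate with batch gradient descent. The original was written in C; this version uses Python instead, and the learning rate, epoch count (far fewer than the post's roughly 300K), cross-entropy loss, and function names are all assumptions for illustration.

```python
import math

# Truth table of the AND gate: inputs and target outputs.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Single neuron: weighted sum of two inputs plus bias, then sigmoid."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

def train_and_gate(epochs=5000, lr=0.5):
    """Batch gradient descent: average the gradient over all four rows per epoch."""
    w, b = [0.0, 0.0], 0.0
    n = len(inputs)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, t in zip(inputs, targets):
            # For sigmoid + cross-entropy, dL/dz is simply (prediction - target).
            err = predict(w, b, x) - t
            gw[0] += err * x[0] / n
            gw[1] += err * x[1] / n
            gb += err / n
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b

w, b = train_and_gate()
outputs = [round(predict(w, b, x)) for x in inputs]
print(outputs)
```

AND is linearly separable, so one neuron is enough; a gate such as XOR would need a hidden layer.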

This video, created by my dear coauthor Mahdi E Kahou for our teaching and papers, shows how overparameterized neural networks produce smooth function approximations even in the context of the Runge phenomenon. Some background. Imagine you want to approximate the Runge function using polynomial interpolation at equally spaced points. It is well known that, despite targeting an infinitely differentiable function, such a polynomial approximation produces oscillatory behavior that worsens with the degree of the polynomial. In other words, higher-degree polynomial approximations might not improve accuracy. Instead, approximate the Runge function with a neural network (here, two layers are just to make the example concrete; nothing fundamental depends on it). As you increase the number of parameters well above the 11 training points (in our example, a two-layer neural network with 128 nodes each), you nicely converge to the target, without wild oscillations. Yes, this has much to do with double descent and benign overparameterization, but the main punchline of this post is that neural networks are really very different types of animals than polynomial approximations. And yes, Chebyshev nodes and splines exist, and in this case, they will prevent the oscillations. But that's not the point. Chebyshev nodes and splines still confront Faber’s theorem, which states that for any system of polynomial interpolation nodes, there exists a continuous function whose sequence of interpolating polynomials diverges as the number of nodes grows to infinity. Faber’s theorem does not apply to neural networks because they are not polynomials. The notebook, if you want to check the details, is here: Stay tuned for more on this 👀

Jesús Fernández-Villaverde

46,908 views • 2 months ago
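The polynomial half of the argument in the post above is easy to reproduce. The sketch below is not the authors' notebook: it only interpolates the Runge function 1/(1 + 25x²) on [-1, 1] with Lagrange polynomials and compares the maximum error for equispaced and Chebyshev nodes. The neural-network half is left out, and all function names are assumptions.

```python
import math

def runge(x):
    return 1.0 / (1.0 + 25.0 * x * x)

def lagrange_eval(nodes, values, x):
    """Evaluate the interpolating polynomial through (nodes, values) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(nodes, values)):
        term = yi
        for j, xj in enumerate(nodes):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def equispaced(n):
    """n equally spaced nodes on [-1, 1]."""
    return [-1.0 + 2.0 * k / (n - 1) for k in range(n)]

def chebyshev(n):
    """n Chebyshev nodes (roots of the degree-n Chebyshev polynomial)."""
    return [math.cos((2 * k + 1) * math.pi / (2 * n)) for k in range(n)]

def max_error(nodes, samples=2001):
    """Largest absolute interpolation error on a fine grid over [-1, 1]."""
    values = [runge(x) for x in nodes]
    grid = [-1.0 + 2.0 * k / (samples - 1) for k in range(samples)]
    return max(abs(lagrange_eval(nodes, values, x) - runge(x)) for x in grid)

err_equi_5 = max_error(equispaced(5))
err_equi_11 = max_error(equispaced(11))
err_cheb_11 = max_error(chebyshev(11))
print(f"equispaced,  5 nodes: {err_equi_5:.3f}")
print(f"equispaced, 11 nodes: {err_equi_11:.3f}")
print(f"Chebyshev,  11 nodes: {err_cheb_11:.3f}")
```

With equispaced nodes the error grows when going from 5 to 11 points, while Chebyshev nodes at the same count keep it small, which matches the post's remark that better node choices tame this particular case.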

Individual Neuron: Neural Network. A neuron in a neural network computes a weighted sum of its inputs, adds a bias, and applies an activation function such as sigmoid, ReLU, or tanh, which introduces non-linearity. This non-linear output is what lets a network of such neurons learn and represent patterns in the data.

ₕₐₘₚₜₒₙ — e/acc

47,467 views • 2 years ago
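The neuron described in the post above fits in one small function. This is a minimal sketch with made-up inputs and weights; the function names are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    """Weighted sum of inputs plus bias, passed through an activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

x = [1.0, 2.0]
w = [0.5, -0.25]
b = 0.0  # pre-activation: 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0

print(neuron(x, w, b, sigmoid))    # sigmoid(0) = 0.5
print(neuron(x, w, b, relu))       # relu(0) = 0.0
print(neuron(x, w, b, math.tanh))  # tanh(0) = 0.0
```

Swapping the `activation` argument is the only change needed to compare sigmoid, ReLU, and tanh on the same weighted sum.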

wrote a piece on how we do tiling and gradient descent on TinyTPU; included some animations of the weight and bias gradient descent

Xander Chin

33,044 views • 10 months ago

Ever wondered what neural networks are and how they work? Systems like ChatGPT use neural networks to work as well as they do. Neural networks are composed of "layers" of neurons, layers with different functions; connections between layers called "weights"; and mathematical functions called "activation functions". If you’re interested in learning about these systems, check the comments. Ultimately, the neural network structure of the model serves to visually demonstrate that it is, in fact, a complex mathematical equation. When companies release the model's weights, they are releasing a key component needed to run the model's complete equation. Without the weights, the equation is incomplete. For the math-minded: the weights of a model are the learned numbers (they are variables during training) that are then used as constants in the mathematical functions that make up the model. Neural networks are ultimately just one big, hyper-complex mathematical function, and when a model is trained, it learns the constants associated with the high-dimensional input.

Harper Carroll

29,648 views • 8 months ago

I've added a function to propagate a gradient across the mesh (it follows the curvature of the hair), and then copy that gradient to the Windows clipboard as a transparent picture, for use as a selection mask in the image editor

Haï~

13,261 views • 1 year ago

if you're struggling with where to start learning ML, here’s a playlist of 30 youtube videos to learn machine learning fundamentals from scratch
"Machine Learning: Teach by Doing" is a solid choice to learn both theory and code.
(1) Introduction to Machine Learning Teach by Doing
(2) What is Machine Learning? History of Machine Learning
(3) Types of ML Models
(4) 6 steps of any ML project
(5) Install Python and VSCode and run your first code
(6) Linear Classifiers Part 1
(7) Linear Classifiers Part 2
(8) Jupyter Notebook, Numpy and Scikit-Learn
(9) Running the Random Linear Classifier Algorithm in Python
(10) The oldest ML model - Perceptron
(11) Coding the Perceptron
(12) Perceptron Convergence Theorem
(13) Magic of features in Machine Learning
(14) One hot encoding
(15) Logistic Regression Part 1
(16) Cross Entropy Loss
(17) How gradient descent works
(18) Logistic Regression from scratch in Python
(19) Introduction to Regularization
(20) Implementing Regularization in Python
(21) Linear Regression Introduction
(22) Ordinary Least Squares step by step implementation
(23) Ridge regression fundamentals and intuition
(24) Regression recap for interviews
(25) Neural network architecture in 30 minutes
(26) Backpropagation intuition
(27) Neural network activation functions
(28) Momentum in gradient descent
(29) Hands on neural network training in Python
(30) Introduction to Convolutional Neural Networks (CNNs)

ℏεsam

108,861 views • 1 year ago

a playlist of 30 youtube videos to learn machine learning fundamentals from scratch
if you're struggling with where to start learning ML, "Machine Learning: Teach by Doing" is a solid choice to learn both theory and code.
(1) Introduction to Machine Learning Teach by Doing
(2) What is Machine Learning? History of Machine Learning
(3) Types of ML Models
(4) 6 steps of any ML project
(5) Install Python and VSCode and run your first code
(6) Linear Classifiers Part 1
(7) Linear Classifiers Part 2
(8) Jupyter Notebook, Numpy and Scikit-Learn
(9) Running the Random Linear Classifier Algorithm in Python
(10) The oldest ML model - Perceptron
(11) Coding the Perceptron
(12) Perceptron Convergence Theorem
(13) Magic of features in Machine Learning
(14) One hot encoding
(15) Logistic Regression Part 1
(16) Cross Entropy Loss
(17) How gradient descent works
(18) Logistic Regression from scratch in Python
(19) Introduction to Regularization
(20) Implementing Regularization in Python
(21) Linear Regression Introduction
(22) Ordinary Least Squares step by step implementation
(23) Ridge regression fundamentals and intuition
(24) Regression recap for interviews
(25) Neural network architecture in 30 minutes
(26) Backpropagation intuition
(27) Neural network activation functions
(28) Momentum in gradient descent
(29) Hands on neural network training in Python
(30) Introduction to Convolutional Neural Networks (CNNs)

ℏεsam

117,570 views • 1 year ago

we're turning University of Waterloo into a supercomputer! arceus is a cross-device distributed compute network for training large models, using model/tensor/pipeline parallelism. you can train anything, from deep neural networks to language models on the network. oss & deployment soon!

rajan agarwal

35,720 views • 1 year ago

Andrej Karpathy Anthropic Head of Technical Staff: "Multi-task learning has a team problem nobody talks about." Karpathy's team runs 100 subtasks inside one neural network. In this 15-minute talk, Karpathy breaks down the full multi-task stack: architecture tradeoffs + loss balancing + data engines + team workflow. Worth more than any $500 ML bootcamp. Watch it & then read the full article below.

Morty

85,849 views • 17 days ago

JANE STREET PAYS $650,000 A YEAR FOR QUANTS WHO TRADE USING NEURAL NETWORKS. STANFORD JUST DROPPED THE COMPLETE NEURAL NETWORK FUNDAMENTALS LECTURE FOR FREE. BOOKMARK & WATCH THIS TODAY, THEN READ HOW QUANTS APPLY THIS IN THE ARTICLE BELOW BEFORE SOMEONE TAKES IT DOWN.

Roan

126,223 views • 2 months ago

Does anyone know what the function of the black and hot liquid is in this process?

Amazing Video

3,195,866 views • 9 months ago

A 2-layer neural network goes from total chaos to perfectly separating left vs right classes in real time. Watch the decision boundary form live as gradient descent works its magic! Pure maths beauty in motion.

Mathematica

145,059 views • 4 months ago

Lecture 1 on Physics-Informed Neural Networks: A Mini-Series
Physics-Informed Neural Networks (PINNs) are neural networks trained to satisfy a differential equation by building the PDE residual directly into the loss. They emerged from a very practical problem...classical PDE pipelines can be brilliant, but they often demand heavy discretization work (meshes, stencils, stability tuning), and the method you build is usually tied to one geometry and one solver setup. A PINN flips the workflow by representing the solution itself as a smooth function uᵩ(x,t) and enforcing the physics everywhere you choose to sample the domain.
People often meet PINNs in the least helpful way...via a flashy solution plot, and almost no explanation of what was enforced to get it. In this series we keep the enforcement visible. We pick a differential equation, represent the unknown solution as a flexible function, measure how well that function satisfies the equation across the domain, and train it to reduce that mismatch everywhere we sample.
A normal neural net learns from labels...you give it inputs and target outputs. A PINN learns from a differential equation...you give it inputs (x,t) and it gets punished whenever its output fails the PDE. By "punished" we mean that the loss increases when the mismatch is large and decreases as the mismatch gets smaller. The network isn’t replacing physics, it’s becoming a flexible function that is forced to satisfy the same calculus you’d impose on any candidate solution.
The math breakdown:
We start with a PDE we want to solve on a domain Ω. Write it as
uₜ(x,t) + N(u(x,t), uₓ(x,t), uₓₓ(x,t), …) = 0 for (x,t) in Ω
A PINN replaces the unknown function u with a neural network output uᵩ(x,t), where φ denotes the network parameters.
Now define the physics residual by plugging uᵩ into the PDE
rᵩ(x,t) = ∂uᵩ/∂t + N(uᵩ, ∂uᵩ/∂x, ∂²uᵩ/∂x², …)
If uᵩ were an exact solution, we would have rᵩ(x,t) = 0 everywhere.
We may also have data points (xᵢ,tᵢ,uᵢ) from measurements or a known initial condition. The training objective is just a weighted sum of squared errors
L(φ) = L_data(φ) + λ L_phys(φ) + L_bc/ic(φ)
with
L_data(φ) = meanᵢ |uᵩ(xᵢ,tᵢ) − uᵢ|²
L_phys(φ) = meanⱼ |rᵩ(xⱼ,tⱼ)|², where (xⱼ,tⱼ) are the collocation points in Ω
L_bc/ic(φ) = penalties enforcing boundary conditions and initial conditions
The key technical step is that the derivatives inside rᵩ (∂uᵩ/∂t, ∂uᵩ/∂x, ∂²uᵩ/∂x², …) are computed by automatic differentiation. So we can differentiate the total loss L(φ) with respect to φ and train with gradient descent.
This is the whole idea behind PINNs. Learn a function, but make the PDE part of the loss, so the network is trained to be a solution, not just a curve-fitter.
In the render, the main 3D surface is the network’s current guess uᵩ(x,t), drawn as a living sheet over the (x,t) plane. Hovering above is the neural scaffold...a visible graph of feature nodes and connections. The bright tension threads are the physics residual rᵩ(x,t): each thread tethers a collocation bead on the sheet up to the scaffold, and it thickens and brightens exactly where |rᵩ| is large (color encodes the sign). As training runs, those threads go slack across the domain not because we hid the error, but because the network has actually been pushed toward rᵩ(x,t) ≈ 0.
#PINNs #PhysicsInformedNeuralNetworks #ScientificMachineLearning #PDE #DifferentialEquations #Optimization #MachineLearning #AppliedMath #ComputationalPhysics

Mathelirium

47,308 views • 6 months ago
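The residual-in-the-loss idea from the lecture post above can be tried on a problem small enough to follow by hand. The sketch below is a heavily simplified stand-in, not the author's setup: it solves the ODE u' + u = 0 with u(0) = 1 on [0, 1] (so N(u) = u), replaces the neural network with a hypothetical two-parameter trial function u(t) = a·exp(k·t), and uses central finite differences in place of automatic differentiation for the parameter gradient. The exact solution is a = 1, k = -1.

```python
import math

# Collocation points t_j on the domain [0, 1].
collocation = [j / 20 for j in range(21)]

def u(params, t):
    """Trial solution u(t) = a * exp(k * t); stands in for the network output."""
    a, k = params
    return a * math.exp(k * t)

def du_dt(params, t):
    """Time derivative of the trial solution (autodiff would supply this in a PINN)."""
    a, k = params
    return a * k * math.exp(k * t)

def loss(params, lam=1.0):
    """L = lam * L_phys + L_ic for the ODE u' + u = 0 with u(0) = 1."""
    residuals = [du_dt(params, t) + u(params, t) for t in collocation]
    l_phys = sum(r * r for r in residuals) / len(residuals)
    l_ic = (u(params, 0.0) - 1.0) ** 2
    return lam * l_phys + l_ic

def numerical_grad(f, params, eps=1e-6):
    """Central-difference gradient; an assumption replacing autodiff here."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads

def train(params, lr=0.1, steps=500):
    """Gradient descent on the combined physics + initial-condition loss."""
    for _ in range(steps):
        grads = numerical_grad(loss, params)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params

a, k = train([0.5, 0.0])
print(f"a = {a:.4f}, k = {k:.4f}, loss = {loss([a, k]):.2e}")
```

No labeled solution values are used: the physics residual and the initial condition alone pull the parameters toward exp(-t). Without the initial-condition term, a = 0 would also zero the residual, which is why the boundary and initial penalties appear in the objective.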

What if Your Neural Network Was Forced to Obey Physics?
Physics-Informed Neural Networks (PINNs) are neural networks trained to satisfy a differential equation by building the PDE residual directly into the loss. They emerged from a very practical problem...classical PDE pipelines can be brilliant, but they often demand heavy discretization work (meshes, stencils, stability tuning), and the method you build is usually tied to one geometry and one solver setup. A PINN flips the workflow by representing the solution itself as a smooth function uᵩ(x,t) and enforcing the physics everywhere you choose to sample the domain.
People often meet PINNs in the least helpful way...via a flashy solution plot, and almost no explanation of what was enforced to get it. In this series we keep the enforcement visible. We pick a differential equation, represent the unknown solution as a flexible function, measure how well that function satisfies the equation across the domain, and train it to reduce that mismatch everywhere we sample.
A normal neural net learns from labels...you give it inputs and target outputs. A PINN learns from a differential equation...you give it inputs (x,t) and it gets punished whenever its output fails the PDE. By "punished" we mean that the loss increases when the mismatch is large and decreases as the mismatch gets smaller. The network isn’t replacing physics, it’s becoming a flexible function that is forced to satisfy the same calculus you’d impose on any candidate solution.
The math breakdown:
We start with a PDE we want to solve on a domain Ω. Write it as
uₜ(x,t) + N(u(x,t), uₓ(x,t), uₓₓ(x,t), …) = 0 for (x,t) in Ω
A PINN replaces the unknown function u with a neural network output uᵩ(x,t), where φ denotes the network parameters.
Now define the physics residual by plugging uᵩ into the PDE
rᵩ(x,t) = ∂uᵩ/∂t + N(uᵩ, ∂uᵩ/∂x, ∂²uᵩ/∂x², …)
If uᵩ were an exact solution, we would have rᵩ(x,t) = 0 everywhere.
We may also have data points (xᵢ,tᵢ,uᵢ) from measurements or a known initial condition. The training objective is just a weighted sum of squared errors
L(φ) = L_data(φ) + λ L_phys(φ) + L_bc/ic(φ)
with
L_data(φ) = meanᵢ |uᵩ(xᵢ,tᵢ) − uᵢ|²
L_phys(φ) = meanⱼ |rᵩ(xⱼ,tⱼ)|², where (xⱼ,tⱼ) are the collocation points in Ω
L_bc/ic(φ) = penalties enforcing boundary conditions and initial conditions
The key technical step is that the derivatives inside rᵩ (∂uᵩ/∂t, ∂uᵩ/∂x, ∂²uᵩ/∂x², …) are computed by automatic differentiation. So we can differentiate the total loss L(φ) with respect to φ and train with gradient descent.
This is the whole idea behind PINNs. Learn a function, but make the PDE part of the loss, so the network is trained to be a solution, not just a curve-fitter.
In the render, the main 3D surface is the network’s current guess uᵩ(x,t), drawn as a living sheet over the (x,t) plane. Hovering above is the neural scaffold...a visible graph of feature nodes and connections. The bright tension threads are the physics residual rᵩ(x,t): each thread tethers a collocation bead on the sheet up to the scaffold, and it thickens and brightens exactly where |rᵩ| is large (color encodes the sign). As training runs, those threads go slack across the domain not because we hid the error, but because the network has actually been pushed toward rᵩ(x,t) ≈ 0.
#PINNs #PhysicsInformedNeuralNetworks #ScientificMachineLearning #PDE #DifferentialEquations #Optimization #MachineLearning #AppliedMath #ComputationalPhysics

Mathelirium

17,285 views • 2 months ago

Implemented a visualization for the neural network today. This is a neural network being trained to learn the snake game in a terminal. It's built in Rust using Ratatui. More input details in thread.

Bones

165,655 views • 2 years ago

AI's Secret Pattern: The Surprising Role of Fractals in Neural Networks

In the realm of artificial intelligence (AI), a groundbreaking discovery has emerged, challenging our conventional understanding of neural network training and optimization. This revelation centers around the identification of fractal patterns at the boundary between trainable and untrainable neural network hyperparameters, presenting a series of profound implications and avenues for further research.

Fractals, known for their intricate, self-similar patterns that recur at every scale, have long fascinated mathematicians and scientists alike. Typically associated with simple, one-dimensional iterative functions, the appearance of fractals within the complex, multivariate domain of neural network training introduces a striking contrast. The organic and asymmetric nature of these fractals, as derived from the training processes, suggests a deeper, unexplored connection between the mathematical properties of fractals and the functional dynamics of neural networks.

The study’s focus on two-dimensional slices of hyperparameter space barely scratches the surface of the complexity inherent in neural networks, which are characterized by a vast array of hyperparameters. The existence of fractals in this context hints at an underlying high-dimensional structure, a concept that challenges our current capabilities and understanding. Extending fractal analysis to these higher dimensions represents a significant, yet exciting, challenge that could illuminate new aspects of neural network behavior and learning capabilities.

An unexpected finding from the research is the persistence of clean fractal patterns even in the presence of stochastic elements introduced during minibatch training. This resilience suggests a parallel to Lyapunov fractals, where the iterative process involves randomly changing functions. This phenomenon prompts a reevaluation of how stochastic and deterministic processes influence fractal formation within neural networks, potentially offering new insights into the fundamental mechanisms of learning and adaptation.

From a practical standpoint, the fractal nature of the boundary between trainable and untrainable hyperparameters has significant implications for the field of metalearning. The chaotic behavior of the meta-loss landscape, attributed to its extreme sensitivity, presents a formidable challenge for algorithms designed to optimize hyperparameters. Understanding the fractal characteristics of this landscape could provide valuable guidance for navigating its complexities, ultimately improving the efficiency and effectiveness of metalearning strategies.

Beyond the technical and theoretical implications, the discovery also reveals an unexpected aesthetic dimension to neural network fractals. The visual beauty and meditative qualities of these patterns offer a unique opportunity to engage with the material in a deeply personal and contemplative manner. This aspect suggests potential psychological and physiological benefits from exposure to the intricate designs of neural network fractals, opening up novel intersections between technology, art, and well-being.

In conclusion, the identification of fractal patterns within neural network hyperparameter spaces unveils a fascinating new frontier at the intersection of fractal geometry and deep learning. This discovery not only challenges existing paradigms but also opens up myriad possibilities for mathematical characterization, algorithmic development, and even subjective exploration. As researchers continue to delve into this rich vein of inquiry, the promise of uncovering new knowledge and advancing our understanding of neural networks and their training processes remains as compelling as ever.
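To make "the boundary between trainable and untrainable hyperparameters" concrete, here is a minimal sketch of how such a two-dimensional slice could be mapped. Everything in it is an assumption for illustration, not the study's setup: the model is a toy two-parameter network f(x) = w2 * tanh(w1 * x) fitted to a single point, the two hyperparameters are taken to be one learning rate per parameter, and "untrainable" simply means the loss exceeds a fixed threshold. It shows the scanning procedure only and makes no claim to reproduce the fractal structure.

```python
import math

def stays_bounded(lr1, lr2, steps=200, blowup=1e6):
    """Run gradient descent on f(x) = w2 * tanh(w1 * x) for one data point.

    Returns True if the loss stays below `blowup` for all steps,
    False as soon as it diverges or becomes non-finite.
    """
    w1, w2, x, y = 1.0, 0.5, 1.0, 1.0      # toy initial weights and data (assumed)
    for _ in range(steps):
        h = math.tanh(w1 * x)
        err = w2 * h - y
        loss = 0.5 * err * err
        if not math.isfinite(loss) or loss > blowup:
            return False
        # Each parameter gets its own learning rate: these are the two axes.
        w1, w2 = (w1 - lr1 * err * w2 * (1.0 - h * h) * x,
                  w2 - lr2 * err * h)
    return True

def trainability_map(lrs1, lrs2):
    """Classify every (lr1, lr2) pair on a grid as bounded or diverged."""
    return [[stays_bounded(a, b) for a in lrs1] for b in lrs2]

rates = [0.01 * 2 ** k for k in range(12)]   # 0.01 up to 20.48, log-spaced
grid = trainability_map(rates, rates)
for row in grid:
    print("".join("#" if ok else "." for ok in row))   # '#' = stayed bounded
```

Refining the grid around the places where `#` meets `.` is how one would zoom into the boundary in a sketch like this.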

Carlos E. Perez

133,519 views • 2 years ago

What happens when you put competing neural networks in a Petri Dish and start changing the rules while they adapt?

Last year we released Petri Dish NCA, where neural nets are the organisms that learn during simulation. Today we're releasing Digital Ecosystems: a browser-based platform for interactive artificial life research.

The setup: several small CNNs share a 2D grid, each seeing only a 3x3 neighborhood. No global plan. They compete for territory by attacking neighbours and defending against incoming attacks, learning via gradient descent online while the simulation runs.

What we didn't expect was the role of the learning itself. Gradient descent isn't just optimising each species' strategy. Instead, it acts to stabilize the whole system during simulation. Species that overextend get pushed back by the loss. Species that stagnate get nudged to grow. This means you can push parameters toward edge-of-chaos regimes: a zone characterised by emergent complexity. Letting the neural networks learn acts to hold the complex system together while you explore and interact.

The platform lets you steer all of this interactively. You can draw walls to create niches, erase parts of the system online, and tune 40+ system parameters to explore the most interesting configurations. We find it mesmerizing to watch species carve out territories and reorganise when you perturb them. Everything runs client-side in your browser, no install needed.

Blog:
Code:
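The "each seeing only a 3x3 neighborhood" constraint can be sketched as a purely local update over a grid of species labels. This is a hedged illustration, not the Digital Ecosystems code: `neighborhood`, `step` and `majority_rule` are hypothetical names, the wrap-around edges are an assumption, and the fixed majority rule is only a stand-in for the learned CNNs (no gradient descent happens here).

```python
from collections import Counter

def neighborhood(grid, r, c):
    """Return the 3x3 patch around (r, c) as 9 values, wrapping at the edges (assumed)."""
    rows, cols = len(grid), len(grid[0])
    return [grid[(r + dr) % rows][(c + dc) % cols]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def step(grid, rule):
    """Apply a local rule to every cell; each cell reads only its own 3x3 patch."""
    return [[rule(neighborhood(grid, r, c)) for c in range(len(grid[0]))]
            for r in range(len(grid))]

def majority_rule(patch):
    """Hypothetical stand-in for a learned policy: the most common species takes
    the cell; on a tie the current owner (patch[4], the centre) keeps it."""
    counts = Counter(patch)
    best = max(counts.values())
    winners = [s for s, n in counts.items() if n == best]
    return patch[4] if patch[4] in winners else min(winners)

# Two species, 0 and 1, each holding half of a small grid.
grid = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 1]]
next_grid = step(grid, majority_rule)
```

In a learned version, `rule` would be a small network whose parameters are updated while `step` keeps running; the grid plumbing would stay the same.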

Sakana AI

257,877 views • 3 months ago

What are Physics-Informed Neural Networks (PINNs)?

Physics-Informed Neural Networks (PINNs) are neural nets trained to satisfy a differential equation. The trick is simple. You bake the PDE residual straight into the loss.

They came out of a very practical pain point. Classical PDE pipelines can be amazing, but they often demand a lot of setup work. Meshes. Stencils. Stability tuning. And once you build a solver, it’s usually tied to one geometry and one discretization choice. A PINN flips the workflow. You represent the solution itself as a smooth function uᵩ(x,t) and you enforce the physics wherever you choose to sample the domain.

Most people first meet PINNs in the least helpful way. A pretty solution surface, almost no clarity on what was enforced to make it appear. In this series we keep the enforcement visible. We pick a PDE, represent the unknown solution as a flexible function, measure how badly that function violates the equation across the domain, and train it to reduce that mismatch at the points we sample.

A normal neural net learns from labels. You give it inputs and target outputs. A PINN learns from an equation. You give it inputs (x,t), and it gets penalized whenever its output fails the PDE. Smaller mismatch means smaller loss. Bigger mismatch means bigger loss. That’s all “punish” and “reward” mean here. The network isn’t replacing physics. It’s just a flexible function that we force to obey the same calculus you’d demand from any candidate solution.

The math breakdown:

We start with a PDE on a domain Ω. Write it as

uₜ(x,t) + N(u(x,t), uₓ(x,t), uₓₓ(x,t), …) = 0 for (x,t) in Ω

A PINN replaces the unknown u with a neural network output uᵩ(x,t).

Now define the physics residual by plugging uᵩ into the PDE:

rᵩ(x,t) = ∂uᵩ/∂t + N(uᵩ, ∂uᵩ/∂x, ∂²uᵩ/∂x², …)

If uᵩ were an exact solution, we’d have rᵩ(x,t) = 0 everywhere.

We may also have data points (xᵢ,tᵢ,uᵢ) from measurements or from an initial condition. The training objective is a weighted sum of squared errors

L(ᵩ) = L_data(ᵩ) + λ L_phys(ᵩ) + L_bc/ic(ᵩ)

with

L_data(ᵩ) = meanᵢ |uᵩ(xᵢ,tᵢ) − uᵢ|²
L_phys(ᵩ) = meanⱼ |rᵩ(xⱼ,tⱼ)|², where (xⱼ,tⱼ) are collocation points in Ω
L_bc/ic(ᵩ) = penalties enforcing boundary conditions and initial conditions

The key technical step is how we get the derivatives inside rᵩ. We don’t approximate them with finite differences. We compute them with automatic differentiation: ∂uᵩ/∂t, ∂uᵩ/∂x, ∂²uᵩ/∂x², … Then we differentiate the total loss L(ᵩ) with respect to ᵩ and train with gradient descent.

That’s the whole idea. Learn a function, but make the PDE part of the loss, so the network is trained to be a solution, not just a curve-fitter.

In the render, the main 3D surface is the network’s current guess uᵩ(x,t), drawn as a living sheet over the (x,t) plane. Hovering above is the neural scaffold, a visible graph of feature nodes and connections. The bright tension threads are the physics residual rᵩ(x,t). Each thread tethers a collocation bead on the sheet up to the scaffold, and it thickens and brightens exactly where |rᵩ| is large, with color showing the sign. As training runs, those threads go slack across the domain, not because we hid the error, but because the network has actually been pushed toward rᵩ(x,t) ≈ 0.

#PINNs #ScientificMachineLearning #PDE #DifferentialEquations #Optimization #MachineLearning #AppliedMath #ComputationalPhysics
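The residual and L_phys can be sketched in a few lines of plain Python. The choices below are assumptions made for illustration and are not taken from the post: the PDE is the transport equation uₜ + c·uₓ = 0 (so N = c·uₓ), the "network" is a single tanh unit, and the derivatives come from a small hand-written forward-mode autodiff class (`Dual`) rather than a framework. The sketch covers only the residual and the physics loss; L_data, L_bc/ic and the gradient step with respect to the parameters are left out.

```python
import math

class Dual:
    """A value plus its derivative w.r.t. one chosen input (forward-mode autodiff)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def tanh(d):
    t = math.tanh(d.val)
    return Dual(t, (1.0 - t * t) * d.dot)       # chain rule for tanh

def u(params, x, t):
    """Toy one-unit 'network': u(x, t) = a * tanh(w_x * x + w_t * t + b)."""
    w_x, w_t, b, a = params
    return a * tanh(w_x * x + w_t * t + b)

def residual(params, x, t, c=1.0):
    """r(x, t) = u_t + c * u_x, with both derivatives from autodiff."""
    u_t = u(params, Dual(x), Dual(t, 1.0)).dot   # seed d/dt
    u_x = u(params, Dual(x, 1.0), Dual(t)).dot   # seed d/dx
    return u_t + c * u_x

def physics_loss(params, points, c=1.0):
    """L_phys: mean squared residual over the collocation points."""
    return sum(residual(params, x, t, c) ** 2 for x, t in points) / len(points)

points = [(0.5 * i, 0.5 * j) for i in range(3) for j in range(3)]
violating = (2.0, 1.0, 0.3, 1.5)     # w_t != -c * w_x, so the PDE is violated
travelling = (2.0, -2.0, 0.3, 1.5)   # w_t == -c * w_x, an exact travelling wave
print(physics_loss(violating, points), physics_loss(travelling, points))
```

For this particular toy model any parameters with w_t = −c·w_x give a function of x − ct, which solves the transport equation exactly, so its physics loss vanishes; that is the state training would be pushing toward.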

Mathelirium

44,712 views • 5 months ago

Holy shit… someone just made machine learning click.

Not static diagrams. Not math-heavy PDFs. Not black-box training. Real algorithms, training step by step, visually. It’s called Machine Learning Visualized and it lets you watch models learn in real time.

Here’s why this is different: Instead of dumping theory first, it shows optimization happening live:
• gradients moving
• weights updating
• decision boundaries shifting
• loss decreasing
• models converging

You literally see learning happen.

Everything is built from first principles:
• Gradient Descent
• Logistic Regression
• Perceptron
• PCA
• K-Means
• Neural Networks
• Backpropagation

No magic. Just math → code → visualization. Each chapter is a Jupyter notebook that derives the math, then implements it, then animates training. So you can watch:
• neural nets shape decision surfaces
• PCA rotate feature space
• K-means clusters form live
• gradient descent find minima
• sigmoid reshape boundaries
• backprop update weights step-by-step

This solves a huge problem. Most ML resources teach: math → code → ??? → trained model. This shows: math → code → learning process → result. Which means you finally understand:
• why gradients matter
• how weights evolve
• what loss landscapes look like
• how convergence actually happens
• why deep nets learn non-linear functions

Even better: you can open any notebook, modify parameters, and watch behavior change instantly. Learning ML becomes interactive. Not passive. Not abstract. Not confusing. Just… visible.

Perfect for:
• beginners learning ML
• devs moving into AI
• interview prep
• teaching concepts
• understanding backprop
• visual learners
• building intuition

This is the kind of resource that makes neural networks finally “click”.

Link:

We’re moving from: reading about ML → watching ML learn. That’s a big shift. Because once you can see training, you stop memorizing… and start understanding. AI education just got visual.
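As a rough idea of what "watching the loss decrease" amounts to in code, here is a minimal sketch of logistic regression trained by gradient descent with the loss recorded at every step. It is not taken from Machine Learning Visualized: the AND-gate data, the learning rate and the function names are assumptions chosen to keep the example tiny, and a real notebook would plot `history` instead of printing it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(data, lr=1.0, steps=2000):
    """Batch gradient descent on the mean cross-entropy loss.

    Returns the weights, the bias, and the loss measured before each update.
    """
    w, b = [0.0, 0.0], 0.0
    history = []
    n = len(data)
    for _ in range(steps):
        gw, gb, loss = [0.0, 0.0], 0.0, 0.0
        for (x1, x2), y in data:
            p = sigmoid(w[0] * x1 + w[1] * x2 + b)
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
            gw[0] += (p - y) * x1        # d(loss)/d(w0) for this sample
            gw[1] += (p - y) * x2
            gb += p - y
        history.append(loss / n)
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b, history

# Toy dataset (assumed): the AND gate, which is linearly separable.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b, history = train_logistic(and_gate)
predictions = [round(sigmoid(w[0] * x1 + w[1] * x2 + b)) for (x1, x2), _ in and_gate]

for step_index in (0, 10, 100, 1000, 1999):
    print(step_index, history[step_index])   # the curve a visualizer would draw
```

Changing `lr` or `steps` and rerunning is the text-only version of the "modify parameters and watch behavior change" loop described above.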

Suryansh Tiwari

132,415 views • 3 months ago