Lecture 1 on Physics-Informed Neural Networks: A Mini-Series Physics-Informed Neural Networks (PINNs) are neural networks trained to satisfy a differential equation by building the PDE residual directly into the loss. They emerged from a very practical problem...classical PDE pipelines can be brilliant, but they often demand heavy discretization work...

47,308 views • 5 months ago •via X (Twitter)

Related Videos

What if Your Neural Network Was Forced to Obey Physics?
Physics-Informed Neural Networks (PINNs) are neural networks trained to satisfy a differential equation by building the PDE residual directly into the loss. They emerged from a very practical problem...classical PDE pipelines can be brilliant, but they often demand heavy discretization work (meshes, stencils, stability tuning), and the method you build is usually tied to one geometry and one solver setup. A PINN flips the workflow by representing the solution itself as a smooth function uᵩ(x,t), where ᵩ stands for the network's trainable parameters, and enforcing the physics everywhere you choose to sample the domain.
People often meet PINNs in the least helpful way...via a flashy solution plot, and almost no explanation of what was enforced to get it. In this series we keep the enforcement visible. We pick a differential equation, represent the unknown solution as a flexible function, measure how well that function satisfies the equation across the domain, and train it to reduce that mismatch everywhere we sample.
A normal neural net learns from labels...you give it inputs and target outputs. A PINN learns from a differential equation...you give it inputs (x,t) and it gets punished whenever its output fails the PDE. By punish we mean that the loss increases when the mismatch is large; the reward is that the loss decreases as the mismatch gets smaller. The network isn’t replacing physics, it’s becoming a flexible function that is forced to satisfy the same calculus you’d impose on any candidate solution.
The math breakdown:
We start with a PDE we want to solve on a domain Ω. Write it as
uₜ(x,t) + N(u(x,t), uₓ(x,t), uₓₓ(x,t), …) = 0 for (x,t) in Ω.
A PINN replaces the unknown function u with a neural network output uᵩ(x,t). Now define the physics residual by plugging uᵩ into the PDE:
rᵩ(x,t) = ∂uᵩ/∂t + N(uᵩ, ∂uᵩ/∂x, ∂²uᵩ/∂x², …)
If uᵩ were an exact solution, we would have rᵩ(x,t) = 0 everywhere.
We may also have data points (xᵢ,tᵢ,uᵢ) from measurements or a known initial condition. The training objective is just a weighted sum of squared errors
L(ᵩ) = L_data(ᵩ) + λ L_phys(ᵩ) + L_bc/ic(ᵩ)
with
L_data(ᵩ) = meanᵢ |uᵩ(xᵢ,tᵢ) − uᵢ|²
L_phys(ᵩ) = meanⱼ |rᵩ(xⱼ,tⱼ)|², where (xⱼ,tⱼ) are the collocation points in Ω
L_bc/ic(ᵩ) = penalties enforcing boundary conditions and initial conditions
The key technical step is that the derivatives inside rᵩ (∂uᵩ/∂t, ∂uᵩ/∂x, ∂²uᵩ/∂x², …) are computed by automatic differentiation. So we can differentiate the total loss L(ᵩ) with respect to ᵩ and train with gradient descent. This is the whole idea behind PINNs. Learn a function, but make the PDE part of the loss, so the network is trained to be a solution, not just a curve-fitter.
In the render, the main 3D surface is the network’s current guess uᵩ(x,t), drawn as a living sheet over the (x,t) plane. Hovering above is the neural scaffold...a visible graph of feature nodes and connections. The bright tension threads are the physics residual rᵩ(x,t): each thread tethers a collocation bead on the sheet up to the scaffold, and it thickens and brightens exactly where |rᵩ| is large (color encodes the sign). As training runs, those threads go slack across the domain not because we hid the error, but because the network has actually been pushed toward rᵩ(x,t) ≈ 0.
#PINNs #PhysicsInformedNeuralNetworks #ScientificMachineLearning #PDE #DifferentialEquations #Optimization #MachineLearning #AppliedMath #ComputationalPhysics
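The loss construction described in this post can be sketched in a few lines. This is a deliberately tiny stand-in, not the series' implementation: the toy problem u′(t) + u(t) = 0 with u(0) = 1, the cubic trial function used in place of a neural network, the hand-written derivative used in place of automatic differentiation, and all function names are assumptions made for illustration. There is no data term, and λ is taken to be 1.

```python
# Toy problem (assumed): u'(t) + u(t) = 0 on [0, 1], with u(0) = 1.
# Stand-in for the network: u_c(t) = c0 + c1*t + c2*t^2 + c3*t^3.
# The derivative is written by hand; a real PINN would get it from autodiff.

def u(c, t):
    return sum(ck * t**k for k, ck in enumerate(c))

def du_dt(c, t):
    return sum(k * ck * t**(k - 1) for k, ck in enumerate(c) if k > 0)

def residual(c, t):
    # Physics residual r_c(t): zero everywhere for an exact solution.
    return du_dt(c, t) + u(c, t)

def loss(c, pts):
    l_phys = sum(residual(c, t) ** 2 for t in pts) / len(pts)
    l_ic = (u(c, 0.0) - 1.0) ** 2  # penalty enforcing u(0) = 1
    return l_phys + l_ic

def grad(c, pts):
    # Gradient of the loss with respect to the coefficients c_k.
    g = [0.0] * len(c)
    for t in pts:
        r = residual(c, t)
        for k in range(len(c)):
            dr_dck = (k * t**(k - 1) if k > 0 else 0.0) + t**k
            g[k] += 2.0 * r * dr_dck / len(pts)
    g[0] += 2.0 * (u(c, 0.0) - 1.0)
    return g

def train(steps=5000, lr=0.01, n_pts=11):
    pts = [j / (n_pts - 1) for j in range(n_pts)]  # collocation points
    c = [0.0] * 4
    initial = loss(c, pts)
    for _ in range(steps):
        c = [ck - lr * gk for ck, gk in zip(c, grad(c, pts))]
    return c, initial, loss(c, pts)
```

Calling `train()` returns the fitted coefficients together with the loss before and after training. The only thing pulling the cubic toward exp(−t) is the residual at the collocation points plus the initial-condition penalty; no solution values are ever supplied.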

Mathelirium

17,285 views • 1 month ago

Why Does Quantum Mechanics Use a Complex Wavefunction? Schrödinger’s equation doesn’t start from mystery. It starts from a very specific bet. The state of a particle is a complex field ψ(x,t), and whatever time-evolution rule we choose has to move ψ forward while preserving total probability. So the basic question is simple. What equation should ψ satisfy so that |ψ|² behaves like a conserved density, the way mass density does in fluid flow? What is ψ? Think of ψ(x,t) as an amplitude attached to the statement the particle is at position x at time t. It’s not a probability. It’s the thing you add first, and only at the end do you square it: p(x,t) = |ψ(x,t)|² Because ψ is complex, it has magnitude and phase. Write it as ψ(x,t) = r(x,t) exp(i θ(x,t)) Then r² = |ψ|² is the density, and the phase θ ends up controlling the flow through the probability current. Where does Schrödinger’s equation come from? Start with two empirical inputs that tie waves to particles: E = ħ ω p = ħ k Here ħ is Planck’s constant divided by 2π. It’s the conversion factor between frequency and energy, and between wavenumber and momentum. A plane wave with angular frequency ω and wavevector k is ψ(x,t) = A exp(i(k·x − ωt)) Now watch what derivatives do to this wave: ∂ψ/∂t = −i ω ψ ∇ψ = i k ψ ∇²ψ = −|k|² ψ Multiply by ħ and you get: i ħ ∂ψ/∂t = ħ ω ψ = E ψ −i ħ ∇ψ = ħ k ψ = p ψ −ħ² ∇²ψ = ħ² |k|² ψ = p² ψ So for plane waves, the operators Ê = i ħ ∂/∂t p̂ = −i ħ ∇ act like energy and momentum. Now bring in the classical, nonrelativistic energy bookkeeping: E = p²/(2m) + V(x) Kinetic plus potential. That’s it. Turn it into an equation for ψ by replacing E and p with the operators above: Ê ψ = (p̂²/(2m) + V) ψ Since p̂² = (−i ħ ∇)·(−i ħ ∇) = −ħ² ∇², this becomes i ħ ∂ψ/∂t = ( −ħ²/(2m) ∇² + V(x) ) ψ That’s the time-dependent Schrödinger equation. This derivation is a controlled heuristic. 
Match the plane-wave identities to the measured relations E = ħω and p = ħk, then impose the same energy bookkeeping you trust in classical mechanics. Why this is the right kind of rule If ψ is the state, we need a rule that preserves total probability: ∫ |ψ(x,t)|² dx = 1 Schrödinger evolution does, and you can see it by deriving a continuity equation. Let ρ(x,t) = |ψ|² = ψ*ψ. Differentiate: ∂ρ/∂t = ψ* ∂ψ/∂t + ψ ∂ψ*/∂t Use Schrödinger and its complex conjugate. The potential terms cancel, and what’s left can be rearranged into ∂ρ/∂t + ∇·j = 0 with probability current j = (ħ/(2mi)) ( ψ* ∇ψ − ψ ∇ψ* ) That’s the cleanest way to say what ψ is. |ψ|² behaves like a conserved density, the phase drives a current, and the time evolution is fixed, up to V, by combining wave relations with energy bookkeeping: i ħ ∂ψ/∂t = ( −ħ²/(2m) ∇² + V ) ψ #QuantumMechanics #SchrodingerEquation #WaveFunction #BornRule #Physics #MathematicalPhysics
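The plane-wave identities used in this post can be checked numerically. The sketch below is one dimension only and uses central finite differences; the values ħ = 1, m = 1, k = 2, the free-particle choice V = 0 (so ω = ħk²/(2m)), and the helper names are all assumptions made for the check.

```python
import cmath

HBAR = 1.0                          # assumed units
M = 1.0                             # assumed mass
K = 2.0                             # assumed wavenumber
OMEGA = HBAR * K**2 / (2.0 * M)     # free particle: E = p^2/(2m), V = 0

def psi(x, t, amp=1.0):
    # Plane wave A exp(i(kx - wt)).
    return amp * cmath.exp(1j * (K * x - OMEGA * t))

def d_dt(f, x, t, h=1e-3):
    return (f(x, t + h) - f(x, t - h)) / (2.0 * h)

def d_dx(f, x, t, h=1e-3):
    return (f(x + h, t) - f(x - h, t)) / (2.0 * h)

def d2_dx2(f, x, t, h=1e-3):
    return (f(x + h, t) - 2.0 * f(x, t) + f(x - h, t)) / h**2

def energy_defect(x, t):
    # | i hbar dpsi/dt - hbar w psi |
    return abs(1j * HBAR * d_dt(psi, x, t) - HBAR * OMEGA * psi(x, t))

def momentum_defect(x, t):
    # | -i hbar dpsi/dx - hbar k psi |
    return abs(-1j * HBAR * d_dx(psi, x, t) - HBAR * K * psi(x, t))

def schrodinger_residual(x, t):
    # | i hbar dpsi/dt - ( -hbar^2/(2m) d^2psi/dx^2 ) |  with V = 0
    kinetic = -(HBAR**2 / (2.0 * M)) * d2_dx2(psi, x, t)
    return abs(1j * HBAR * d_dt(psi, x, t) - kinetic)
```

All three defects should come out at the size of the finite-difference error, which is the numerical version of the statement that Ê and p̂ act like energy and momentum on plane waves.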

Mathelirium

20,781 views • 4 months ago

Lecture 2 on our Quantum Mechanics Series Schrödinger’s equation doesn’t start from mystery. It starts from a very specific bet…the state of a particle is a complex field ψ(x,t), and whatever dynamics we write down must move ψ forward in time in a way that preserves total probability. We ask a basic question…what equation should ψ satisfy so that |ψ|² behaves like a conserved density, the way mass density does in fluid flow? What is ψ? Think of ψ(x,t) as the amplitude assigned to “the particle is at position x at time t”. It’s not a probability. It’s the object you add first, and only at the end do you square p(x,t) = |ψ(x,t)|² Because ψ is complex, it has magnitude and phase. Write it in polar form ψ(x,t) = r(x,t) exp(i θ(x,t)) Then r² = |ψ|² is the density, and θ will end up controlling flow (the probability current). Where does Schrödinger’s equation come from? Start with two empirical inputs about waves and particles: E = ħ ω p = ħ k Here ħ (“h-bar”) is Planck’s constant divided by 2π. It’s the unit conversion factor between the wave description (frequency ω, wavevector k) and the particle description (energy E, momentum p). In units, ħ has units of joule-seconds, so multiplying ω (1/seconds) gives energy (joules), and multiplying k (1/meters) gives momentum (kg·m/s). It’s the number that tells you how much energy or momentum you get per unit frequency or wavenumber. A plane wave with angular frequency ω and wavevector k is ψ(x,t) = A exp(i(k·x − ω t)) Now notice what derivatives do to this wave: ∂ψ/∂t = −i ω ψ ∇ψ = i k ψ ∇²ψ = −|k|² ψ Multiply those identities by ħ: i ħ ∂ψ/∂t = ħ ω ψ = E ψ −i ħ ∇ψ = ħ k ψ = p ψ −ħ² ∇²ψ = ħ² |k|² ψ = p² ψ So for plane waves, the operators Ê = i ħ ∂/∂t p̂ = −i ħ ∇ act like energy and momentum! Now use the classical nonrelativistic energy relation: E = p²/(2m) + V(x) This is bookkeeping for a particle moving slow enough that relativity can be ignored. The term p²/(2m) is kinetic energy. 
If p = mv, then p²/(2m) = (m²v²)/(2m) = (1/2)mv². The term V(x) is potential energy. It depends on position because forces come from spatially varying energy. A slope in V pushes the particle. Examples: for a charged particle in an electric potential φ(x), V(x) = q φ(x). Near Earth, V(z) = mgz. The point is total energy equals kinetic plus potential. Turn that into an equation for ψ by replacing E and p with the operators above: Ê ψ = (p̂²/(2m) + V) ψ Compute p̂² = (−i ħ ∇)·(−i ħ ∇) = −ħ² ∇², so we get i ħ ∂ψ/∂t = ( −ħ²/(2m) ∇² + V(x) ) ψ That is the time-dependent Schrödinger equation. The derivation here is a controlled heuristic: we matched the plane-wave identities to the measured relations E = ħω and p = ħk, then imposed the same energy bookkeeping as classical mechanics. Why this equation is the right kind of rule If ψ is the state, we need a rule that preserves total probability: ∫ |ψ(x,t)|² dx = 1 Schrödinger evolution does. You can see it by deriving a continuity equation. Let ρ(x,t) = |ψ|² = ψ* ψ. Take a time derivative: ∂ρ/∂t = ψ* ∂ψ/∂t + ψ ∂ψ*/∂t Use Schrödinger and its complex conjugate: ∂ψ/∂t = (1/(i ħ)) ( −ħ²/(2m) ∇²ψ + Vψ ) ∂ψ*/∂t = (−1/(i ħ)) ( −ħ²/(2m) ∇²ψ* + Vψ* ) Plug in. The V terms cancel exactly, and what remains can be rearranged into a divergence: ∂ρ/∂t + ∇·j = 0 where the probability current is j = (ħ/(2mi)) ( ψ* ∇ψ − ψ ∇ψ* ) This is the best way to explain what ψ is: |ψ|² behaves like a conserved density, and the phase of ψ is what drives the current j. So in this series, ψ isn’t a slogan. It’s the object whose modulus squared is the density, whose phase generates flow, and whose time evolution is fixed (up to V) by matching wave relations to energy bookkeeping: i ħ ∂ψ/∂t = ( −ħ²/(2m) ∇² + V ) ψ #QuantumMechanics #SchrodingerEquation #WaveFunction #BornRule #Physics #MathematicalPhysics
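The "rearranged into a divergence" step in this post is stated without the intermediate lines. One way to fill it in is sketched below, assuming, as the post's conjugate equation already does, that V is real.

```latex
\begin{aligned}
\frac{\partial\rho}{\partial t}
 &= \psi^{*}\,\frac{1}{i\hbar}\Big(-\frac{\hbar^{2}}{2m}\nabla^{2}\psi + V\psi\Big)
  \;-\; \psi\,\frac{1}{i\hbar}\Big(-\frac{\hbar^{2}}{2m}\nabla^{2}\psi^{*} + V\psi^{*}\Big) \\
 &= -\frac{\hbar}{2mi}\Big(\psi^{*}\nabla^{2}\psi - \psi\nabla^{2}\psi^{*}\Big)
  \qquad \text{(the two } V \text{ terms cancel)} \\
 &= -\nabla\cdot\Big[\frac{\hbar}{2mi}\Big(\psi^{*}\nabla\psi - \psi\nabla\psi^{*}\Big)\Big]
  \;=\; -\nabla\cdot j .
\end{aligned}
% Last step: \nabla\cdot(\psi^{*}\nabla\psi - \psi\nabla\psi^{*})
%   = \nabla\psi^{*}\cdot\nabla\psi + \psi^{*}\nabla^{2}\psi
%     - \nabla\psi\cdot\nabla\psi^{*} - \psi\nabla^{2}\psi^{*}
%   = \psi^{*}\nabla^{2}\psi - \psi\nabla^{2}\psi^{*}.
```

The cross terms ∇ψ*·∇ψ cancel in the product rule, which is why the second-derivative expression is an exact divergence.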

Mathelirium

40,835 views • 6 months ago

Ask anyone who’s taken a course in Ordinary Differential Equations (ODEs) what a solution to an ODE represents geometrically, and most of them won’t have a clean answer. When I first took ordinary differential equations, the pattern was always the same. Early on it turns into a speedrun of methods: separation of variables, integrating factors, variation of parameters, Bernoulli, exact equations. Then pretty quickly the course slides into hammer-picking. Spot the form, apply the recipe, move on. Too mechanical! And the real problem is what you don’t walk away with. You leave with a toolkit, but without a feel for what a differential equation even is, especially geometrically. That matters because in real modeling the equations you meet are rarely nice enough to reward memorised recipes. So you get trained to solve toy forms, while the actual subject stays blurry. The behavior. The flow. The shape of solutions. It wasn't until I watched the first lecture of Professor Arthur Mattuck that I realized I didn’t actually know what a solution to a differential equation represents geometrically. His point is almost embarrassingly simple. A first-order ODE is a slope field, and a solution is a curve that stays tangent to that field everywhere. The math breakdown: Write the ODE as dy/dx = f(x,y). At each point (x,y), attach a tiny line segment with slope f(x,y). A function y = y₁(x) is a solution exactly when its graph follows those slopes. At every x, the slope of the curve equals the slope prescribed by the field at the point on the curve. That’s the one line that ties both viewpoints together: y₁′(x) = f(x, y₁(x)). So solving the ODE and drawing an integral curve are the same statement in two languages. Once you see that, you stop obsessing over whether you can write y(x) in closed form. You start asking the questions that actually matter. Where do solutions flow. Where do they get trapped. Where do they blow up. 
Where does existence or uniqueness fail because the field isn’t even defined? That’s the perspective shift I wish every ODE course forced early. It’s also why I keep pairing math with animation. #DifferentialEquations #ODEs #VectorFields #AppliedMathematics #Mathematics
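The slope-field picture in this post translates almost directly into code. The sketch below is an illustration only: it uses Euler's method, which is a choice made here and not something the post names, along with the assumed example dy/dx = y and hypothetical function names. One function draws an approximate integral curve by repeatedly stepping along the local slope, and the other measures how far a candidate curve is from being tangent to the field.

```python
import math

def follow_slope_field(f, x0, y0, x_end, n_steps):
    # Euler's method: at each point, take a short step along the slope f(x, y).
    h = (x_end - x0) / n_steps
    x, y = x0, y0
    points = [(x, y)]
    for _ in range(n_steps):
        y += h * f(x, y)
        x += h
        points.append((x, y))
    return points

def tangency_defect(f, y1, x, h=1e-5):
    # |y1'(x) - f(x, y1(x))|, with y1' estimated by a central difference.
    slope_of_curve = (y1(x + h) - y1(x - h)) / (2.0 * h)
    return abs(slope_of_curve - f(x, y1(x)))

def example_field(x, y):
    # Assumed example: dy/dx = y, whose solution through (0, 1) is exp(x).
    return y
```

For instance, `follow_slope_field(example_field, 0.0, 1.0, 1.0, 1000)` ends close to (1, e), and `tangency_defect(example_field, math.exp, 0.5)` is near zero because the graph of exp is tangent to this field everywhere.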

Mathelirium

40,739 views • 4 months ago

The Trap in Every Mathematics Lecture If you’ve taken a lot of math courses, you start to recognize a pattern. There’s a moment where the lecturer is warming up with the obvious stuff...add matrices entrywise, scale by α, do the row-column product...and you’re thinking, alright… where is this going? Then you relax. You stop resisting. And right there, they slip in one line that changes how you see the whole subject. When Benedict Gross says "matrices represent linear operators," he’s telling you to stop treating a matrix as a rectangle of numbers and start treating it as an action. A linear operator is a function T: Rⁿ → Rⁿ that respects two rules: T(u+v)=T(u)+T(v) and T(αu)=αT(u). Once you pick a basis, T is completely determined by where it sends the basis vectors e₁,…,eₙ. Put T(e₁),…,T(eₙ) into columns and you get a matrix A. That is what "A represents T" means...A is the coordinate portrait of the transformation. Now the punchline that makes matrix multiplication feel inevitable. If B represents S and A represents T, then doing S first and then T is the composition T∘S. In coordinates that becomes A(Bx)=(AB)x. So multiplying matrices is really composing transformations. That’s why multiplication is usually not commutative: T∘S is generally not the same transformation as S∘T, and the matrices inherit that noncommutativity. This explains half of Linear Algebra because it tells you what the course is really about...functions that move vectors around, not grids of numbers. A matrix is just the written form of that function once you choose coordinates. Then the rules stop feeling random. Multiplying matrices means doing one move and then another, an inverse means you can undo the move, eigenvectors are directions that don’t get turned, and changing basis is just describing the same move in a different language. That one idea makes a lot of linear algebra click. #LinearAlgebra #Matrices #GroupTheory #GLn #MathLectures #Mathematics
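The two claims in this post, that the columns of A are the images of the basis vectors and that AB represents "do S, then T", can be checked on a small case. The sketch below uses plain Python lists; the two maps (a 90° rotation and a stretch along x) and all function names are chosen here purely for illustration.

```python
def matvec(A, x):
    # Apply the matrix A to the vector x.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(A, B):
    # Ordinary row-by-column product.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def matrix_of(T, n):
    # Column j of the result is T(e_j), the image of the j-th basis vector.
    cols = [T([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def rotate90(v):
    # T: rotate the plane by 90 degrees counterclockwise.
    x, y = v
    return [-y, x]

def stretch_x(v):
    # S: double the x-coordinate.
    x, y = v
    return [2 * x, y]

A = matrix_of(rotate90, 2)   # represents T
B = matrix_of(stretch_x, 2)  # represents S
```

With these definitions, `matvec(matmul(A, B), x)` agrees with `rotate90(stretch_x(x))` for any x, while `matmul(A, B)` and `matmul(B, A)` differ, which is the noncommutativity of T∘S and S∘T showing up in coordinates.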

Mathelirium

66,204 views • 5 months ago

Lecture 3 of our Quantum Mechanics series. Lecture 2 gave us the one clean privilege quantum theory offers: treat ψ(x,t) as the state and ρ(x,t) = |ψ(x,t)|² as probability, because Schrödinger evolution forces ρ to obey a continuity equation. Lecture 3 is about what that continuity equation is really telling you. If ρ behaves like a fluid, then the only question that matters is: What is the velocity field? Write ψ(x,t) = r(x,t) exp(i θ(x,t)). The magnitude r sets how much probability is sitting there. The phase θ sets where it tries to go. When you unpack the current j = (1/m) Im(ψ* ∇ψ), it collapses to j = (ρ/m) ∇θ, which means the flow lines you draw follow the phase gradient, crossing the contours of constant phase at right angles. Then the constraint that makes the picture bite: ψ has to be single-valued, so θ can’t wind by an arbitrary amount. Around any closed loop the total phase change must be 2π n, with n an integer. That’s why vortices aren’t features you add...they’re defects the math permits, in quantized units. In the render you see both layers at once...the 3D surface shows |ψ| breathing while the phase skin slides, and the 2D panel exposes the engine...current lines steering around discrete vortex charges. The math breakdown: We write the state as a complex field ψ(x,t) on the plane (x in R²). The Born rule defines the probability density ρ(x,t) = |ψ(x,t)|² Schrödinger evolution (ħ = 1 units) is i ∂ψ/∂t = [ −(1/(2m)) ∇² + V(x,t) ] ψ Now derive conservation of probability. Start with ρ = ψ*ψ: ∂ρ/∂t = ψ* (∂ψ/∂t) + ψ (∂ψ*/∂t) Use Schrödinger and its complex conjugate: ∂ψ/∂t = (1/i) [ −(1/(2m)) ∇²ψ + Vψ ] ∂ψ*/∂t = (−1/i) [ −(1/(2m)) ∇²ψ* + Vψ* ] Substitute. The V terms cancel, and the remaining terms rearrange into the continuity equation ∂ρ/∂t + ∇·j = 0 with probability current j = (1/(2mi)) ( ψ* ∇ψ − ψ ∇ψ* ) = (1/m) Im(ψ* ∇ψ) So "probability density" really behaves like a conserved fluid density with flux j. Now expose the phase mechanism. 
Write ψ in polar form ψ(x,t) = r(x,t) exp(i θ(x,t)) Compute the gradient ∇ψ = exp(iθ) (∇r + i r ∇θ) Then ψ* ∇ψ = r (∇r + i r ∇θ) Taking the imaginary part gives Im(ψ* ∇ψ) = r² ∇θ = ρ ∇θ So the current becomes j = (ρ/m) ∇θ That’s the steering-wheel statement: Phase gradient sets the flow direction and speed (modulated by density and m). Finally, quantized vortices. Because ψ must be single-valued, going around any closed loop must return the same complex value. That forces the phase winding to be an integer multiple of 2π: ∮ ∇θ · dl = 2π n with n in Z n is the vortex charge. Vortex cores sit where ρ ≈ 0 (phase is undefined), and the current streamlines circulate around them. #QuantumMechanics #Wavefunction #SchrodingerEquation #BornRule #ProbabilityCurrent #ContinuityEquation #Phase #Vortices #TopologicalDefects #ComplexAnalysis #MathematicalPhysics #Mathematics #Physics
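The quantization condition ∮ ∇θ · dl = 2π n can be seen numerically by walking around a loop and adding up the small phase changes of ψ. The sketch below does this on a circle; the test fields (powers of x + iy, and its complex conjugate for a vortex of opposite charge) and the function names are assumptions made for illustration, not fields taken from the render.

```python
import cmath
import math

def winding_number(psi, cx, cy, radius, n_steps=720):
    # Sum the phase increments of psi around a circle centred at (cx, cy).
    # Each increment is read off the ratio of consecutive values, so it lies
    # in (-pi, pi]; the steps must be fine enough that this is unambiguous.
    total = 0.0
    prev = psi(cx + radius, cy)
    for s in range(1, n_steps + 1):
        ang = 2.0 * math.pi * s / n_steps
        cur = psi(cx + radius * math.cos(ang), cy + radius * math.sin(ang))
        total += cmath.phase(cur / prev)
        prev = cur
    return round(total / (2.0 * math.pi))

def vortex(n):
    # Assumed test field: psi = (x + i y)^n has a zero of charge n at the origin.
    return lambda x, y: complex(x, y) ** n

def antivortex(x, y):
    # Assumed test field: psi = x - i y winds the opposite way.
    return complex(x, -y)
```

A loop around the zero of `vortex(2)` returns 2, a loop that does not enclose the zero returns 0, and `antivortex` returns −1. The loop must avoid the core itself, where ρ = 0 and the phase is undefined.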

Mathelirium

37,998 views • 6 months ago

The Trap in Every Mathematics Lecture If you’ve taken enough math courses, you start noticing the same little move. The lecturer warms up with the obvious stuff, add matrices entrywise, scale by α, do the row-column product, and you’re thinking alright, where is this going. Then you relax. You stop resisting. And right there, they drop one line that quietly rewires the whole subject. When Benedict Gross says matrices represent linear operators, he’s telling you to stop treating a matrix as a rectangle of numbers and start treating it as an action. A linear operator is a function T: ℝⁿ → ℝⁿ that respects two rules: T(u+v) = T(u) + T(v) T(αu) = αT(u) Once you pick a basis, T is completely determined by where it sends the basis vectors e₁,…,eₙ. Put T(e₁),…,T(eₙ) into columns and you get a matrix A. That is what A represents T means. A is the coordinate portrait of the transformation. Now the punchline that makes matrix multiplication feel inevitable. If B represents S and A represents T, then doing S first and then T is the composition T∘S. In coordinates that becomes A(Bx) = (AB)x. So multiplying matrices is really composing transformations. That’s why multiplication is usually not commutative. T∘S is generally not the same transformation as S∘T, and the matrices inherit that noncommutativity. This explains half of linear algebra because it tells you what the course is really about: functions that move vectors around, not grids of numbers. A matrix is just the written form of that function once you choose coordinates. After that, the rules stop feeling random. Multiplying matrices means doing one move and then another. An inverse means you can undo the move. Eigenvectors are directions that don’t get turned. Changing basis is just describing the same move in a different language. One idea, and a lot of linear algebra suddenly clicks. #LinearAlgebra #Matrices #LinearMaps #Eigenvectors #ChangeOfBasis #Mathematics

Mathelirium

133,454 views • 4 months ago

Do you actually know what convex optimization is in the geometric, guarantee-theoretic sense or have you only met it through solvers and loss curves? Convexity is rare comfort in optimization...there are no spurious local minima, no surprise traps, and inequalities you can use like tools instead of prayers. So, what is this convexity? Let x = (x₁, x₂) and let f(x) be convex. Plot the surface z = f(x). Pick a contact point x₀. The local slope is the gradient p = ∇f(x₀). That p is exactly the data that defines the supporting plane: z = f(x₀) + p · (x − x₀). For a differentiable f, convexity means exactly that this plane stays below the surface at every contact point: for every x, f(x) ≥ f(x₀) + p · (x − x₀). So the plane at x₀ can slide under the surface, but it never slices through it. Not near the point...everywhere. Now here is the interesting part: The slope becomes a coordinate system! Rewrite the same plane as z = p · x − b, where b is the offset. Because the plane passes through (x₀, f(x₀)), the offset is forced to be b = p · x₀ − f(x₀). And that number isn’t just geometry trivia. It’s the convex conjugate: f*(p) = sup over x ( p · x − f(x) ). At a differentiable contact point, the supporting plane touches f tightly enough that the supremum is achieved at x₀, giving the identity f*(p) = p · x₀ − f(x₀) when p = ∇f(x₀). So one moving contact point gives two linked readouts: a primal readout, the position x₀, and a dual readout, the slope p = ∇f(x₀) together with the offset f*(p). One surface. Two worlds. #ConvexOptimization #Optimization #MachineLearning #SignalProcessing #AppliedMath #Engineering
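The supporting-plane inequality and the conjugate identity in this post can both be checked on a concrete function. The sketch below assumes f(x) = x₁² + x₂² and the contact point x₀ = (1, 2), neither of which comes from the post, and approximates the supremum by a maximum over a grid that happens to contain x₀.

```python
def f(x):
    # Assumed example of a convex function.
    return x[0] ** 2 + x[1] ** 2

def grad_f(x):
    return (2.0 * x[0], 2.0 * x[1])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def supporting_plane(x0):
    # z = f(x0) + p . (x - x0), with p the gradient at the contact point.
    p = grad_f(x0)
    return lambda x: f(x0) + dot(p, (x[0] - x0[0], x[1] - x0[1]))

def grid(lo=-4.0, hi=4.0, step=0.25):
    n = int((hi - lo) / step)
    vals = [lo + i * step for i in range(n + 1)]
    return [(a, b) for a in vals for b in vals]

def conjugate_on_grid(p, pts):
    # Grid approximation of f*(p) = sup over x of ( p . x - f(x) ).
    return max(dot(p, x) - f(x) for x in pts)

x0 = (1.0, 2.0)
p = grad_f(x0)            # (2.0, 4.0)
plane = supporting_plane(x0)
pts = grid()
```

Here `dot(p, x0) - f(x0)` and `conjugate_on_grid(p, pts)` both evaluate to 5.0, and `f(x) >= plane(x)` holds at every grid point, with equality only at x₀.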

Mathelirium

38,506 views • 6 months ago

When I first took ordinary differential equations, the pattern was always the same. Week 1 turns into a speedrun of methods: separation of variables, integrating factors, variation of parameters, Bernoulli, exact equations… and by Week 2 or 3 the course has quietly degenerated into hammer-picking. Spot the form, apply the recipe, move on. Mechanical! Fuuuuck!😫😫😫😫 The problem is what you don’t walk away with. You leave with a toolkit, but without a feel for what a differential equation even is, especially geometrically. And that’s a big deal, because in real modeling the equations you meet are rarely nice enough to reward memorized recipes. So you end up trained to solve toy forms, while the actual subject...the behavior, the flow, the shape of solutions...stays blurry. This is why I’m biased toward the old-timers. Their old-school way of doing things always surprises me...they’ll spend time on one idea until it sticks, instead of sprinting through a syllabus checklist. One lecture from them and you start noticing a contrast. A lot of modern teaching feels like "finish the content." You get marched through techniques, but you’re not left with a single thought that keeps bothering you later...the kind of thought that actually pushes you toward research-level curiosity. MIT OpenCourseWare’s Professor Arthur Mattuck did that to me in his very first ODE lecture. One lecture, and your whole relationship with dy/dx = f(x,y) changes. In this segment, Prof. Mattuck is basically saying: A first-order ODE is a slope field, and a solution is a curve that moves everywhere tangent to that field. The math breakdown: Write the ODE as dy/dx = f(x,y). At each point (x,y) you attach a tiny line segment with slope f(x,y). A function y = y₁(x) is a solution exactly when its graph follows those slopes. At every x, the slope of the curve equals the slope prescribed by the field at the point on the curve. That’s the single line that unifies both viewpoints: y₁′(x) = f(x, y₁(x)). 
So solving the ODE and drawing an integral curve are the same statement in two languages!👌🏻 Once you see that, you can stop obsessing over whether you can write y(x) in closed form. You can start asking the questions that matter: where do solutions flow, where do they get trapped, where do they blow up, and where does existence/uniqueness fail just because the field isn’t even defined? That’s the perspective shift I wish every ODE course forced early, and it’s exactly why I keep pairing math with animation. #DifferentialEquations #ODEs #VectorFields #MathAnimation #Mathematics
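A small worked example of the tangency condition y₁′(x) = f(x, y₁(x)) may help. The equation dy/dx = −x/y is chosen here for illustration and is not taken from the post. Its slope at each point is perpendicular to the ray from the origin, so the integral curves should be arcs of circles.

```latex
\begin{aligned}
\frac{dy}{dx} &= f(x,y) = -\frac{x}{y},
 \qquad y_1(x) = \sqrt{C - x^{2}}, \quad |x| < \sqrt{C}, \\
y_1'(x) &= \frac{-x}{\sqrt{C - x^{2}}} = -\frac{x}{y_1(x)} = f\big(x, y_1(x)\big).
\end{aligned}
```

So each upper semicircle x² + y² = C is a solution. The field is undefined along y = 0, which is exactly where the semicircle's tangent becomes vertical and y₁ stops being extendable as a function of x: a concrete case of a solution ending because the field itself is not defined there.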

Mathelirium

53,338 views • 5 months ago