Lecture 1 on Physics-Informed Neural Networks: A Mini-Series

Physics-Informed Neural Networks (PINNs) are neural networks trained to satisfy a differential equation by building the PDE residual directly into the loss. They emerged from a very practical problem: classical PDE pipelines can be brilliant, but they often demand heavy discretization work (meshes, stencils, stability tuning), and the method you build is usually tied to one geometry and one solver setup. A PINN flips the workflow by representing the solution itself as a smooth function uᵩ(x,t) and enforcing the physics everywhere you choose to sample the domain.

People often meet PINNs in the least helpful way: via a flashy solution plot and almost no explanation of what was enforced to get it. In this series we keep the enforcement visible. We pick a differential equation, represent the unknown solution as a flexible function, measure how well that function satisfies the equation across the domain, and train it to reduce that mismatch everywhere we sample.

A normal neural net learns from labels: you give it inputs and target outputs. A PINN learns from a differential equation: you give it inputs (x,t) and it gets punished whenever its output fails the PDE. By "punished" we mean that the loss increases when the mismatch is large; the "reward" is that the loss decreases as the mismatch gets smaller. The network isn’t replacing physics; it’s becoming a flexible function that is forced to satisfy the same calculus you’d impose on any candidate solution.

The math breakdown:

We start with a PDE we want to solve on a domain Ω. Write it as

uₜ(x,t) + N(u(x,t), uₓ(x,t), uₓₓ(x,t), …) = 0 for (x,t) in Ω

A PINN replaces the unknown function u with a neural network output uᵩ(x,t), where ᵩ stands for the network parameters. Now define the physics residual by plugging uᵩ into the PDE:

rᵩ(x,t) = ∂uᵩ/∂t + N(uᵩ, ∂uᵩ/∂x, ∂²uᵩ/∂x², …)

If uᵩ were an exact solution, we would have rᵩ(x,t) = 0 everywhere.

We may also have data points (xᵢ,tᵢ,uᵢ) from measurements or a known initial condition. The training objective is just a weighted sum of squared errors

L(ᵩ) = L_data(ᵩ) + λ L_phys(ᵩ) + L_bc/ic(ᵩ)

with

L_data(ᵩ) = meanᵢ |uᵩ(xᵢ,tᵢ) − uᵢ|²
L_phys(ᵩ) = meanⱼ |rᵩ(xⱼ,tⱼ)|², where (xⱼ,tⱼ) are the collocation points in Ω
L_bc/ic(ᵩ) = penalties enforcing boundary conditions and initial conditions

The key technical step is that the derivatives inside rᵩ (∂uᵩ/∂t, ∂uᵩ/∂x, ∂²uᵩ/∂x², …) are computed by automatic differentiation. So we can differentiate the total loss L(ᵩ) with respect to ᵩ and train with gradient descent.

This is the whole idea behind PINNs. Learn a function, but make the PDE part of the loss, so the network is trained to be a solution, not just a curve-fitter.

In the render, the main 3D surface is the network’s current guess uᵩ(x,t), drawn as a living sheet over the (x,t) plane. Hovering above is the neural scaffold, a visible graph of feature nodes and connections. The bright tension threads are the physics residual rᵩ(x,t): each thread tethers a collocation bead on the sheet up to the scaffold, and it thickens and brightens exactly where |rᵩ| is large (color encodes the sign). As training runs, those threads go slack across the domain, not because we hid the error, but because the network has actually been pushed toward rᵩ(x,t) ≈ 0.

#PINNs #PhysicsInformedNeuralNetworks #ScientificMachineLearning #PDE #DifferentialEquations #Optimization #MachineLearning #AppliedMath #ComputationalPhysics
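To make the physics loss concrete, here is a minimal sketch of L_phys for one specific PDE. Everything specific in it is an assumption for illustration and not taken from the lecture: the PDE is the 1D heat equation uₜ − uₓₓ = 0 (so N = −uₓₓ), the domain is [0, π] × [0, 1], closed-form candidate functions stand in for the network uᵩ, and central finite differences stand in for automatic differentiation so the snippet runs on the standard library alone. The helper names are hypothetical.

```python
import math

def residual(u, x, t, h=1e-3):
    """PDE residual r = u_t - u_xx for a candidate u(x, t).

    Derivatives use central finite differences here; a real PINN would
    obtain them from automatic differentiation instead.
    """
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / (h * h)
    return u_t - u_xx

def physics_loss(u, points):
    """L_phys = mean_j |r(x_j, t_j)|^2 over the collocation points."""
    return sum(residual(u, x, t) ** 2 for x, t in points) / len(points)

# Collocation points: a regular grid in the assumed domain [0, pi] x [0, 1].
collocation = [(math.pi * i / 10, j / 10) for i in range(11) for j in range(11)]

def exact(x, t):
    # Satisfies u_t = u_xx exactly.
    return math.exp(-t) * math.sin(x)

def wrong(x, t):
    # Decays too fast in time, so it violates the PDE.
    return math.exp(-2 * t) * math.sin(x)

loss_exact = physics_loss(exact, collocation)
loss_wrong = physics_loss(wrong, collocation)
print(f"L_phys(exact) = {loss_exact:.2e}, L_phys(wrong) = {loss_wrong:.2e}")
```

The exact solution drives the loss to roughly zero (up to finite-difference error), while the wrong candidate is penalized. Training a PINN amounts to adjusting ᵩ until the network lands in the first situation.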

Mathelirium

33,321 subscribers

47,308 views • 5 months ago • via X (Twitter)

Related Videos

What if Your Neural Network Was Forced to Obey Physics?

Mathelirium

17,285 views • 1 month ago

Physics-Informed Neural Operators: Learning The Solver, Not Just One Solution Our PINN scene learned one solution field for one PDE setup. A Physics-Informed Neural Operator learns the map from input fields, like material coefficients or source terms, to the full solution across a whole family of PDE problems. So, the goal is no longer just one approximate answer, but a reusable solver-shaped object guided by the physics itself.

Mathelirium

25,935 views • 3 months ago

Why Does Quantum Mechanics Use a Complex Wavefunction? Schrödinger’s equation doesn’t start from mystery. It starts from a very specific bet. The state of a particle is a complex field ψ(x,t), and whatever time-evolution rule we choose has to move ψ forward while preserving total probability. So the basic question is simple. What equation should ψ satisfy so that |ψ|² behaves like a conserved density, the way mass density does in fluid flow? What is ψ? Think of ψ(x,t) as an amplitude attached to the statement the particle is at position x at time t. It’s not a probability. It’s the thing you add first, and only at the end do you square it: p(x,t) = |ψ(x,t)|² Because ψ is complex, it has magnitude and phase. Write it as ψ(x,t) = r(x,t) exp(i θ(x,t)) Then r² = |ψ|² is the density, and the phase θ ends up controlling the flow through the probability current. Where does Schrödinger’s equation come from? Start with two empirical inputs that tie waves to particles: E = ħ ω p = ħ k Here ħ is Planck’s constant divided by 2π. It’s the conversion factor between frequency and energy, and between wavenumber and momentum. A plane wave with angular frequency ω and wavevector k is ψ(x,t) = A exp(i(k·x − ωt)) Now watch what derivatives do to this wave: ∂ψ/∂t = −i ω ψ ∇ψ = i k ψ ∇²ψ = −|k|² ψ Multiply by ħ and you get: i ħ ∂ψ/∂t = ħ ω ψ = E ψ −i ħ ∇ψ = ħ k ψ = p ψ −ħ² ∇²ψ = ħ² |k|² ψ = p² ψ So for plane waves, the operators Ê = i ħ ∂/∂t p̂ = −i ħ ∇ act like energy and momentum. Now bring in the classical, nonrelativistic energy bookkeeping: E = p²/(2m) + V(x) Kinetic plus potential. That’s it. Turn it into an equation for ψ by replacing E and p with the operators above: Ê ψ = (p̂²/(2m) + V) ψ Since p̂² = (−i ħ ∇)·(−i ħ ∇) = −ħ² ∇², this becomes i ħ ∂ψ/∂t = ( −ħ²/(2m) ∇² + V(x) ) ψ That’s the time-dependent Schrödinger equation. This derivation is a controlled heuristic. 
Match the plane-wave identities to the measured relations E = ħω and p = ħk, then impose the same energy bookkeeping you trust in classical mechanics. Why this is the right kind of rule If ψ is the state, we need a rule that preserves total probability: ∫ |ψ(x,t)|² dx = 1 Schrödinger evolution does, and you can see it by deriving a continuity equation. Let ρ(x,t) = |ψ|² = ψ*ψ. Differentiate: ∂ρ/∂t = ψ* ∂ψ/∂t + ψ ∂ψ*/∂t Use Schrödinger and its complex conjugate. The potential terms cancel, and what’s left can be rearranged into ∂ρ/∂t + ∇·j = 0 with probability current j = (ħ/(2mi)) ( ψ* ∇ψ − ψ ∇ψ* ) That’s the cleanest way to say what ψ is. |ψ|² behaves like a conserved density, the phase drives a current, and the time evolution is fixed, up to V, by combining wave relations with energy bookkeeping: i ħ ∂ψ/∂t = ( −ħ²/(2m) ∇² + V ) ψ #QuantumMechanics #SchrodingerEquation #WaveFunction #BornRule #Physics #MathematicalPhysics
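The plane-wave identities above are easy to check numerically. The sketch below assumes one spatial dimension, natural units ħ = m = 1, a wavenumber k = 2, V = 0, and an arbitrary sample point. None of these values come from the post, and the derivative helpers are hypothetical finite-difference stand-ins.

```python
import cmath

HBAR = 1.0                          # natural units (assumption)
MASS = 1.0
K = 2.0                             # wavenumber (assumption)
OMEGA = HBAR * K**2 / (2 * MASS)    # free-particle dispersion from E = p^2/(2m)

def psi(x, t):
    """Plane wave A exp(i(kx - wt)) with A = 1."""
    return cmath.exp(1j * (K * x - OMEGA * t))

def d_dt(f, x, t, h=1e-4):
    return (f(x, t + h) - f(x, t - h)) / (2 * h)

def d_dx(f, x, t, h=1e-4):
    return (f(x + h, t) - f(x - h, t)) / (2 * h)

def d2_dx2(f, x, t, h=1e-4):
    return (f(x + h, t) - 2 * f(x, t) + f(x - h, t)) / (h * h)

x0, t0 = 0.3, 0.7                   # arbitrary sample point
# i hbar d/dt acting on psi, divided by psi, should return E = hbar * omega.
energy = 1j * HBAR * d_dt(psi, x0, t0) / psi(x0, t0)
# -i hbar d/dx acting on psi, divided by psi, should return p = hbar * k.
momentum = -1j * HBAR * d_dx(psi, x0, t0) / psi(x0, t0)
# Free Schrodinger residual: i hbar psi_t + (hbar^2 / 2m) psi_xx, with V = 0.
schrodinger_residual = (1j * HBAR * d_dt(psi, x0, t0)
                        + HBAR**2 / (2 * MASS) * d2_dx2(psi, x0, t0))
print(energy, momentum, abs(schrodinger_residual))
```

The residual vanishes only because ω was tied to k through E = p²/(2m). Pick any other ω and the same plane wave stops solving the equation.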

Mathelirium

20,781 views • 4 months ago

Phase Space Is the Real Stage of Dynamics. So, Stop Watching the Object and Watch the State. Ordinary Space tells you where something is. Phase Space tells you what state it is in. Position alone is not enough because you also need the variable that tells you where the system is trying to go. In the animation, the left panel shows the particle moving in Ordinary Space, while the right panel shows the same system as a moving point in Phase Space. For one degree of freedom, the state is (x, p) and as time evolves, that state traces a trajectory t ↦ (x(t), p(t)) Instead of thinking of a differential equation as just a formula, Phase Space lets you see it as a flow on the space of states. For this lecture we use ẋ = p ṗ = x − x³ with Hamiltonian H(x,p) = 0.5 p² + 0.25 x⁴ − 0.5 x² #PhaseSpace #DynamicalSystems #NonlinearDynamics #HamiltonianMechanics #Mathematics
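One way to trace a phase-space trajectory for this system is sketched below. The leapfrog (velocity Verlet) integrator, the initial state (1.5, 0), and the step size are all assumptions chosen for illustration; the post does not say how the animation was integrated.

```python
def hamiltonian(x, p):
    return 0.5 * p**2 + 0.25 * x**4 - 0.5 * x**2

def force(x):
    # p_dot = -dH/dx = x - x^3
    return x - x**3

def leapfrog(x, p, dt, steps):
    """Return the phase-space trajectory [(x0, p0), (x1, p1), ...]."""
    trajectory = [(x, p)]
    for _ in range(steps):
        p += 0.5 * dt * force(x)  # half kick
        x += dt * p               # drift, using x_dot = p
        p += 0.5 * dt * force(x)  # half kick
        trajectory.append((x, p))
    return trajectory

trajectory = leapfrog(x=1.5, p=0.0, dt=0.01, steps=2000)
h_start = hamiltonian(*trajectory[0])
# Largest deviation of H from its initial value along the trajectory.
energy_drift = max(abs(hamiltonian(x, p) - h_start) for x, p in trajectory)
print(f"H(0) = {h_start:.6f}, max drift = {energy_drift:.2e}")
```

Because H is conserved by the exact flow, the point (x(t), p(t)) should stay on a single level curve of H. The small drift reported here is a quick check that the numerical trajectory respects that.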

Mathelirium

30,340 views • 3 months ago

Lecture 2 on our Quantum Mechanics Series

Schrödinger’s equation doesn’t start from mystery. It starts from a very specific bet: the state of a particle is a complex field ψ(x,t), and whatever dynamics we write down must move ψ forward in time in a way that preserves total probability. We ask a basic question: what equation should ψ satisfy so that |ψ|² behaves like a conserved density, the way mass density does in fluid flow?

What is ψ?

Think of ψ(x,t) as the amplitude assigned to “the particle is at position x at time t”. It’s not a probability. It’s the object you add first, and only at the end do you square:

p(x,t) = |ψ(x,t)|²

Because ψ is complex, it has magnitude and phase. Write it in polar form

ψ(x,t) = r(x,t) exp(i θ(x,t))

Then r² = |ψ|² is the density, and θ will end up controlling flow (the probability current).

Where does Schrödinger’s equation come from?

Start with two empirical inputs about waves and particles:

E = ħ ω
p = ħ k

Here ħ (“h-bar”) is Planck’s constant divided by 2π. It’s the unit conversion factor between the wave description (frequency ω, wavevector k) and the particle description (energy E, momentum p). ħ has units of joule-seconds, so multiplying it by ω (1/seconds) gives energy (joules), and multiplying it by k (1/meters) gives momentum (kg·m/s). It’s the number that tells you how much energy or momentum you get per unit frequency or wavenumber.

A plane wave with angular frequency ω and wavevector k is

ψ(x,t) = A exp(i(k·x − ω t))

Now notice what derivatives do to this wave:

∂ψ/∂t = −i ω ψ
∇ψ = i k ψ
∇²ψ = −|k|² ψ

Multiply those identities by ħ:

i ħ ∂ψ/∂t = ħ ω ψ = E ψ
−i ħ ∇ψ = ħ k ψ = p ψ
−ħ² ∇²ψ = ħ² |k|² ψ = p² ψ

So for plane waves, the operators

Ê = i ħ ∂/∂t
p̂ = −i ħ ∇

act like energy and momentum!

Now use the classical nonrelativistic energy relation:

E = p²/(2m) + V(x)

This is bookkeeping for a particle moving slowly enough that relativity can be ignored. The term p²/(2m) is kinetic energy. If p = mv, then p²/(2m) = (m²v²)/(2m) = (1/2)mv². The term V(x) is potential energy. It depends on position because forces come from spatially varying energy. A slope in V pushes the particle. Examples: for a charged particle in an electric potential φ(x), V(x) = q φ(x). Near Earth, V(z) = mgz. The point is that total energy equals kinetic plus potential.

Turn that into an equation for ψ by replacing E and p with the operators above:

Ê ψ = (p̂²/(2m) + V) ψ

Compute p̂² = (−i ħ ∇)·(−i ħ ∇) = −ħ² ∇², so we get

i ħ ∂ψ/∂t = ( −ħ²/(2m) ∇² + V(x) ) ψ

That is the time-dependent Schrödinger equation. The derivation here is a controlled heuristic: we matched the plane-wave identities to the measured relations E = ħω and p = ħk, then imposed the same energy bookkeeping as classical mechanics.

Why this equation is the right kind of rule

If ψ is the state, we need a rule that preserves total probability:

∫ |ψ(x,t)|² dx = 1

Schrödinger evolution does. You can see it by deriving a continuity equation. Let ρ(x,t) = |ψ|² = ψ* ψ. Take a time derivative:

∂ρ/∂t = ψ* ∂ψ/∂t + ψ ∂ψ*/∂t

Use Schrödinger and its complex conjugate:

∂ψ/∂t = (1/(i ħ)) ( −ħ²/(2m) ∇²ψ + Vψ )
∂ψ*/∂t = (−1/(i ħ)) ( −ħ²/(2m) ∇²ψ* + Vψ* )

Plug in. The V terms cancel exactly, and what remains can be rearranged into a divergence:

∂ρ/∂t + ∇·j = 0

where the probability current is

j = (ħ/(2mi)) ( ψ* ∇ψ − ψ ∇ψ* )

This is the best way to explain what ψ is: |ψ|² behaves like a conserved density, and the phase of ψ is what drives the current j.

So in this series, ψ isn’t a slogan. It’s the object whose modulus squared is the density, whose phase generates flow, and whose time evolution is fixed (up to V) by matching wave relations to energy bookkeeping:

i ħ ∂ψ/∂t = ( −ħ²/(2m) ∇² + V ) ψ

#QuantumMechanics #SchrodingerEquation #WaveFunction #BornRule #Physics #MathematicalPhysics
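The continuity equation can also be checked numerically. The sketch below makes several assumptions that are not in the lecture: one spatial dimension, V = 0, natural units ħ = m = 1, and a superposition of two plane waves with k = 1 and k = 2. That state is not normalizable, but the local statement ∂ρ/∂t + ∂j/∂x = 0 still applies to it. Function names are hypothetical, and the outer derivatives use finite differences.

```python
import cmath

HBAR = 1.0           # natural units (assumption)
MASS = 1.0
K1, K2 = 1.0, 2.0    # two wavenumbers (assumption)

def omega(k):
    # Free-particle dispersion, so each plane wave solves Schrodinger with V = 0.
    return HBAR * k**2 / (2 * MASS)

def psi(x, t):
    return (cmath.exp(1j * (K1 * x - omega(K1) * t))
            + cmath.exp(1j * (K2 * x - omega(K2) * t)))

def dpsi_dx(x, t):
    # Exact spatial derivative of the superposition.
    return (1j * K1 * cmath.exp(1j * (K1 * x - omega(K1) * t))
            + 1j * K2 * cmath.exp(1j * (K2 * x - omega(K2) * t)))

def density(x, t):
    return abs(psi(x, t)) ** 2

def current(x, t):
    # j = (hbar / 2mi)(psi* psi_x - psi psi_x*) = (hbar / m) Im(psi* psi_x)
    return HBAR / MASS * (psi(x, t).conjugate() * dpsi_dx(x, t)).imag

def continuity_residual(x, t, h=1e-4):
    drho_dt = (density(x, t + h) - density(x, t - h)) / (2 * h)
    dj_dx = (current(x + h, t) - current(x - h, t)) / (2 * h)
    return drho_dt + dj_dx

# Largest violation over a set of sample points in (x, t).
worst = max(abs(continuity_residual(0.1 * i, 0.05 * i)) for i in range(50))
print(f"max |d(rho)/dt + dj/dx| = {worst:.2e}")
```

A single plane wave would make this check trivial, since its density is constant. The interference term between the two waves is what makes ρ and j actually move.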

Mathelirium

40,835 views • 6 months ago

In Calculus of Variations, we give up the idea of only optimizing values and start optimizing geometries instead. The unknown is not a single x. It is a whole curve/surface y(x). The object you are minimizing is usually not a simple formula that you can guess. It is an integral that judges the whole shape. In our first lecture, we discuss the famous Brachistochrone problem. You choose two points, turn on gravity and ask a question which almost sounds too simple...which track will allow a bead to move from A to B in the minimum time? At your first attempt, your instinct will lead you astray. It is not the straight line. It is not the drop hard then cruise sketch either. The winner turns out to be a cycloid...the curve a point on a rolling circle traces. The track is the moving character in the animation. We begin with a flawed curve, perform gradient descent in curve-space, and observe the geometry change frame by frame as T[y] collapses to the brachistochrone. Please see the comment below for the math breakdown. #CalculusOfVariations #Brachistochrone #Cycloid #Physics #Optimization #Mathematics
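The claim that the cycloid beats the straight line can be checked with a few lines of code. This is only a comparison of two fixed tracks, not the gradient descent in curve-space shown in the animation. It assumes a frictionless bead released from rest, g = 9.81 m/s², and an endpoint B placed at the bottom of a cycloid arch of radius R = 1. The helper name is hypothetical.

```python
import math

G = 9.81  # m/s^2 (assumption)

def travel_time(points, g=G):
    """Time for a bead released from rest to slide along a polyline.

    points are (x, y) pairs with y measured downward from the start, so
    energy conservation gives speed v = sqrt(2 g y). On each straight
    segment the acceleration is constant, so time = length / mean speed.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        v0 = math.sqrt(2 * g * y0)
        v1 = math.sqrt(2 * g * y1)
        total += length / (0.5 * (v0 + v1))
    return total

R = 1.0
end = (math.pi * R, 2 * R)  # B at the bottom of the cycloid arch (assumption)

straight = [(0.0, 0.0), end]
n = 2000
cycloid = [(R * (th - math.sin(th)), R * (1 - math.cos(th)))
           for th in (math.pi * i / n for i in range(n + 1))]

t_straight = travel_time(straight)
t_cycloid = travel_time(cycloid)
t_cycloid_exact = math.pi * math.sqrt(R / G)  # closed form for this endpoint
print(f"straight: {t_straight:.4f} s, cycloid: {t_cycloid:.4f} s")
```

The same `travel_time` function is a discretized version of T[y], so it could serve as the objective if you wanted to move the interior points of a track downhill yourself.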

Mathelirium

23,254 views • 1 month ago

Ask anyone who’s taken a course in Ordinary Differential Equations (ODEs) what a solution to an ODE represents geometrically, and most of them won’t have a clean answer. When I first took ordinary differential equations, the pattern was always the same. Early on it turns into a speedrun of methods: separation of variables, integrating factors, variation of parameters, Bernoulli, exact equations. Then pretty quickly the course slides into hammer-picking. Spot the form, apply the recipe, move on. Too mechanical! And the real problem is what you don’t walk away with. You leave with a toolkit, but without a feel for what a differential equation even is, especially geometrically. That matters because in real modeling the equations you meet are rarely nice enough to reward memorised recipes. So you get trained to solve toy forms, while the actual subject stays blurry. The behavior. The flow. The shape of solutions. It wasn't until I watched the first lecture of Professor Arthur Mattuck that I realized I didn’t actually know what a solution to a differential equation represents geometrically. His point is almost embarrassingly simple. A first-order ODE is a slope field, and a solution is a curve that stays tangent to that field everywhere. The math breakdown: Write the ODE as dy/dx = f(x,y). At each point (x,y), attach a tiny line segment with slope f(x,y). A function y = y₁(x) is a solution exactly when its graph follows those slopes. At every x, the slope of the curve equals the slope prescribed by the field at the point on the curve. That’s the one line that ties both viewpoints together: y₁′(x) = f(x, y₁(x)). So solving the ODE and drawing an integral curve are the same statement in two languages. Once you see that, you stop obsessing over whether you can write y(x) in closed form. You start asking the questions that actually matter. Where do solutions flow. Where do they get trapped. Where do they blow up. 
Where does existence or uniqueness fail because the field isn’t even defined? That’s the perspective shift I wish every ODE course forced early. It’s also why I keep pairing math with animation. #DifferentialEquations #ODEs #VectorFields #AppliedMathematics #Mathematics
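The slope-field picture translates almost literally into code: start at a point, read off the slope f(x, y), take a small step along it, and repeat. The sketch below uses Euler's method, which is my choice of the simplest way to follow the field. The example field dy/dx = y, the starting point (0, 1), and the step count are assumptions for illustration.

```python
import math

def euler_curve(f, x0, y0, x_end, steps):
    """Follow the slope field dy/dx = f(x, y) from (x0, y0) with Euler steps."""
    dx = (x_end - x0) / steps
    x, y = x0, y0
    curve = [(x, y)]
    for _ in range(steps):
        y += dx * f(x, y)  # move along the local slope segment
        x += dx
        curve.append((x, y))
    return curve

def slope(x, y):
    # Example field dy/dx = y, whose solution through (0, 1) is e^x.
    return y

curve = euler_curve(slope, 0.0, 1.0, 1.0, steps=1000)
x_last, y_last = curve[-1]
print(f"Euler y(1) = {y_last:.5f}, exact e = {math.e:.5f}")
```

No closed form was used to build the curve. The list `curve` is an approximate integral curve obtained purely by staying tangent to the field, which is the geometric statement y₁′(x) = f(x, y₁(x)) in discrete form.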

Mathelirium

40,739 views • 4 months ago

Finding the roots of a polynomial is one of the oldest and most central problems in Algebra. Once the coefficients are allowed to move through the complex plane, the roots start revealing a much richer geometry. Here is the Mathematics, try it yourself and play around with the coefficients and polynomial structure:
P(x) = x¹² + a x⁸ + b x⁶ + c x⁴ + d x + 1
a = (sin(t₁) + i cos(sin(t₂) + 0.5 sin(2t₂))) M
b = 0.3 (t₁ + t₂) Mᶜ
c = (exp(i t₁) − cos(t₂)) M
d = (cos(t₁) − 0.5 i sin(sin(t₂) + 0.5 sin(2t₂))) Mᶜ
M = 1 + ∑ₖ₌₁ᵐ Aₖ exp(i 2π sₖ n / N)
Mᶜ = 1 + ∑ₖ₌₁ᵐ Aₖ exp(−i 2π sₖ n / N)
Here t₁ and t₂ lie on the unit circle, n is the frame index, N is the total number of frames, and Aₖ, sₖ are the modulation amplitudes and speeds.
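If you want to play with this, here is a minimal sketch of a single frame using NumPy. The modulation amplitudes and speeds, the frame numbers, and the function names `modulation` and `frame_roots` are illustrative assumptions, not values from the animation; t₁ and t₂ are passed in as complex numbers on the unit circle.

```python
import numpy as np

def modulation(n, N, amps, speeds):
    """Return (M, Mc) for frame n of N. amps and speeds are made-up example values."""
    amps = np.asarray(amps, dtype=float)
    phase = 2j * np.pi * np.asarray(speeds) * n / N
    M = 1 + np.sum(amps * np.exp(phase))
    Mc = 1 + np.sum(amps * np.exp(-phase))
    return M, Mc

def frame_roots(t1, t2, n, N, amps=(0.2, 0.1), speeds=(1, 3)):
    """Build the coefficients of P for one frame and return (coeffs, roots)."""
    M, Mc = modulation(n, N, amps, speeds)
    w = np.sin(t2) + 0.5 * np.sin(2 * t2)
    a = (np.sin(t1) + 1j * np.cos(w)) * M
    b = 0.3 * (t1 + t2) * Mc
    c = (np.exp(1j * t1) - np.cos(t2)) * M
    d = (np.cos(t1) - 0.5j * np.sin(w)) * Mc
    coeffs = np.zeros(13, dtype=complex)  # highest power first, as np.roots expects
    coeffs[0] = 1    # x^12
    coeffs[4] = a    # x^8
    coeffs[6] = b    # x^6
    coeffs[8] = c    # x^4
    coeffs[11] = d   # x^1
    coeffs[12] = 1   # constant term
    return coeffs, np.roots(coeffs)

# One frame: t1, t2 are points on the unit circle (angles chosen arbitrarily).
coeffs, roots = frame_roots(np.exp(0.7j), np.exp(2.1j), n=10, N=240)
```

Sweeping n from 0 to N − 1 and scatter-plotting `roots.real` against `roots.imag` for each frame would give the moving root pattern; the plotting is left out here.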

Mathelirium

22,783 views • 2 months ago

The Trap in Every Mathematics Lecture If you’ve taken a lot of math courses, you start to recognize a pattern. There’s a moment where the lecturer is warming up with the obvious stuff...add matrices entrywise, scale by α, do the row-column product...and you’re thinking, alright… where is this going? Then you relax. You stop resisting. And right there, they slip in one line that changes how you see the whole subject. When Benedict Gross says "matrices represent linear operators," he’s telling you to stop treating a matrix as a rectangle of numbers and start treating it as an action. A linear operator is a function T: Rⁿ → Rⁿ that respects two rules: T(u+v)=T(u)+T(v) and T(αu)=αT(u). Once you pick a basis, T is completely determined by where it sends the basis vectors e₁,…,eₙ. Put T(e₁),…,T(eₙ) into columns and you get a matrix A. That is what "A represents T" means...A is the coordinate portrait of the transformation. Now the punchline that makes matrix multiplication feel inevitable. If B represents S and A represents T, then doing S first and then T is the composition T∘S. In coordinates that becomes A(Bx)=(AB)x. So multiplying matrices is really composing transformations. That’s why multiplication is usually not commutative: T∘S is generally not the same transformation as S∘T, and the matrices inherit that noncommutativity. This explains half of Linear Algebra because it tells you what the course is really about...functions that move vectors around, not grids of numbers. A matrix is just the written form of that function once you choose coordinates. Then the rules stop feeling random. Multiplying matrices means doing one move and then another, an inverse means you can undo the move, eigenvectors are directions that don’t get turned, and changing basis is just describing the same move in a different language. That one idea makes a lot of linear algebra click. #LinearAlgebra #Matrices #GroupTheory #GLn #MathLectures #Mathematics
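To see "multiplying is composing" in numbers, here is a small NumPy sketch. The two maps (a quarter-turn rotation for S and a shear for T) and the helper name `matrix_of` are illustrative choices, not anything from the lecture.

```python
import numpy as np

def matrix_of(op, n):
    """Build the matrix of a linear map: its columns are op(e_1), ..., op(e_n)."""
    return np.column_stack([op(e) for e in np.eye(n)])

def S(v):
    """Rotate the plane by 90 degrees counterclockwise."""
    x, y = v
    return np.array([-y, x])

def T(v):
    """Shear the plane horizontally."""
    x, y = v
    return np.array([x + y, y])

B = matrix_of(S, 2)   # B represents S
A = matrix_of(T, 2)   # A represents T

x = np.array([2.0, -1.0])
print(T(S(x)))        # do S first, then T
print(A @ B @ x)      # same vector, computed with the product AB
print(A @ B)          # matrix of T∘S
print(B @ A)          # matrix of S∘T, a different transformation
```

The first two printed vectors agree, while the last two matrices differ, which is the noncommutativity in concrete form.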

Mathelirium

66,204 views • 5 months ago

Calculus of Variations is what happens when you stop optimizing values and start optimizing geometries. The unknown isn’t a single x...it’s a whole curve y(x). And the thing you’re minimizing usually isn’t a formula you can eyeball...it’s an integral that judges the entire shape. For our first lecture, we look at the famous Brachistochrone problem. Fix two points, switch on gravity, and ask a question that sounds too simple...which track gets a bead from A to B in the least time? Your intuition will betray you on the first try. It’s not the straight line. It’s not the "drop hard, then cruise" sketch either. The winner is a cycloid...the curve traced by a point on a rolling circle. In the animation, the track is the moving character. We start with an imperfect curve, run gradient descent in curve-space, and watch the geometry reshape frame by frame as the travel time T[y] collapses until the track locks into the brachistochrone. Please see the comment below for the math breakdown. #CalculusOfVariations #Brachistochrone #Cycloid #Physics #Optimization #Mathematics
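The claim that the straight line loses can be checked directly. The sketch below is not the gradient-descent code from the animation; it only evaluates the travel time of a frictionless bead released from rest along a polyline track, with y measured downward. The radius R, the value of g, the endpoint (taken at the bottom of one cycloid arch), and the function name `descent_time` are all assumptions made for illustration.

```python
import numpy as np

def descent_time(xs, ys, g=9.81):
    """Travel time along a polyline track, bead released from rest at ys[0] = 0.

    y is measured downward, so energy conservation gives speed v = sqrt(2 g y).
    On a straight segment the acceleration is constant, so the segment time
    is exactly 2 * length / (v_start + v_end).
    """
    v = np.sqrt(2 * g * ys)
    ds = np.hypot(np.diff(xs), np.diff(ys))
    return np.sum(2 * ds / (v[:-1] + v[1:]))

R, g = 1.0, 9.81
phi = np.linspace(0.0, np.pi, 2001)

# Cycloid from A = (0, 0) to B = (pi R, 2R): x = R(phi - sin phi), y = R(1 - cos phi)
t_cycloid = descent_time(R * (phi - np.sin(phi)), R * (1 - np.cos(phi)), g)

# Straight line between the same two points
xs_line = np.linspace(0.0, np.pi * R, 50)
t_line = descent_time(xs_line, xs_line * (2 * R) / (np.pi * R), g)

print(t_cycloid, t_line)  # the cycloid time is the smaller of the two
```

For this endpoint the cycloid time has the closed form π·sqrt(R/g), which gives a handy check on the numerical sum.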

Mathelirium

106,981 views • 6 months ago

Ever Heard of a Metric in Differential Geometry? How About a Geodesic? These two ideas are central to how Mathematics and Physics describe Spacetime. A metric is how a surface measures distance and angle. A geodesic is the path that counts as "straight" after that measuring rule is chosen. Once the metric changes from point to point, straightness is no longer something you judge by eye. On a curved surface, the shortest paths are geodesics. They are chosen by the metric. Take a surface written as r(x,y) = (x, y, f(x,y)) Its metric is gᵢⱼ = ∂ᵢr · ∂ⱼr A geodesic q(t) = (x(t), y(t)) obeys q̈ᵏ + Γᵏᵢⱼ q̇ᶦq̇ʲ = 0 where Γᵏᵢⱼ are the Christoffel symbols of the surface metric. Launch many almost-parallel paths. At the start, they look like a clean beam until curvature takes over. Some paths focus, others separate and some slide around ridges. The surface is essentially the rulebook for motion. The shortest path is local. The geometry remembers everything. #DifferentialGeometry #Geodesics #RiemannianGeometry #MathematicalPhysics #Curvature #TensorCalculus #AppliedMathematics #Geometry #ScienceAnimation
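Here is a rough sketch of how one of these paths could be traced numerically. It assumes a Gaussian bump f(x,y) = A exp(−(x²+y²)/2) as the surface (my choice for illustration, not necessarily the one in the animation) and uses the fact that for a graph surface r = (x, y, f) the Christoffel symbols reduce to Γᵏᵢⱼ = fₖ fᵢⱼ / (1 + fₓ² + f_y²), where fₖ and fᵢⱼ are first and second partial derivatives of f. The integrator is plain RK4, and all function names are hypothetical.

```python
import numpy as np

A = 1.0  # bump height (illustrative)

def f_derivs(q):
    """Gradient and Hessian of f(x, y) = A exp(-(x^2 + y^2) / 2)."""
    x, y = q
    f = A * np.exp(-0.5 * (x * x + y * y))
    grad = np.array([-x * f, -y * f])
    hess = np.array([[(x * x - 1) * f, x * y * f],
                     [x * y * f, (y * y - 1) * f]])
    return grad, hess

def metric(q):
    """g_ij = d_i r . d_j r for r = (x, y, f), i.e. identity + grad f grad f^T."""
    grad, _ = f_derivs(q)
    return np.eye(2) + np.outer(grad, grad)

def accel(q, v):
    """Geodesic acceleration: q''^k = -Gamma^k_ij v^i v^j."""
    grad, hess = f_derivs(q)
    return -grad * (v @ hess @ v) / (1.0 + grad @ grad)

def rk4_geodesic(q0, v0, dt=0.01, steps=600):
    """Integrate the geodesic equation; each row of the result is (x, y, vx, vy)."""
    def rhs(s):
        return np.concatenate([s[2:], accel(s[:2], s[2:])])
    state = np.concatenate([q0, v0]).astype(float)
    path = [state.copy()]
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(state + 0.5 * dt * k1)
        k3 = rhs(state + 0.5 * dt * k2)
        k4 = rhs(state + dt * k3)
        state = state + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        path.append(state.copy())
    return np.array(path)

def speed_sq(state):
    """Metric speed g_ij v^i v^j, which a geodesic keeps constant."""
    q, v = state[:2], state[2:]
    return v @ metric(q) @ v

path = rk4_geodesic(np.array([-3.0, 0.4]), np.array([1.0, 0.0]))
```

Launching a fan of start points with nearly identical velocities and plotting `path[:, 0]` against `path[:, 1]` should reproduce the beam picture; the conserved `speed_sq` is a cheap sanity check on the integration.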

Mathelirium

22,170 views • 8 days ago

Lecture 3 of our Quantum Mechanics series. Lecture 2 gave us the one clean privilege quantum theory offers: treat ψ(x,t) as the state and ρ(x,t) = |ψ(x,t)|² as probability, because Schrödinger evolution forces ρ to obey a continuity equation. Lecture 3 is about what that continuity equation is really telling you. If ρ behaves like a fluid, then the only question that matters is: What is the velocity field? Write ψ(x,t) = r(x,t) exp(i θ(x,t)). The magnitude r sets how much probability is sitting there. The phase θ sets where it tries to go. When you unpack the current j = (1/m) Im(ψ* ∇ψ), it collapses to j = (ρ/m) ∇θ, which means the flow lines you draw follow the phase gradient and cross the contours of constant phase at right angles. Then the constraint that makes the picture bite: ψ has to be single-valued, so θ can’t wind by an arbitrary amount. Around any closed loop the total phase change must be 2π n, with n an integer. That’s why vortices aren’t features you add...they’re defects the math permits, in quantized units. In the render you see both layers at once...the 3D surface shows |ψ| breathing while the phase skin slides, and the 2D panel exposes the engine...current lines steering around discrete vortex charges.
The math breakdown:
We write the state as a complex field ψ(x,t) on the plane (x in R²). The Born rule defines the probability density
ρ(x,t) = |ψ(x,t)|²
Schrödinger evolution (ħ = 1 units, V real) is
i ∂ψ/∂t = [ −(1/2m) ∇² + V(x,t) ] ψ
Now derive conservation of probability. Start with ρ = ψ*ψ:
∂ρ/∂t = ψ* (∂ψ/∂t) + ψ (∂ψ*/∂t)
Use Schrödinger and its complex conjugate:
∂ψ/∂t = (1/i) [ −(1/2m) ∇²ψ + Vψ ]
∂ψ*/∂t = (−1/i) [ −(1/2m) ∇²ψ* + Vψ* ]
Substitute. The V terms cancel, leaving
∂ρ/∂t = −(1/2mi) ( ψ* ∇²ψ − ψ ∇²ψ* ) = −(1/2mi) ∇·( ψ* ∇ψ − ψ ∇ψ* )
which is the continuity equation
∂ρ/∂t + ∇·j = 0
with probability current
j = (1/2mi) ( ψ* ∇ψ − ψ ∇ψ* ) = (1/m) Im(ψ* ∇ψ)
So "probability density" really behaves like a conserved fluid density with flux j.
Now expose the phase mechanism. Write ψ in polar form
ψ(x,t) = r(x,t) exp(i θ(x,t))
Compute the gradient
∇ψ = exp(iθ) (∇r + i r ∇θ)
Then
ψ* ∇ψ = r (∇r + i r ∇θ)
Taking the imaginary part gives
Im(ψ* ∇ψ) = r² ∇θ = ρ ∇θ
So the current becomes
j = (ρ/m) ∇θ
That’s the steering-wheel statement: the phase gradient sets the flow direction and speed, with velocity j/ρ = ∇θ/m.
Finally, quantized vortices. Because ψ must be single-valued, going around any closed loop must return the same complex value. That forces the phase winding to be an integer multiple of 2π:
∮ ∇θ · dl = 2π n with n in Z
n is the vortex charge. Vortex cores sit where ρ = 0 (there the phase is undefined), and the current streamlines circulate around them.
#QuantumMechanics #Wavefunction #SchrodingerEquation #BornRule #ProbabilityCurrent #ContinuityEquation #Phase #Vortices #TopologicalDefects #ComplexAnalysis #MathematicalPhysics #Mathematics #Physics
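Both statements, the current formula and the integer winding, are easy to probe numerically. This is a minimal sketch with an assumed test state ψ = (x + i y) exp(−(x²+y²)/2), which carries a single vortex at the origin; the grid size, the mass m = 1, and the function names are illustrative choices rather than anything from the render.

```python
import numpy as np

def psi_vortex(x, y):
    """Assumed test state: one vortex of charge +1 at the origin."""
    return (x + 1j * y) * np.exp(-0.5 * (x**2 + y**2))

def current(psi, h, m=1.0):
    """j = (1/m) Im(psi* grad psi) on a grid indexed as psi[ix, iy]."""
    dpsi_dx = np.gradient(psi, h, axis=0)
    dpsi_dy = np.gradient(psi, h, axis=1)
    jx = np.imag(np.conj(psi) * dpsi_dx) / m
    jy = np.imag(np.conj(psi) * dpsi_dy) / m
    return jx, jy

def winding_number(psi_func, center, radius, samples=400):
    """Total phase change around a circle, in units of 2 pi."""
    phi = np.linspace(0.0, 2 * np.pi, samples, endpoint=False)
    vals = psi_func(center[0] + radius * np.cos(phi),
                    center[1] + radius * np.sin(phi))
    # Phase step between neighbouring samples, each wrapped into (-pi, pi]
    steps = np.angle(np.roll(vals, -1) / vals)
    return int(np.rint(steps.sum() / (2 * np.pi)))

axis = np.linspace(-3.0, 3.0, 121)          # grid spacing h = 0.05
h = axis[1] - axis[0]
X, Y = np.meshgrid(axis, axis, indexing="ij")
jx, jy = current(psi_vortex(X, Y), h)

print(winding_number(psi_vortex, (0.0, 0.0), 1.0))   # loop around the core
print(winding_number(psi_vortex, (2.0, 0.0), 0.5))   # loop that misses the core
```

At the grid point (1, 0) the analytic current is (ρ/m) ∇θ = (0, e⁻¹), so comparing `jy[80, 60]` with `np.exp(-1)` checks the steering-wheel formula up to finite-difference error.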

Mathelirium

37,998 views • 6 months ago

Ever wondered what neural networks are and how they work? Systems like ChatGPT use neural networks to work as well as they do. Neural networks are composed of "layers" of neurons, layers with different functions; connections between layers called "weights"; and mathematical functions called "activation functions". If you’re interested in learning about these systems, check the comments. Ultimately, the neural network structure of the model serves to visually demonstrate that it is, in fact, a complex mathematical equation. When companies release the model's weights, they are releasing a key component needed to run the model's complete equation. Without the weights, the equation is incomplete. For the math-minded: the weights of a model are the learned numbers (they are variables during training) that are then used as constants in the mathematical functions that make up the model. Neural networks are ultimately just one big, hyper-complex mathematical function, and when a model is trained, it learns the constants associated with the high-dimensional input.
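To make "one big mathematical function with learned constants" concrete, here is a toy sketch in NumPy. The two-layer network, its hand-picked weights, and the function names are all made up for illustration; real models have vastly more weights, and those come from training rather than from typing them in.

```python
import numpy as np

def relu(z):
    """A common activation function: keep positives, zero out negatives."""
    return np.maximum(z, 0.0)

def forward(x, weights):
    """Evaluate the network: each layer is (matrix W, bias b), with ReLU in between."""
    h = x
    for W, b in weights[:-1]:
        h = relu(W @ h + b)
    W, b = weights[-1]
    return W @ h + b

# Hand-picked "released weights" for a 2 -> 1 -> 1 network (illustrative only).
weights = [
    (np.array([[1.0, -1.0]]), np.array([0.0])),   # hidden layer
    (np.array([[2.0]]), np.array([1.0])),         # output layer
]

print(forward(np.array([3.0, 1.0]), weights))  # 2 * relu(3 - 1) + 1
print(forward(np.array([1.0, 3.0]), weights))  # 2 * relu(1 - 3) + 1
```

Without the `weights` list, `forward` is only a template; with it, the function is fully specified, which is the sense in which released weights complete the equation.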

Harper Carroll

29,018 views • 8 months ago

The Trap in Every Mathematics Lecture If you’ve taken enough math courses, you start noticing the same little move. The lecturer warms up with the obvious stuff, add matrices entrywise, scale by α, do the row-column product, and you’re thinking alright, where is this going. Then you relax. You stop resisting. And right there, they drop one line that quietly rewires the whole subject. When Benedict Gross says "matrices represent linear operators," he’s telling you to stop treating a matrix as a rectangle of numbers and start treating it as an action. A linear operator is a function T: ℝⁿ → ℝⁿ that respects two rules: T(u+v) = T(u) + T(v) and T(αu) = αT(u). Once you pick a basis, T is completely determined by where it sends the basis vectors e₁,…,eₙ. Put T(e₁),…,T(eₙ) into columns and you get a matrix A. That is what "A represents T" means. A is the coordinate portrait of the transformation. Now the punchline that makes matrix multiplication feel inevitable. If B represents S and A represents T, then doing S first and then T is the composition T∘S. In coordinates that becomes A(Bx) = (AB)x. So multiplying matrices is really composing transformations. That’s why multiplication is usually not commutative. T∘S is generally not the same transformation as S∘T, and the matrices inherit that noncommutativity. This explains half of linear algebra because it tells you what the course is really about: functions that move vectors around, not grids of numbers. A matrix is just the written form of that function once you choose coordinates. After that, the rules stop feeling random. Multiplying matrices means doing one move and then another. An inverse means you can undo the move. Eigenvectors are directions that don’t get turned. Changing basis is just describing the same move in a different language. One idea, and a lot of linear algebra suddenly clicks. #LinearAlgebra #Matrices #LinearMaps #Eigenvectors #ChangeOfBasis #Mathematics

Mathelirium

133,454 views • 4 months ago

We just watched Professor Arthur Mattuck kick off MIT’s ODE course with the one interpretation that most differential equations classes somehow postpone: An ordinary differential equation isn’t primarily a method hunt. It’s a geometric rule. You write dy/dx = f(x,y), and that right-hand side is literally telling you the slope your solution curve must have at each point (x,y). So I built this animation as a visual companion to that first lecture. It draws the direction field (little line elements whose slope is f(x,y)) and then shows integral curves sliding through it...curves that are tangent to the field everywhere they go. Two quick examples from the animation: For dy/dx = −x/y, the slope field steers you onto circles x² + y² = R². You also see a subtle point that gets missed when everything is taught as y(x)...even when the curve exists smoothly, the graph y(x) may only exist on a limited interval (|x|<R for the upper semicircle). For dy/dx = 1 + x − y, the isoclines (curves where the slope is constant) make the global behavior obvious...trajectories get funneled into a corridor and become asymptotic to the special solution y=x. You learn qualitative behavior without solving it the traditional way. #DifferentialEquations #ODEs #MITOCW #VectorFields #MathAnimation #Mathematics
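Both examples can be reproduced without any closed-form solving by simply following the slopes. The sketch below uses a hand-rolled RK4 stepper; the function name `follow_field`, the starting points, and the step counts are illustrative assumptions, not the code behind the animation.

```python
import math

def follow_field(f, x0, y0, x_end, n):
    """Trace the integral curve of dy/dx = f(x, y) from (x0, y0) to x_end with RK4."""
    h = (x_end - x0) / n
    x, y = x0, y0
    for _ in range(n):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

# Example 1: dy/dx = -x/y from (0, 1) should stay on the circle x^2 + y^2 = 1.
y_circle = follow_field(lambda x, y: -x / y, 0.0, 1.0, 0.8, 800)
print(0.8**2 + y_circle**2)

# Example 2: dy/dx = 1 + x - y from (0, 3) should be drawn toward the line y = x.
y_far = follow_field(lambda x, y: 1 + x - y, 0.0, 3.0, 8.0, 800)
print(y_far - 8.0)
```

The first example deliberately stops at x = 0.8: pushing x toward R = 1 makes the slope −x/y blow up, which is the limited-interval point made above. In the second, the gap to y = x shrinks like 3e⁻ˣ, since y = x + C e⁻ˣ solves the equation.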

Mathelirium

22,367 views • 5 months ago

Erwin Schrödinger’s 1926 equation changed the game by turning Quantum Mechanics into wave dynamics. That move gave Physics a new way to think. Instead of forcing a particle onto one sharp path, it lets a complex wavefunction evolve in time, with its shape and phase holding the structure of the phenomenon. What makes this so striking is that Schrödinger’s equation does not start from vague mystery. It starts from a precise and daring idea: the state of a particle is a complex field ψ(x,t), and the dynamics must push ψ forward in a way that preserves total probability.

Mathelirium

54,730 views • 3 months ago

In 1996, James Sethian showed something almost unfair...you can find shortest routes through a messy world by letting a wave expand once...no trial paths, no search beams, just one growing front. Here’s how: we solve for an arrival-time field T(x,y), so that T literally means how long the wave needs to reach this point. The rule is ||∇T|| = 1/F, where F is the local speed of the front. Where the medium is fast (F large) the front sprints, where it’s slow it trudges, and obstacles have speed ≈ 0, so the front wraps around them because that’s the only way forward. Then comes the satisfying part: once T exists, a path doesn’t need to search at all...drop a bead anywhere and let it follow ẋ ∝ −∇T. It slides downhill on the time landscape and traces a globally fastest route back to the source. This “wave = optimal control” viewpoint is exactly what Tsitsiklis (1995) made precise from the Hamilton-Jacobi side...compute the value/arrival-time function and the optimal trajectories fall out of it. #FastMarching #EikonalEquation
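For anyone who wants to try the idea, here is a compact sketch of a first-order fast marching solver on a square grid, plus a crude walk back to the source. It is a textbook-style simplification, not Sethian’s or Tsitsiklis’s own code: cells with speed ≤ 0 are treated as hard obstacles, the backtracking steps between grid neighbours instead of integrating ẋ ∝ −∇T, and every function name and grid size is made up for illustration.

```python
import heapq
import numpy as np

def _frozen_min(T, frozen, cells):
    """Smallest already-accepted arrival time among the given cells (inf if none)."""
    vals = [T[c] for c in cells
            if 0 <= c[0] < T.shape[0] and 0 <= c[1] < T.shape[1] and frozen[c]]
    return min(vals, default=np.inf)

def _update(T, frozen, i, j, w):
    """Solve the upwind quadratic (T-a)^2 + (T-b)^2 = w^2 with w = h / F."""
    a = _frozen_min(T, frozen, [(i - 1, j), (i + 1, j)])
    b = _frozen_min(T, frozen, [(i, j - 1), (i, j + 1)])
    lo, hi = min(a, b), max(a, b)
    if hi - lo >= w:                  # only one direction contributes
        return lo + w
    return 0.5 * (lo + hi + np.sqrt(2 * w * w - (hi - lo) ** 2))

def fast_marching(speed, source, h=1.0):
    """Arrival times T with |grad T| = 1/F; cells with speed <= 0 stay at inf."""
    rows, cols = speed.shape
    T = np.full(speed.shape, np.inf)
    frozen = np.zeros(speed.shape, dtype=bool)
    T[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        _, (i, j) = heapq.heappop(heap)
        if frozen[i, j]:
            continue                  # stale heap entry
        frozen[i, j] = True           # the front has passed this cell
        for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if not (0 <= a < rows and 0 <= b < cols):
                continue
            if frozen[a, b] or speed[a, b] <= 0:
                continue
            t_new = _update(T, frozen, a, b, h / speed[a, b])
            if t_new < T[a, b]:
                T[a, b] = t_new
                heapq.heappush(heap, (t_new, (a, b)))
    return T

def trace_back(T, start):
    """Walk from a reachable cell to the source, always stepping to the lowest neighbour."""
    path = [start]
    while T[path[-1]] > 0:
        i, j = path[-1]
        nbrs = [(a, b) for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1))
                if 0 <= a < T.shape[0] and 0 <= b < T.shape[1]]
        path.append(min(nbrs, key=lambda c: T[c]))
    return path

speed = np.ones((21, 21))
speed[5, 0:15] = 0.0                       # a wall the front has to wrap around
T = fast_marching(speed, (0, 0))
path = trace_back(T, (10, 0))
```

The staircase path from `trace_back` is only a grid approximation of the fastest route; following the interpolated gradient of T would give the smooth curve described above.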

Mathelirium

766,209 views • 6 months ago

Do you actually know what convex optimization is in the geometric, guarantee-theoretic sense, or have you only met it through solvers and loss curves? Convexity is rare comfort in optimization...there are no spurious local minima, no surprise traps, and inequalities you can use like tools instead of prayers. So, what is this convexity? Let x = (x₁, x₂) and let f(x) be convex and differentiable. Plot the surface z = f(x). Pick a contact point x₀. The local slope is the gradient p = ∇f(x₀). That p is exactly the data that defines the supporting plane: z = f(x₀) + p · (x − x₀). For a differentiable f, being convex is the same as saying that for every x₀ and every x, f(x) ≥ f(x₀) + p · (x − x₀). So the plane at x₀ can slide under the surface, but it never slices through it. Not near the point...everywhere. Now here is the interesting part: the slope becomes a coordinate system! Rewrite the same plane as z = p · x − b, where b is the offset. Because the plane passes through (x₀, f(x₀)), the offset is forced to be b = p · x₀ − f(x₀). And that number isn’t just geometry trivia. It’s the convex conjugate: f*(p) = sup over x ( p · x − f(x) ). The supporting inequality rearranges to p · x − f(x) ≤ p · x₀ − f(x₀) for every x, so the supremum is achieved at x₀, giving the identity f*(p) = p · x₀ − f(x₀) when p = ∇f(x₀). So one moving contact point gives three linked readouts: the primal position x₀, the dual position (slope) p = ∇f(x₀), and the dual offset f*(p). One surface. Two worlds. #ConvexOptimization #Optimization #MachineLearning #SignalProcessing #AppliedMath #Engineering
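A quick numerical check of both claims, using an assumed example f(x) = x₁² + 2x₂² (my choice, picked because its conjugate f*(p) = p₁²/4 + p₂²/8 is known in closed form). The supremum is approximated by a brute-force maximum over a grid, so this is a sanity check rather than a real conjugate solver.

```python
import numpy as np

def f(x1, x2):
    """Assumed convex example: f(x) = x1^2 + 2 x2^2."""
    return x1**2 + 2 * x2**2

def grad_f(x1, x2):
    return np.array([2 * x1, 4 * x2])

# Contact point and its slope
x0 = np.array([1.0, -0.5])
p = grad_f(*x0)

# Brute-force grid over x (range and resolution are illustrative)
axis = np.linspace(-3.0, 3.0, 601)
X1, X2 = np.meshgrid(axis, axis, indexing="ij")

# Supporting plane z = f(x0) + p . (x - x0) never rises above the surface
plane = f(*x0) + p[0] * (X1 - x0[0]) + p[1] * (X2 - x0[1])
gap = f(X1, X2) - plane
print(gap.min())                       # smallest gap, attained at the contact point

# Convex conjugate f*(p) = sup_x (p . x - f(x)), approximated on the grid
conj_numeric = np.max(p[0] * X1 + p[1] * X2 - f(X1, X2))
conj_contact = p @ x0 - f(*x0)         # the offset b read off at the contact point
conj_closed = p[0]**2 / 4 + p[1]**2 / 8
print(conj_numeric, conj_contact, conj_closed)
```

All three conjugate values should agree, and the gap should never go negative, which is the supporting-plane inequality checked on every grid point.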

Mathelirium

38,506 views • 6 months ago

When I first took ordinary differential equations, the pattern was always the same. Week 1 turns into a speedrun of methods: separation of variables, integrating factors, variation of parameters, Bernoulli, exact equations… and by Week 2 or 3 the course has quietly degenerated into hammer-picking. Spot the form, apply the recipe, move on. Mechanical! Fuuuuck!😫😫😫😫 The problem is what you don’t walk away with. You leave with a toolkit, but without a feel for what a differential equation even is, especially geometrically. And that’s a big deal, because in real modeling the equations you meet are rarely nice enough to reward memorized recipes. So you end up trained to solve toy forms, while the actual subject...the behavior, the flow, the shape of solutions stays blurry. This is why I’m biased toward the old-timers. Their old-school way of doing things always surprises me:...they’ll spend time on one idea until it sticks, instead of sprinting through a syllabus checklist. One lecture from them and you start noticing a contrast. A lot of modern teaching feels like "finish the content,". You get marched through techniques, but you’re not left with a single thought that keeps bothering you later...the kind of thought that actually pushes you toward research-level curiosity. MIT OpenCourseWare’s Professor Arthur Mattuck did that to me in his very first ODE lecture. One lecture, and your whole relationship with dy/dx = f(x,y) changes. In this segment, Prof. Mattuck is basically saying: A first-order ODE is a slope field, and a solution is a curve that moves everywhere tangent to that field. The math breakdown Write the ODE as dy/dx = f(x,y). At each point (x,y) you attach a tiny line segment with slope f(x,y). A function y = y₁(x) is a solution exactly when its graph follows those slopes:. At every x, the slope of the curve equals the slope prescribed by the field at the point on the curve. That’s the single line that unifies both viewpoints: y₁′(x) = f(x, y₁(x)). 
So solving the ODE and drawing an integral curve are the same statement in two languages!👌🏻 Once you see that, you can stop obsessing over whether you can write y(x) in closed form. You can start asking the questions that matter: where do solutions flow, where do they get trapped, where do they blow up, and where does existence/uniqueness fail just because the field isn’t even defined? That’s the perspective shift I wish every ODE course forced early, and it’s exactly why I keep pairing math with animation. #DifferentialEquations #ODEs #VectorFields #MathAnimation #Mathematics


Mathelirium

53,338 views • 5 months ago

WHY AM I THE SHAPE OF A T? MEGA64 x KEITA TAKAHASHI “TO A T” OUT NOW ON XBOX SERIES X/S, PS5, and PC


Mega64 @ Anime Expo (KH-1042)

165,116 views • 1 year ago