All the parameters (weights and biases) of the same multilayer perceptron over 200 epochs. Red/blue/yellow: the three layers. The top-right knob in each layer is the weight.

Kat ⊷ the Poet Engineer

36,692 subscribers

15,851 views • 1 year ago • via X (Twitter)

Science & Technology Arts
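
The caption describes watching every weight and bias of one MLP evolve over 200 epochs. As a rough sketch of how such a trajectory could be recorded, the Python below trains a hypothetical toy network (one neuron per layer, tanh on the two hidden layers, mean-squared error on the made-up target y = 0.5x) with plain full-batch gradient descent and keeps a snapshot of all six parameters after every epoch. The architecture, data, initial values, and learning rate are assumptions for illustration, not the setup used in the video.

```python
import math

# Hypothetical toy network (not from the video): three layers, each a single
# neuron with one weight and one bias; tanh on the two hidden layers.
INITIAL = {"w1": 0.5, "b1": 0.0, "w2": -0.3, "b2": 0.1, "w3": 0.8, "b3": 0.0}
DATA = [(i / 10, 0.5 * i / 10) for i in range(-10, 11)]  # made-up target: y = 0.5 x


def forward(p, x):
    a1 = math.tanh(p["w1"] * x + p["b1"])   # layer 1
    a2 = math.tanh(p["w2"] * a1 + p["b2"])  # layer 2
    y = p["w3"] * a2 + p["b3"]              # layer 3 (linear output)
    return a1, a2, y


def mse(p):
    return sum((forward(p, x)[2] - t) ** 2 for x, t in DATA) / len(DATA)


def train(p, epochs=200, lr=0.1):
    """Full-batch gradient descent; returns one snapshot of all parameters per epoch."""
    p = dict(p)
    history = [dict(p)]                      # history[0] = initial parameters
    for _ in range(epochs):
        g = dict.fromkeys(p, 0.0)
        for x, t in DATA:
            a1, a2, y = forward(p, x)
            dy = 2 * (y - t) / len(DATA)         # d(MSE)/dy
            g["w3"] += dy * a2
            g["b3"] += dy
            dz2 = dy * p["w3"] * (1 - a2 ** 2)   # back through layer 2's tanh
            g["w2"] += dz2 * a1
            g["b2"] += dz2
            dz1 = dz2 * p["w2"] * (1 - a1 ** 2)  # back through layer 1's tanh
            g["w1"] += dz1 * x
            g["b1"] += dz1
        p = {k: v - lr * g[k] for k, v in p.items()}
        history.append(dict(p))
    return history
```

Plotting `history[e][k]` against `e` for each key `k` gives one curve per knob, similar in spirit to the animation.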

Related Videos

Tonight, the Empire State Building will shine in blue, yellow, and red in celebration of 'SUPERMAN'.

DC Film News

23,602 views • 11 months ago

Zooming in to check the weight of the weights

The Best Viral Vids🔥

62,722 views • 5 months ago

Zooming in to check the weight of the weights 👀😉

Fun Viral Vids 😊

85,884 views • 7 months ago

Full Fine-tuning vs. Freezing Layers.

== Full Fine-tuning ==
A real network has many layers of weights: three in this example, billions of parameters in a production model. What does fine-tuning look like when you update all of them? That’s full fine-tuning: continue training every weight in the pretrained network on your new task. Every layer’s W gets its own ΔW. Nothing is frozen; every parameter is in play.

Think of an MLP as a chain of prerequisites leading to an advanced course. Layer 1 might be Linear Algebra, layer 2 Probability, layer 3 Advanced Machine Learning, each one building on what came before. Fine-tuning is what happens during graduate study: the foundations are already there from undergrad, so you’re not re-learning them. Full fine-tuning is reviewing every prerequisite to see what new topics have appeared and what discoveries the field has made since the last time you sat through them. Effective, but exhausting.

This diagram shows the same three-layer MLP twice, side by side. On the left, the pretrained network runs on input X: three weight matrices W₁, W₂, W₃, each followed by a ReLU activation.

Full fine-tuning gives the model the most freedom to specialize. Every parameter can move, and every parameter that can move must be stored. But not every prerequisite needs revisiting. The further back you go in the chain, the less the material has changed since pretraining: the linear-algebra basics under your computer-vision course are largely the same as they ever were. The next page does exactly that: freeze the prerequisites that haven’t moved, and refresh only the advanced one closest to your specialization.

== Freezing Layers ==
Full fine-tuning reviewed every prerequisite (Linear Algebra, Probability, Advanced ML) to refresh each subject with the latest topics. Effective, but exhausting. Then you realize something: the prerequisites haven’t actually changed that much. Linear Algebra is still Linear Algebra; the matrix decompositions you learned still hold. Probability is still Probability; the distributions and Bayes’ rule haven’t moved. Almost all the new material, the new ideas and recent discoveries, lives in the advanced layer at the top.

That’s freezing layers: keep the prerequisite layers fixed at their pretrained state, and update only the advanced one. In the diagram below, W₁ and W₂, the foundational prerequisites, stay frozen. Only W₃, the layer closest to your task-specific output, gets a ΔW.

Tom Yeh

27,225 views • 2 months ago
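
The difference between the two regimes comes down to which weight matrices receive a ΔW. A minimal sketch in plain Python, assuming a hypothetical three-layer MLP with made-up layer sizes and constant stand-in "pretrained" weights (biases omitted for brevity): full fine-tuning updates W1, W2, and W3, while freezing layers skips W1 and W2 and updates only W3. In a framework such as PyTorch the analogous step is usually setting `requires_grad = False` on the frozen parameters; the `Layer` class and helper functions here are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    W: list                 # weight matrix as nested lists: one row per output unit
    trainable: bool = True

    def num_params(self) -> int:
        return sum(len(row) for row in self.W)


def make_mlp(sizes, init=0.1):
    """Hypothetical stand-in for a pretrained MLP: constant weights, biases omitted."""
    return [Layer(f"W{i + 1}", [[init] * n_in for _ in range(n_out)])
            for i, (n_in, n_out) in enumerate(zip(sizes, sizes[1:]))]


def freeze_all_but_last(layers):
    """Freezing layers: keep every layer except the last at its pretrained values."""
    for layer in layers[:-1]:
        layer.trainable = False


def trainable_params(layers):
    return sum(layer.num_params() for layer in layers if layer.trainable)


def apply_update(layers, grads, lr):
    """W <- W + ΔW with ΔW = -lr * grad, applied only to trainable layers."""
    for layer, g in zip(layers, grads):
        if not layer.trainable:
            continue  # frozen layer: skipped, so it gets no ΔW
        layer.W = [[w - lr * gw for w, gw in zip(row, grow)]
                   for row, grow in zip(layer.W, g)]


if __name__ == "__main__":
    full = make_mlp([4, 8, 8, 3])       # full fine-tuning: all three layers trainable
    frozen = make_mlp([4, 8, 8, 3])
    freeze_all_but_last(frozen)         # only W3 trainable
    print(trainable_params(full), trainable_params(frozen))  # 120 24
```

With these toy sizes, W1 has 8×4 = 32 weights, W2 has 8×8 = 64, and W3 has 3×8 = 24, so freezing W1 and W2 shrinks the set of parameters that can move (and must be stored as updated copies) from 120 to 24.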

Three cheers for the red, white, and blue!

The Louden Tavern

203,745 views • 2 years ago

SO | INF | Taber Stokes (Hutchinson Blue Dragon Baseball) Table setter at the top of the order, has been all over the barrel today for the Blue Dragons. Sends one to the track in right center for a 2 RBI 3B here. #KCJUCO

Prep Baseball JUCO

14,907 views • 1 year ago

the styling team had to layer three diapers on darlene and myki during the performance because they kept SHITTING in the mother toilet during rehearsals, leaving brown stains all over the all white stage

eden ☆

98,376 views • 3 months ago

Make full use of layer functions! Discover how to use clipping masks, layer masks, and reference layers effectively in your work. This concise guide has all the essential information you need for using layers and coloring effectively. Learn more here↓ #clipstudio

CLIP STUDIO PAINT

12,493 views • 1 year ago

Three different ways of beating the defender, three different shifts onto the left foot… but the same outcome: a near-side rocket over the keeper’s shoulder. Kvaratskhelia, Rashford, Nico Williams — all hit the crossbar-and-in. Coaching detail below👇

The Individual Coach-Analyst

35,856 views • 7 months ago

The last scene in Red vs Blue. Ever. Of all time.

No Context Red vs Blue

65,520 views • 2 years ago

The building block of every neural network:
1. Take the dot product of the inputs with their weights
2. Add a bias
3. Apply a nonlinearity
Stack millions of these into layers, add attention, scale to hundreds of billions of parameters, and you get something like Grok. MIT 6.S191

tetsuo

16,672 views • 1 month ago
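
The three steps map directly onto a few lines of code. A minimal sketch in plain Python; the function names and the choice of ReLU as the default nonlinearity are assumptions for illustration:

```python
def relu(z):
    return max(0.0, z)


def neuron(inputs, weights, bias, activation=relu):
    # 1. Dot product of the inputs with their weights
    z = sum(x * w for x, w in zip(inputs, weights))
    # 2. Add a bias
    z += bias
    # 3. Apply a nonlinearity (ReLU by default; math.tanh would also work)
    return activation(z)


def dense_layer(inputs, weight_rows, biases, activation=relu):
    # A layer: many neurons reading the same inputs, each with its own weights and bias
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]


if __name__ == "__main__":
    print(neuron([1.0, 2.0], [0.5, -0.25], 0.1))                             # 0.1
    print(dense_layer([1.0, 2.0], [[0.5, -0.25], [-1.0, 0.0]], [0.1, 0.2]))  # [0.1, 0.0]
```

Real frameworks compute a whole layer as one matrix multiplication plus a bias vector instead of looping over neurons, which is what makes stacking millions of them practical.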

[Backpropagation] by Hand✍️
[1] Forward Pass ↳ Given a multilayer perceptron (3 layers), an input vector X, predictions Y^{Pred} = [0.5, 0.5, 0], and ground-truth label Y^{Target} = [0, 1, 0].
[2] Backpropagation ↳ Insert cells to hold our calculations.
[3] Layer 3 - Softmax (blue) ↳ Calculate ∂L / ∂z3 directly using the simple equation Y^{Pred} - Y^{Target} = [0.5, -0.5, 0]. ↳ This simple equation is the benefit of using Softmax and Cross-Entropy Loss together.
[4] Layer 3 - Weights (orange) & Biases (black) ↳ Calculate ∂L / ∂W3 and ∂L / ∂b3 by multiplying ∂L / ∂z3 and [ a2 | 1 ].
[5] Layer 2 - Activations (green) ↳ Calculate ∂L / ∂a2 by multiplying ∂L / ∂z3 and W3.
[6] Layer 2 - ReLU (blue) ↳ Calculate ∂L / ∂z2 by multiplying ∂L / ∂a2 element-wise by 1 where z2 is positive and 0 otherwise.
[7] Layer 2 - Weights (orange) & Biases (black) ↳ Calculate ∂L / ∂W2 and ∂L / ∂b2 by multiplying ∂L / ∂z2 and [ a1 | 1 ].
[8] Layer 1 - Activations (green) ↳ Calculate ∂L / ∂a1 by multiplying ∂L / ∂z2 and W2.
[9] Layer 1 - ReLU (blue) ↳ Calculate ∂L / ∂z1 by multiplying ∂L / ∂a1 element-wise by 1 where z1 is positive and 0 otherwise.
[10] Layer 1 - Weights (orange) & Biases (black) ↳ Calculate ∂L / ∂W1 and ∂L / ∂b1 by multiplying ∂L / ∂z1 and [ x | 1 ].
[11] Gradient Descent ↳ Update the weights and biases (typically scaling the gradients by a learning rate).
💡 Matrix Multiplication is All You Need: Just like the forward pass, backpropagation is all about matrix multiplications. You can definitely do everything by hand, as I demonstrated in this exercise, albeit slowly and imperfectly. This is why GPUs' ability to multiply matrices efficiently plays such an important role in the evolution of deep learning, and why NVIDIA is now close to $1 trillion in valuation.
💡 Exploding Gradients: We can already see the gradients getting larger as we backpropagate toward the earlier layers, even in this simple 3-layer network. This motivates methods like the skip connections in ResNet to handle exploding (or vanishing) gradients.
I did the calculations entirely by hand. Please let me know if you spot any errors or have any questions!

Tom Yeh

64,645 views • 1 year ago
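
To check hand calculations like these, the steps can be mirrored in code and compared against finite differences. A minimal sketch in plain Python, assuming the same conventions as the post: each weight matrix carries its bias as an extra last column applied to [ a | 1 ], hidden layers use ReLU, and the output uses softmax with cross-entropy so that ∂L / ∂z3 = Y^{Pred} - Y^{Target}. The helper names and the finite-difference check are additions for illustration, not part of the original exercise.

```python
import math


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(z):
    m = max(z)                                   # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_entropy(p, y):
    return -sum(t * math.log(q) for q, t in zip(p, y) if t > 0)


def softmax_ce_grad(p, y):
    """Step [3]: with softmax + cross-entropy, dL/dz3 = Y_pred - Y_target."""
    return [q - t for q, t in zip(p, y)]


def forward(Ws, x):
    """Ws = [W1, W2, W3]; each W has a bias column and is applied to [ a | 1 ]."""
    zs, acts = [], [x]
    for i, W in enumerate(Ws):
        z = matvec(W, acts[-1] + [1.0])
        zs.append(z)
        last = i == len(Ws) - 1
        acts.append(softmax(z) if last else [max(0.0, v) for v in z])  # ReLU on hidden layers
    return zs, acts


def backward(Ws, x, y):
    """Steps [3]-[10]: dL/dW for every layer, with the bias gradient in the last column."""
    zs, acts = forward(Ws, x)
    grads = [None] * len(Ws)
    dz = softmax_ce_grad(acts[-1], y)
    for i in reversed(range(len(Ws))):
        a_aug = acts[i] + [1.0]                          # [ a | 1 ]
        grads[i] = [[d * a for a in a_aug] for d in dz]  # outer product: dL/dW and dL/db
        if i == 0:
            break
        W = Ws[i]
        # dL/da = W^T dz, ignoring the bias column
        da = [sum(W[r][c] * dz[r] for r in range(len(dz))) for c in range(len(acts[i]))]
        # ReLU backward: keep the gradient where z > 0, zero it otherwise
        dz = [g if z > 0 else 0.0 for g, z in zip(da, zs[i - 1])]
    return grads


def gradient_descent_step(Ws, grads, lr):
    """Step [11]: W <- W - lr * dL/dW."""
    return [[[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W, G)]
            for W, G in zip(Ws, grads)]


def loss(Ws, x, y):
    return cross_entropy(forward(Ws, x)[1][-1], y)


def numerical_grad(Ws, x, y, i, r, c, eps=1e-6):
    """Central finite difference of the loss w.r.t. Ws[i][r][c], for checking backward()."""
    def shifted(delta):
        P = [[row[:] for row in W] for W in Ws]
        P[i][r][c] += delta
        return loss(P, x, y)
    return (shifted(eps) - shifted(-eps)) / (2 * eps)
```

Comparing `backward(Ws, x, y)[i][r][c]` with `numerical_grad(Ws, x, y, i, r, c)` for every entry is a quick way to catch an arithmetic slip in a hand-filled cell.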

Right in front of Lisa Bluder and Jan Jensen… Kate Martin checks in for the first time, joining Caitlin Clark and Megan Gustafson. All three Hawkeyes on the floor, at the same time. A memorable moment.

Ben Stevens

499,693 views • 2 years ago

They killed each other at the same second and we won because of blue star 🤯

HMB Symantec

20,755 views • 1 year ago

and he has led the league in FG% in each of the last three seasons

New York Jets

24,599 views • 5 months ago

all of them actually being in the same room..

AARON

5,871,657 views • 2 years ago

Tatiana Maslany playing all of these characters IN THE SAME SHOW, and they all interact with each other

Laurine /\\\

50,269 views • 1 year ago

How are the clear blue skies being blocked? By geoengineered cloud layers!

Gök Medresesi

26,001 views • 7 months ago

At the Burj Khalifa lit up in Red, White, and Blue😂 collab w/The Right To Bear Memes & Lauren3ve

drefanzor memes

24,980 views • 1 year ago