Forward & Backpropagation — How Networks Learn

Backpropagation & Gradient Descent

Backpropagation (The Learning Algorithm)

Goal: Find weights that minimize loss.

Strategy: Compute gradients (how much to adjust each weight) and update:

1. Forward pass: compute loss
2. Backward pass: compute dL/dW for each weight
3. Update: W_new = W_old - learning_rate × dL/dW
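
As a first taste, here is that three-step recipe for a single weight and a single training example. The numbers are made up and the "network" is just a plain linear prediction (no activation) so the arithmetic stays readable; it is a sketch, not a full implementation.

# One training step for a single weight w on one example (made-up numbers)
x, y_true = 2.0, 10.0            # input and target
w = 3.0                          # current weight
learning_rate = 0.01

# 1. Forward pass: prediction and loss
y_pred = w * x                   # 6.0
loss = (y_pred - y_true) ** 2    # 16.0

# 2. Backward pass: dL/dw via the chain rule
dL_dw = 2 * (y_pred - y_true) * x   # 2 * (-4.0) * 2.0 = -16.0

# 3. Update: step against the gradient
w = w - learning_rate * dL_dw    # 3.0 - 0.01 * (-16.0) = 3.16, closer to the ideal 5.0
print(w)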

The Chain Rule (Calculus)

To find dL/dW2 (the gradient of the loss with respect to the output-layer weights), use the chain rule:

dL/dW2 = (dL/da2) × (da2/dz2) × (dz2/dW2)

Where:

  • dL/da2 = how much does the loss depend on the output a2?
  • da2/dz2 = how much does the output depend on the pre-activation z2?
  • dz2/dW2 = how much does the pre-activation depend on the weights W2?
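
To see the chain rule in action, here is a minimal numeric sketch for a single output neuron. The sigmoid activation, squared-error loss, and all the numbers are assumptions made purely for this example (use whatever your network actually uses); it computes dL/dW2 factor by factor and checks the result against a finite-difference estimate.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

a1 = np.array([0.5, 0.8])          # activations from the previous layer (made up)
W2 = np.array([0.3, -0.2])         # output-layer weights (made up)
y  = 1.0                           # target

def loss(W):
    z2 = W @ a1                    # pre-activation
    a2 = sigmoid(z2)               # output
    return (a2 - y) ** 2           # squared-error loss (assumed for the sketch)

# Chain rule: dL/dW2 = dL/da2 × da2/dz2 × dz2/dW2
z2 = W2 @ a1
a2 = sigmoid(z2)
dL_da2  = 2 * (a2 - y)             # derivative of the loss w.r.t. the output
da2_dz2 = a2 * (1 - a2)            # derivative of the sigmoid
dz2_dW2 = a1                       # pre-activation is linear in the weights
grad = dL_da2 * da2_dz2 * dz2_dW2

# Finite-difference check: nudge each weight a little and watch the loss change
eps = 1e-6
numeric = np.array([
    (loss(W2 + eps * np.eye(2)[i]) - loss(W2 - eps * np.eye(2)[i])) / (2 * eps)
    for i in range(2)
])
print(grad)     # analytic gradient from the chain rule
print(numeric)  # nearly identical numeric estimate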

Gradient Descent

Update rule:

W := W - α × ∇W Loss

Where:
- α = learning rate (how big a step to take)
- ∇W Loss = gradient of the loss with respect to W (computed by backprop)

Learning rate choices (illustrated in the sketch after this list):

  • Too high (e.g., α = 1.0): overshoots, may diverge, unstable
  • Too low (e.g., α = 0.00001): learns very slowly
  • Just right (e.g., α = 0.01): stable, fast learning
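
The effect is easy to see on a toy problem. The quadratic loss below is an assumption made purely for illustration (it is not the network's loss); gradient descent on it blows up with a large step size and crawls with a tiny one.

# Toy loss L(w) = 2 * (w - 3)^2 with gradient dL/dw = 4 * (w - 3); minimum at w = 3
def grad(w):
    return 4.0 * (w - 3.0)

for lr in (1.0, 0.00001, 0.01):        # too high, too low, reasonable
    w = 0.0
    for step in range(100):
        w = w - lr * grad(w)           # gradient descent update
    print(f"lr={lr}: w after 100 steps = {w:.4g}")
# lr=1.0 blows up, lr=0.00001 barely moves, lr=0.01 ends near 3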

Example Update

dL/dW1 = 0.05   (gradient for weight 1)
α = 0.01        (learning rate)

W1_old = 0.3
W1_new = 0.3 - 0.01 × 0.05 = 0.3 - 0.0005 = 0.2995

W1 moved slightly in the direction that reduces the loss!
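
The same update works element-wise on a whole weight matrix at once. This NumPy sketch uses made-up values for the weights and gradients; the top-left entry reproduces the arithmetic above.

import numpy as np

learning_rate = 0.01
W1 = np.array([[0.3, -0.1],
               [0.2,  0.4]])          # example weight matrix (made-up values)
dL_dW1 = np.array([[0.05, -0.02],
                   [0.01,  0.03]])    # gradients from backprop (made-up values)

W1 = W1 - learning_rate * dL_dW1      # every weight takes a small step downhill
print(W1)                             # top-left entry is 0.2995, as in the example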

Repeat This Process

for epoch in range(1000):
    # Forward: compute loss
    predictions = network.forward(X)
    loss = compute_loss(y, predictions)

    # Backward: compute gradients
    gradients = network.backward()

    # Update: move in the negative gradient direction
    network.update_weights(gradients, learning_rate=0.01)

# After ~1000 iterations: weights converge to good values!

This is how all neural networks learn! 🧠
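
The loop above is pseudocode: network, compute_loss, X, and y stand in for whatever your framework provides. Below is a minimal self-contained sketch of the whole page in NumPy. The 2-4-1 architecture, sigmoid activations, mean-squared-error loss, toy OR-style dataset, and learning rate are all assumptions chosen just to keep the example short and runnable.

import numpy as np

rng = np.random.default_rng(0)

# Made-up toy data: 2 binary inputs -> OR of the inputs
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [1.]])

# Tiny 2-layer network: 2 -> 4 -> 1, sigmoid activations (assumed for the sketch)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 0.5   # works for this tiny toy problem; good values are problem-dependent

for epoch in range(10000):
    # Forward pass: compute predictions and the loss
    z1 = X @ W1 + b1
    a1 = sigmoid(z1)
    z2 = a1 @ W2 + b2
    a2 = sigmoid(z2)
    loss = np.mean((a2 - y) ** 2)

    # Backward pass: chain rule, layer by layer
    dL_da2 = 2 * (a2 - y) / len(X)        # dL/da2
    dL_dz2 = dL_da2 * a2 * (1 - a2)       # × da2/dz2 (sigmoid derivative)
    dL_dW2 = a1.T @ dL_dz2                # × dz2/dW2
    dL_db2 = dL_dz2.sum(axis=0)
    dL_da1 = dL_dz2 @ W2.T                # push the gradient back to layer 1
    dL_dz1 = dL_da1 * a1 * (1 - a1)
    dL_dW1 = X.T @ dL_dz1
    dL_db1 = dL_dz1.sum(axis=0)

    # Update: step each weight against its gradient
    W1 -= learning_rate * dL_dW1
    b1 -= learning_rate * dL_db1
    W2 -= learning_rate * dL_dW2
    b2 -= learning_rate * dL_db2

    if epoch % 2000 == 0:
        print(f"epoch {epoch}: loss = {loss:.4f}")

# Predictions should now be close to y
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))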
