Forward & Backpropagation — How Networks Learn
Backpropagation & Gradient Descent
Backpropagation (The Learning Algorithm)
Goal: Find weights that minimize loss.
Strategy: Compute gradients (how much the loss changes when each weight changes) and update in three steps (a minimal sketch follows):
1. Forward pass: compute loss
2. Backward pass: compute dL/dW for each weight
3. Update: W_new = W_old - learning_rate × dL/dW
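To make the loop concrete, here is one full iteration for a one-weight linear model. This is a minimal sketch: the model, data values, and variable names are all invented for illustration.
# One gradient-descent step for a one-weight model y_hat = w * x
# with squared-error loss L = (y_hat - y)**2. All values are made up.
x, y = 2.0, 1.0        # a single training example
w = 0.8                # initial weight
lr = 0.01              # learning rate

# 1. Forward pass: compute loss
y_hat = w * x                  # 1.6
loss = (y_hat - y) ** 2        # 0.36

# 2. Backward pass: dL/dw via the chain rule
#    dL/dy_hat = 2 * (y_hat - y), and dy_hat/dw = x
dL_dw = 2 * (y_hat - y) * x    # 2.4

# 3. Update: step against the gradient
w = w - lr * dL_dw             # 0.8 - 0.024 = 0.776
print(w)                       # 0.776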
The Chain Rule (Calculus)
To find dL/dW2 (the gradient for the output-layer weights W2), apply the chain rule (each factor is checked numerically below):
dL/dW2 = (dL/da2) × (da2/dz2) × (dz2/dW2)
Where:
- dL/da2 = how much the loss depends on the output
- da2/dz2 = how much the output depends on the pre-activation
- dz2/dW2 = how much the pre-activation depends on the weights
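These three factors can be checked numerically. The sketch below assumes a sigmoid output unit with squared-error loss; the values of a1, W2, b2, and y are invented for illustration.
import math

# Toy setup (invented values): sigmoid output with squared-error loss.
a1, W2, b2, y = 0.6, 0.4, 0.1, 1.0

z2 = W2 * a1 + b2                  # pre-activation: 0.34
a2 = 1 / (1 + math.exp(-z2))       # sigmoid output: ~0.584
L = (a2 - y) ** 2                  # loss: ~0.173

# The three chain-rule factors:
dL_da2  = 2 * (a2 - y)             # loss w.r.t. output: ~-0.832
da2_dz2 = a2 * (1 - a2)            # sigmoid derivative: ~0.243
dz2_dW2 = a1                       # pre-activation w.r.t. weight: 0.6

dL_dW2 = dL_da2 * da2_dz2 * dz2_dW2
print(dL_dW2)                      # ~-0.121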
Gradient Descent
Update rule:
W := W - α × ∇W Loss
Where:
- α = learning rate (how big a step to take)
- ∇W Loss = gradient of the loss with respect to the weights (computed by backprop)
Learning rate choices (illustrative values; see the toy demonstration below):
- Too high (e.g., α = 1.0): overshoots the minimum, can diverge, unstable
- Too low (e.g., α = 0.00001): learns very slowly
- Just right (e.g., α = 0.01): stable, fast learning
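The effect is easy to see on a toy quadratic loss L(w) = w², whose gradient is 2w. The learning rates below are chosen to show the three regimes on this particular loss; the exact thresholds differ for other losses.
# Gradient descent on L(w) = w**2, where dL/dw = 2*w.
def descend(lr, w=1.0, steps=20):
    for _ in range(steps):
        w = w - lr * 2 * w     # the update rule: W := W - α × ∇W Loss
    return w

print(descend(lr=1.1))       # too high: |w| grows each step (diverges)
print(descend(lr=0.00001))   # too low: barely moves from 1.0
print(descend(lr=0.1))       # reasonable: decays steadily toward 0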
Example Update
dL/dW1 = 0.05 (gradient for weight 1)
α = 0.01 (learning rate)
W1_old = 0.3
W1_new = 0.3 - 0.01 × 0.05 = 0.3 - 0.0005 = 0.2995
W1 moved slightly in the direction that reduces the loss!
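The same update, reproduced in code:
grad_W1 = 0.05          # dL/dW1 from backprop
lr = 0.01               # learning rate
W1 = 0.3
W1 = W1 - lr * grad_W1
print(W1)               # 0.2995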
Repeat This Process
for epoch in range(1000):
    # Forward: compute loss
    predictions = network.forward(X)
    loss = compute_loss(y, predictions)

    # Backward: compute gradients
    gradients = network.backward()

    # Update: move in the negative gradient direction
    network.update_weights(gradients, learning_rate=0.01)

# After 1000 iterations: weights converge to good values!
This is how all neural networks learn! 🧠
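The loop above is pseudocode (network, compute_loss, and the method names are placeholders). One way to flesh it out, as a sketch with NumPy, is a two-layer sigmoid network trained on XOR; the architecture, initialization, and hyperparameters here are illustrative choices, not the only ones.
import numpy as np

rng = np.random.default_rng(0)

# XOR: a tiny dataset that needs a non-linear decision rule.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Two layers: 2 inputs -> 4 hidden units -> 1 output (sizes are arbitrary).
W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)
lr = 0.5

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for epoch in range(5000):
    # Forward: compute loss
    z1 = X @ W1 + b1; a1 = sigmoid(z1)
    z2 = a1 @ W2 + b2; a2 = sigmoid(z2)
    loss = np.mean((a2 - y) ** 2)
    if epoch % 1000 == 0:
        print(f"epoch {epoch}: loss {loss:.4f}")

    # Backward: chain rule, output layer first
    dz2 = 2 * (a2 - y) / len(X) * a2 * (1 - a2)   # dL/dz2
    dW2 = a1.T @ dz2; db2 = dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * a1 * (1 - a1)            # dL/dz1
    dW1 = X.T @ dz1; db1 = dz1.sum(axis=0)

    # Update: move in the negative gradient direction
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(a2.ravel(), 2))   # should approach [0, 1, 1, 0]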