Weight Initialization, Regularization & Dropout · Page 2 of 2
Dropout & L1/L2 Regularization
Dropout (Simple but Effective)
Problem: Model memorizes training data (overfitting).
Solution: Randomly drop neurons during training!
Forward pass:
y = Dense(x) (normal)
With dropout (p=0.5):
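mask = random 0/1 per unit (each unit is 0 with probability p, so about 50% zeros)
y = Dense(x) * mask (drops about 50% of outputs)
Then scale: y = y / (1 - p) (keeps the expected output the same as without dropout)
Test time: Use all neurons! No dropout and no extra scaling.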
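The forward pass above can be sketched in plain Python. This is a minimal illustration of the mask-then-scale scheme (often called inverted dropout), not a framework implementation; the function name `dropout` and its `training` and `rng` parameters are assumptions made for this example.

```python
import random

def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout on a list of activations (illustrative helper)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)  # test time: every unit is kept, no scaling
    keep = 1.0 - p
    # Each unit is zeroed with probability p; survivors are scaled by 1 / (1 - p).
    return [v / keep if rng.random() >= p else 0.0 for v in values]

rng = random.Random(0)            # seeded so the run is repeatable
activations = [1.0] * 10_000
dropped = dropout(activations, p=0.5, rng=rng)

# Survivors become 2.0, dropped units become 0.0, so the mean stays near 1.0.
print(sum(dropped) / len(dropped))
print(dropout([1.0, 2.0], training=False))
```

Because the scaling happens during training, the test-time path returns the activations unchanged.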
Why it works:
- Forces network to learn redundant features
- Can't rely on single neuron
- Ensemble effect (different neurons active each batch)
Typical dropout rates:
- p=0.2-0.3 (light, drops 20-30% of units)
- p=0.5 (standard for hidden layers)
- p > 0.7 (heavy; uncommon, sometimes tried on very large networks)
L1/L2 Regularization
Idea: Penalize large weights → force small (L2) or sparse (L1) weights.
L2 Regularization (Ridge)
Total Loss = Data Loss + λ × Σ(w²)
λ controls strength (hyperparameter)
Gradient: dL/dw = (normal gradient) + 2λw
Large w → bigger penalty → decay toward 0
Effect: Every weight shrinks in proportion to its size; weights get small but rarely exactly 0.
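To see what the extra 2λw gradient term does, here is a small sketch of gradient-descent steps where the data gradient is set to zero so only the penalty acts. The helper `l2_step` and the values of `lam` and `lr` are assumptions chosen for illustration.

```python
def l2_step(w, data_grad, lam, lr):
    """One gradient-descent step on: data loss + lam * w**2."""
    grad = data_grad + 2.0 * lam * w   # the penalty contributes 2 * lam * w
    return w - lr * grad

weights = [4.0, -1.0, 0.1]

# With a zero data gradient, each step multiplies every weight by
# (1 - 2 * lr * lam) = 0.9, so large weights lose more in absolute terms.
for _ in range(3):
    weights = [l2_step(w, data_grad=0.0, lam=0.1, lr=0.5) for w in weights]

print(weights)
```

The weights shrink by the same factor each step, which is why this penalty is often called weight decay, and none of them reaches exactly zero.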
L1 Regularization (Lasso)
Total Loss = Data Loss + λ × Σ(|w|)
Gradient: dL/dw = (normal gradient) + λ × sign(w)
Drives less-important weights to exactly 0!
Effect: Feature selection (some weights exactly 0).
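The sketch below contrasts the gradient formula above with a soft-thresholding update. Soft-thresholding (a proximal step) is not described in this lesson; it is swapped in here as one common way to make weights land on exactly 0, since a plain sign-based step tends to hop back and forth across zero. All function names and the values of `lam` and `lr` are assumptions for illustration.

```python
def sign(w):
    return (w > 0) - (w < 0)

def l1_subgradient_step(w, data_grad, lam, lr):
    """Plain step using the gradient from the text: data_grad + lam * sign(w)."""
    return w - lr * (data_grad + lam * sign(w))

def soft_threshold(w, threshold):
    """Shrink w toward 0 by `threshold`, clamping at exactly 0."""
    if w > threshold:
        return w - threshold
    if w < -threshold:
        return w + threshold
    return 0.0

weights = [4.0, -1.0, 0.1]
lam, lr = 0.5, 0.5

# Data gradient is taken as zero so only the penalty acts.
for _ in range(3):
    weights = [soft_threshold(w, lr * lam) for w in weights]

print(weights)   # [3.25, -0.25, 0.0]
```

Every weight moves by the same fixed amount per step regardless of its size, so the smallest weight hits zero first while the large one barely changes.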
Regularization Strength
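λ = 0: No regularization (risk of overfitting)
λ = 0.001: Light regularization (often a good starting point)
λ = 0.1: Strong regularization (risk of underfitting)
λ = 1.0: Very strong (model often too simple)
Tuning: These values are rough guides that depend on the model and data scale. Use a validation set to find the best λ.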
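A validation sweep over λ can be sketched with a one-weight ridge model, where the best weight has a closed form. The data values, the helper names `fit_ridge_slope` and `mse`, and the candidate list are all made up for this example.

```python
def fit_ridge_slope(xs, ys, lam):
    """Closed-form minimizer of sum((w*x - y)**2) + lam * w**2 for one weight w."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data: the true slope is about 1, with a little noise.
train_x, train_y = [1.0, 2.0, 3.0], [1.05, 2.05, 3.0]
val_x, val_y = [4.0, 5.0], [4.02, 4.97]

candidates = [0.0, 0.001, 0.1, 1.0]
val_losses = {}
for lam in candidates:
    w = fit_ridge_slope(train_x, train_y, lam)   # fit on training data only
    val_losses[lam] = mse(w, val_x, val_y)       # score on held-out data

best_lam = min(val_losses, key=val_losses.get)
print(best_lam, val_losses[best_lam])
```

The model is never fitted on the validation points; they are used only to compare candidates, which is what makes the chosen λ a fair estimate of what generalizes.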
Combining Techniques
Best practice:
Layer 1: Dense → BatchNorm → Activation → Dropout
Layer 2: Dense → BatchNorm → Activation → Dropout
...
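One such layer can be sketched in plain Python on a tiny batch. This is a simplified illustration under several assumptions: ReLU is used as the activation, batch norm has no learned scale or shift and always uses the current batch statistics (real implementations switch to running averages at test time), and all function names and numbers are invented for the example.

```python
import random

def dense(batch, weights, bias):
    """weights[j] holds the input weights of output unit j."""
    return [[sum(w * x for w, x in zip(w_row, row)) + b
             for w_row, b in zip(weights, bias)]
            for row in batch]

def batch_norm(batch, eps=1e-5):
    """Normalize each feature to zero mean, unit variance over the batch."""
    n = len(batch)
    columns = list(zip(*batch))
    means = [sum(col) / n for col in columns]
    variances = [sum((v - m) ** 2 for v in col) / n
                 for col, m in zip(columns, means)]
    return [[(v - m) / (var + eps) ** 0.5
             for v, m, var in zip(row, means, variances)]
            for row in batch]

def relu(batch):
    return [[max(0.0, v) for v in row] for row in batch]

def dropout(batch, p, training, rng):
    if not training:
        return batch
    keep = 1.0 - p
    return [[v / keep if rng.random() >= p else 0.0 for v in row]
            for row in batch]

def layer(batch, weights, bias, p, training, rng):
    # Dense -> BatchNorm -> Activation -> Dropout, in that order.
    return dropout(relu(batch_norm(dense(batch, weights, bias))), p, training, rng)

rng = random.Random(0)
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]          # 3 samples, 2 features
weights = [[0.5, -0.5], [1.0, 1.0], [0.0, 2.0]]   # 3 output units
bias = [0.0, 0.1, -0.1]

train_out = layer(x, weights, bias, p=0.5, training=True, rng=rng)
eval_out = layer(x, weights, bias, p=0.5, training=False, rng=rng)
print(train_out)
print(eval_out)
```

Stacking a second layer means calling `layer` again on the output with its own weights and bias.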
When to Use
| Technique | Use For | Main Benefit |
|---|---|---|
| Dropout | Large networks | Simple, effective |
| L2 Reg | All models | Standard |
| L1 Reg | Feature selection | Interpretability |
| Batch Norm | Deep networks | Stabilizes training |
| Early stopping | General | Prevents overfitting |
Early Stopping
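Simplest regularization:
Train while checking validation loss after each epoch
Stop once it has not improved for a few epochs, and keep the model from the best epoch
Why? Past that point training loss keeps falling, but the model is starting to overfit.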
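The stopping rule can be sketched as a loop over recorded validation losses. The function name `early_stopping_epoch`, the `patience` parameter, and the loss values are assumptions for illustration; in real training the loop body would also run an epoch and save a checkpoint.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, best_loss), stopping after `patience` epochs without improvement."""
    best_epoch, best_loss = None, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss   # real training: save the model here
        elif epoch - best_epoch >= patience:
            break                                  # validation loss has stopped improving
    return best_epoch, best_loss

history = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.49]
print(early_stopping_epoch(history, patience=2))   # (3, 0.5)
```

Note that the loop stops at epoch 5 and never sees the later 0.49. A larger `patience` trades extra training time for a lower chance of stopping on a temporary plateau.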