Neurons & Perceptrons — Building Blocks · Page 2 of 2
Activation Functions (Non-Linearity)
Activation functions introduce non-linearity, which is what lets networks learn complex patterns: without them, any stack of linear layers collapses into a single linear transformation.
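To see why, here is a minimal NumPy sketch (the variable names and shapes are illustrative): two stacked linear layers collapse into a single matrix multiply, while inserting ReLU between them breaks that collapse.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # first "layer" (illustrative shapes)
W2 = rng.normal(size=(2, 4))   # second "layer"
x = rng.normal(size=3)

# Two stacked linear layers are equivalent to one linear layer:
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)

# With ReLU in between, the composition is no longer a single matrix multiply:
relu = lambda z: np.maximum(0, z)
print(W2 @ relu(W1 @ x))  # cannot be written as M @ x for one fixed matrix M
```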
ReLU (Rectified Linear Unit) — Most Popular
f(z) = max(0, z)
Advantages:
- Computationally efficient
- Works great in practice
- Sparse activation (many zeros)
Disadvantage:
- Dead neurons (if the weights and bias drive z below 0 for every input, the gradient is 0 and the neuron stops learning; see the sketch below)
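A minimal NumPy sketch of ReLU and its gradient (the function names are mine, not from the lesson). The gradient is exactly 0 wherever z ≤ 0, which is the mechanism behind both sparse activation and dead neurons:

```python
import numpy as np

def relu(z):
    """f(z) = max(0, z), applied element-wise."""
    return np.maximum(0, z)

def relu_grad(z):
    """Derivative of ReLU: 1 where z > 0, 0 elsewhere."""
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))       # [0.  0.  0.  0.5 2. ] -> sparse activation (many zeros)
print(relu_grad(z))  # [0. 0. 0. 1. 1.]      -> no gradient flows where z <= 0
```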
Sigmoid — Classic but Outdated
f(z) = 1 / (1 + e^(-z))
Output range: (0, 1)
Why it was used:
- Smooth, differentiable
- Output interpretable as probability
Why we moved away:
- Vanishing gradients (when the output saturates near 0 or 1, the gradient ≈ 0 and learning stalls; see the sketch below)
- Slower to compute than ReLU (requires an exponential)
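A short sketch of the saturation problem, using the standard derivative σ′(z) = σ(z)(1 − σ(z)): the gradient peaks at only 0.25 and collapses toward 0 at the tails.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # maximum value is 0.25, reached at z = 0

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(z))       # [~0.00005  0.119  0.5   0.881  ~0.99995]
print(sigmoid_grad(z))  # [~0.00005  0.105  0.25  0.105  ~0.00005] -> vanishes at the tails
```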
Tanh — Improved Sigmoid
f(z) = (e^z - e^(-z)) / (e^z + e^(-z))
Output range: (-1, 1)
Zero-centered output makes optimization easier than with Sigmoid, but Tanh still saturates and is slower to compute than ReLU.
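A quick check with np.tanh: outputs are zero-centered in (-1, 1), and the derivative 1 − tanh²(z) peaks at 1 (versus 0.25 for Sigmoid), though it still vanishes for large |z|.

```python
import numpy as np

z = np.array([-2.0, 0.0, 2.0])
t = np.tanh(z)
print(t)           # [-0.964  0.     0.964] -> zero-centered, range (-1, 1)
print(1.0 - t**2)  # [ 0.071  1.     0.071] -> derivative still vanishes at the tails
```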
Softmax — Multi-class Classification
f(zᵢ) = e^(zᵢ) / Σⱼ e^(zⱼ)
Converts raw scores to probabilities (sum to 1).
Example:
- Raw output: [2.0, 1.0, 0.1]
- After softmax: ≈ [0.66, 0.24, 0.10] ← probabilities!
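A minimal, numerically stable softmax sketch; subtracting the max before exponentiating is a standard trick (not something the formula above requires) and leaves the result unchanged. It reproduces the example:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the output is unchanged
    # because softmax is invariant to adding a constant to every score.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
p = softmax(scores)
print(p.round(2))  # [0.66 0.24 0.1]
print(p.sum())     # ~1.0
```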
When to Use
| Task | Last Layer | Hidden Layers |
|---|---|---|
| Binary Classification | Sigmoid | ReLU |
| Multi-class | Softmax | ReLU |
| Regression | Linear | ReLU |
| Sequences (RNNs) | Task-dependent | Tanh / Sigmoid (gates) |
Modern best practice: use ReLU in hidden layers and a task-specific activation (as in the table above) for the output layer.
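Putting the pieces together, a minimal forward-pass sketch for multi-class classification (shapes, weights, and names are illustrative, not a trained model): ReLU in the hidden layer, Softmax at the output.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Illustrative shapes: 4 input features, 8 hidden units, 3 classes
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

x = rng.normal(size=4)   # one input example
h = relu(W1 @ x + b1)    # hidden layer: ReLU non-linearity
p = softmax(W2 @ h + b2) # output layer: class probabilities
print(p, p.sum())        # three probabilities summing to ~1
```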