Neurons & Perceptrons — Building Blocks


Activation Functions (Non-Linearity)

Activation functions introduce non-linearity, allowing networks to learn complex patterns.

ReLU (Rectified Linear Unit) — Most Popular

f(z) = max(0, z)

Advantages:

  • Computationally efficient
  • Works great in practice
  • Sparse activation (many zeros)

Disadvantage:

  • Dead neurons (if the weights and bias keep z negative for every input, the gradient is always 0 and the neuron stops learning)
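A quick NumPy sketch (sample values are illustrative) showing ReLU's sparse output and why a permanently negative pre-activation means no gradient:

  import numpy as np

  def relu(z):
      # Pass positive values through, zero out the rest
      return np.maximum(0.0, z)

  z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
  print(relu(z))                 # [0. 0. 0. 1.5 3.]  (negatives become exactly 0)

  # The gradient is 1 where z > 0 and 0 where z < 0, so a neuron whose
  # pre-activation stays negative gets no learning signal (a "dead neuron").
  print((z > 0).astype(float))   # [0. 0. 0. 1. 1.]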

Sigmoid — Classic but Outdated

f(z) = 1 / (1 + e^(-z))

Output range: (0, 1)

Why it was used:

  • Smooth, differentiable
  • Output interpretable as probability

Why we moved away:

  • Vanishing gradients (when the output saturates near 0 or 1, the gradient ≈ 0, so earlier layers barely learn)
  • Slower than ReLU
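A short sketch (input values chosen for illustration) of the sigmoid and its derivative f'(z) = f(z)(1 - f(z)), showing how the gradient collapses once the output saturates:

  import numpy as np

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  def sigmoid_grad(z):
      s = sigmoid(z)
      return s * (1.0 - s)   # peaks at 0.25 when z = 0

  for z in [0.0, 2.0, 5.0, 10.0]:
      print(z, round(sigmoid(z), 5), round(sigmoid_grad(z), 5))
  # At z = 10 the output is ~0.99995 and the gradient is ~0.00005:
  # almost nothing flows backward, which is the vanishing-gradient problem.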

Tanh — Improved Sigmoid

f(z) = (e^z - e^(-z)) / (e^z + e^(-z))

Output range: (-1, 1)

Zero-centered output makes it a better default than Sigmoid, but it still saturates and is slower to compute than ReLU.
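For comparison, a tiny sketch (sample inputs are arbitrary) contrasting Tanh with Sigmoid; Tanh is just a rescaled, zero-centered sigmoid:

  import numpy as np

  z = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
  print(np.tanh(z))                  # outputs in (-1, 1), centered on 0
  print(1.0 / (1.0 + np.exp(-z)))    # sigmoid outputs in (0, 1), centered on 0.5
  # tanh(z) = 2 * sigmoid(2z) - 1, so it shares sigmoid's saturation
  # (and vanishing gradients) at large |z|.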

Softmax — Multi-class Classification

f(zᵢ) = e^(zᵢ) / Σⱼ e^(zⱼ)

Converts raw scores to probabilities (sum to 1).

Example:

  • Raw output: [2.0, 1.0, 0.1]
  • After softmax: ≈ [0.66, 0.24, 0.10] ← probabilities that sum to 1!
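A numerically stable softmax sketch that reproduces the example above (subtracting the max before exponentiating avoids overflow):

  import numpy as np

  def softmax(z):
      z = np.asarray(z, dtype=float)
      e = np.exp(z - z.max())    # subtract the max for numerical stability
      return e / e.sum()

  probs = softmax([2.0, 1.0, 0.1])
  print(probs.round(2))          # [0.66 0.24 0.1]
  print(probs.sum())             # 1.0 (up to floating-point rounding)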

When to Use

  Task                    | Last Layer   | Hidden Layers
  Binary Classification   | Sigmoid      | ReLU
  Multi-class             | Softmax      | ReLU
  Regression              | Linear       | ReLU
  Sequences               | Sigmoid/Tanh | ReLU

Modern best practice: use ReLU in the hidden layers and a task-specific activation in the output layer.
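A minimal NumPy forward-pass sketch of that pattern, assuming a hypothetical 4-feature input, 16 hidden units, and 3 output classes (weights are random, for illustration only):

  import numpy as np

  rng = np.random.default_rng(0)

  def relu(z):
      return np.maximum(0.0, z)

  def softmax(z):
      e = np.exp(z - z.max(axis=-1, keepdims=True))
      return e / e.sum(axis=-1, keepdims=True)

  # Hypothetical sizes: 4 input features, 16 hidden units, 3 classes.
  W1, b1 = rng.normal(size=(4, 16)), np.zeros(16)
  W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)

  def forward(x):
      h = relu(x @ W1 + b1)           # hidden layer: ReLU
      return softmax(h @ W2 + b2)     # output layer: softmax (multi-class)

  x = rng.normal(size=(1, 4))         # one example
  print(forward(x))                   # three class probabilities summing to 1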
