Neurons & Perceptrons
What is a Neural Network?
A neural network is a computational model loosely inspired by the structure of the human brain. It consists of layers of interconnected processing units called neurons (or nodes), each of which receives numerical inputs, applies a weighted transformation, and passes the result through a non-linear activation function. By stacking many such layers and training on data, a neural network can learn arbitrarily complex mappings from inputs to outputs — without being explicitly programmed with rules.
The simplest neural network unit is the perceptron, proposed by Frank Rosenblatt in 1958. Understanding the perceptron is the foundation for everything in deep learning.
The Biological Inspiration
Your brain has ~86 billion neurons. Each receives signals, processes them, and fires a signal to other neurons.
Dendrites (inputs) → Cell Body (process) → Axon (output) → Synapses (to other neurons)
The Artificial Neuron mimics this:
- Inputs: Multiple values (x₁, x₂, x₃, ...)
- Weights: Strength of each connection (w₁, w₂, w₃, ...)
- Bias: Threshold to fire (b)
- Activation: A non-linear function that determines whether (and how strongly) the neuron fires
- Output: Single value
The Perceptron Formula
output = activation(w₁×x₁ + w₂×x₂ + w₃×x₃ + ... + b)
In vector form:
output = activation(w·x + b)
Where:
- w = weight vector (parameters the network learns)
- x = input vector
- b = bias term
- activation = non-linear function (ReLU, Sigmoid, etc.)
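To make this concrete, here is a minimal sketch of a perceptron forward pass in Python with NumPy. The function and variable names (perceptron, sigmoid, w, x, b) are illustrative choices, not from any particular library:

import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b, activation=sigmoid):
    # Weighted sum of inputs plus bias, passed through the activation
    z = np.dot(w, x) + b
    return activation(z)

x = np.array([0.5, -1.2, 3.0])   # input vector
w = np.array([0.4, 0.1, -0.7])   # learned weights
b = 0.2                          # bias term
print(perceptron(x, w, b))       # a value between 0 and 1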
Example: Credit Card Approval
Inputs (for one applicant):
- Income: x₁ = $50,000
- Credit score: x₂ = 750
- Years employed: x₃ = 5
Weights (learned from past data):
- Income: w₁ = 0.003 (higher income = more likely approved)
- Credit score: w₂ = 0.02 (higher score = more likely approved)
- Years employed: w₃ = 0.5 (longer employment = more likely approved)
Bias: b = -2 (need strong signals to approve)
z = 0.003×50000 + 0.02×750 + 0.5×5 - 2
z = 150 + 15 + 2.5 - 2 = 165.5
If activation(z) > 0.5: APPROVE
Else: REJECT
With a sigmoid activation, sigmoid(165.5) ≈ 1.0 > 0.5, so this applicant is approved.
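As a sanity check, here is the same computation as runnable Python, using the weights, bias, and 0.5 decision threshold assumed above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.003, 0.02, 0.5])   # income, credit score, years employed
b = -2.0
x = np.array([50_000, 750, 5])     # one applicant's features

z = np.dot(w, x) + b               # 150 + 15 + 2.5 - 2 = 165.5
decision = "APPROVE" if sigmoid(z) > 0.5 else "REJECT"
print(z, decision)                 # 165.5 APPROVE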
Why Non-Linear Activation?
# Without an activation (purely linear)
y = w·x + b
# This is just a line! It can only model linear relationships.
# Stacking layers doesn't help: a composition of linear functions is still linear.

# With a non-linear activation
y = ReLU(w·x + b)
y = max(0, w·x + b)
# Now stacked layers can approximate any continuous function
# (the universal approximation theorem)
This is why activation functions are crucial.
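A quick numerical illustration of the collapse, a sketch with made-up random weights: two stacked linear layers reduce exactly to one linear layer with merged parameters, while inserting a ReLU between them breaks that equivalence.

import numpy as np

def relu(z):
    return np.maximum(0, z)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# Two linear layers...
deep_linear = W2 @ (W1 @ x + b1) + b2
# ...equal one linear layer with W = W2·W1 and b = W2·b1 + b2
shallow = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(deep_linear, shallow))      # True: depth added nothing

# A ReLU in between makes the network genuinely deeper
deep_nonlinear = W2 @ relu(W1 @ x + b1) + b2
print(np.allclose(deep_nonlinear, shallow))   # False (in general)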