

Soft Clustering with GMM

24 min Advanced

Gaussian Mixture Models (GMM)

Hard vs Soft Clustering

Hard Clustering (K-Means, DBSCAN)

Each point belongs to exactly ONE cluster.

Point: [5.1, 3.5] → Cluster 0 (100%)

Soft Clustering (GMM)

Each point gets a probability of belonging to each cluster, and these probabilities sum to 1.

Point: [5.1, 3.5] → 70% Cluster 0, 30% Cluster 1
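Here is a minimal sketch of the difference. It assumes two hypothetical 1-D clusters with equal weights, means 0 and 4, and unit standard deviation. The names `soft_assign` and `hard_assign` are illustrative and do not come from any library. Soft clustering keeps the full probability vector, and hard clustering keeps only its argmax:

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution N(mean, std^2) at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def soft_assign(x, weights, means, stds):
    """Soft clustering: P(cluster k | x) for every cluster k."""
    weighted = np.array([w * gaussian_pdf(x, m, s)
                         for w, m, s in zip(weights, means, stds)])
    return weighted / weighted.sum()  # normalise so the probabilities sum to 1

def hard_assign(x, weights, means, stds):
    """Hard clustering: keep only the most probable cluster index."""
    return int(np.argmax(soft_assign(x, weights, means, stds)))

# Hypothetical two-cluster model: equal weights, means 0 and 4, unit std
weights, means, stds = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]

for x in [1.0, 2.0, 3.5]:
    probs = soft_assign(x, weights, means, stds)
    print(f"x={x}: soft={np.round(probs, 3)}, hard={hard_assign(x, weights, means, stds)}")
```

At x = 2.0, exactly between the two means, the soft assignment is 50/50. The hard assignment must still pick one cluster, and `np.argmax` breaks the tie toward cluster 0.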

The Model

Assume each cluster is a Gaussian distribution (bell curve):

Cluster A: Weight=π_A, Mean=μ_A, Covariance=Σ_A
Cluster B: Weight=π_B, Mean=μ_B, Covariance=Σ_B
...

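Each data point is assumed to be generated by one of these Gaussians: first a cluster is picked (cluster k with probability π_k, its mixing weight), then the point is sampled from that cluster's Gaussian.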
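In standard notation, the model is a weighted sum of K Gaussian densities. The mixing weight π_k is the probability that a point is generated by component k, and d is the dimension of x:

```latex
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \,\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad \pi_k \ge 0, \qquad \sum_{k=1}^{K} \pi_k = 1

\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})
= \frac{1}{(2\pi)^{d/2} \lvert \boldsymbol{\Sigma} \rvert^{1/2}}
\exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)
```

The equivalent generative story is z ~ Categorical(π_1, ..., π_K), followed by x | z = k ~ N(μ_k, Σ_k). The cluster label z is never observed, which is why an iterative method such as EM is needed to fit the parameters.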

Graphically:

Two overlapping bell curves.
A point in the overlap region gets substantial probability for both clusters (roughly 50/50 exactly midway between two identical curves).

EM Algorithm (Expectation-Maximization)

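Initialization: Randomly place K Gaussians (e.g., means at K random data points, equal weights)
E-step (Expectation): For each point, compute the probability (the "responsibility") that it came from each Gaussian
M-step (Maximization): Update each Gaussian's weight, mean, and covariance (π, μ, Σ), weighting every point by its responsibility
Repeat until convergence (the log-likelihood or the parameters stop changing)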
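For Gaussian components, both steps have closed-form updates (the standard EM equations for a GMM). In these equations, γ_ik is the responsibility of component k for point x_i, N is the number of points, and N_k is the effective number of points assigned to component k:

```latex
\text{E-step:}\quad
\gamma_{ik} = \frac{\pi_k \,\mathcal{N}(\mathbf{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}
                   {\sum_{j=1}^{K} \pi_j \,\mathcal{N}(\mathbf{x}_i \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}

\text{M-step:}\quad
N_k = \sum_{i=1}^{N} \gamma_{ik}, \qquad
\pi_k = \frac{N_k}{N}, \qquad
\boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik}\, \mathbf{x}_i, \qquad
\boldsymbol{\Sigma}_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik}\, (\mathbf{x}_i - \boldsymbol{\mu}_k)(\mathbf{x}_i - \boldsymbol{\mu}_k)^{\top}
```

The covariance update uses the freshly updated mean μ_k.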

Why it works:

E-step: "Which cluster is this point from?"
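M-step: "Refit each cluster to its points, weighting each point by how likely it belongs to that cluster"
Each E+M round never decreases the data log-likelihood, so the iterations settle at a (possibly local) maximum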
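The loop can be sketched from scratch in NumPy for 1-D data. This is a minimal sketch, not a production implementation. It assumes 1-D inputs and a fixed variance floor of 1e-6. It also uses a deterministic quantile-based initialization, chosen here for reproducibility instead of the random placement described above. `fit_gmm_1d` is a hypothetical helper name. The function records the log-likelihood at every iteration, so you can check that it never decreases:

```python
import numpy as np

def normal_pdf(x, mean, var):
    """1-D Gaussian density N(x | mean, var); broadcasts over its arguments."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def fit_gmm_1d(x, k, n_iter=200, tol=1e-6):
    """Fit a K-component 1-D GMM with EM.

    Returns (weights, means, variances, loglik_history).
    """
    # Initialisation (assumption): means at evenly spaced quantiles of the data,
    # every variance equal to the overall variance, equal mixing weights.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    history = []
    for _ in range(n_iter):
        # E-step: pi_k * N(x_i | mu_k, var_k) for every point/component, shape (n, k)
        dens = weights * normal_pdf(x[:, None], means, variances)
        total = dens.sum(axis=1, keepdims=True)      # p(x_i) under the current model
        resp = dens / total                          # responsibilities; rows sum to 1
        history.append(float(np.log(total).sum()))   # log-likelihood before this M-step

        # M-step: refit each component using responsibilities as soft counts
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        variances = np.maximum(variances, 1e-6)      # guard against a collapsing component

        # Convergence check: stop once the log-likelihood stops improving
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return weights, means, variances, history

# Hypothetical data: two 1-D clusters centred at -2 and 3
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 200)])

w, mu, var, hist = fit_gmm_1d(x, k=2)
print("weights:", np.round(w, 2))
print("means:  ", np.round(mu, 2))
print("stds:   ", np.round(np.sqrt(var), 2))
print(f"log-likelihood: {hist[0]:.1f} -> {hist[-1]:.1f} in {len(hist)} iterations")
```

For real projects, scikit-learn's `sklearn.mixture.GaussianMixture` implements the same algorithm for multivariate data and supports several covariance types.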

Advantages & Disadvantages

Pros:

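✓ Probabilistic (gives a confidence for each assignment)
✓ Can handle overlapping clusters
✓ More flexible than K-Means (elliptical clusters of different sizes and orientations)
✓ Solid statistical foundation (maximum-likelihood fitting)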

Cons:

✗ Assumes Gaussian shape (may not hold)
✗ Sensitive to number of components (K)
✗ Slower than K-Means
✗ Can get stuck in local optima (a common remedy is to run EM from several initializations and keep the best fit)

Choosing Number of Clusters

Use AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion):

Train a GMM for each K = 1, 2, 3, ..., up to 10
Calculate AIC/BIC for each fitted model
Lower is better: both criteria reward goodness of fit but penalize extra parameters
Pick K with lowest BIC
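Here is a minimal NumPy sketch of this procedure on hypothetical 1-D data drawn from two Gaussians. It assumes the standard definitions BIC = p ln n - 2 ln L and AIC = 2p - 2 ln L. Here L is the maximized likelihood, n is the number of points, and p is the number of free parameters. For a 1-D GMM, p = 3K - 1 (K means, K variances, and K - 1 independent weights). The compact EM routine and the helper names are illustrative:

```python
import numpy as np

def normal_pdf(x, mean, var):
    """1-D Gaussian density, broadcast over x / mean / var."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def fit_gmm_1d(x, k, n_iter=200, tol=1e-6):
    """Compact 1-D EM with quantile initialisation; returns the log-likelihood at convergence."""
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    prev = -np.inf
    for _ in range(n_iter):
        dens = weights * normal_pdf(x[:, None], means, variances)    # (n, k)
        total = np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)  # avoid log(0)
        loglik = float(np.log(total).sum())
        resp = dens / total                                          # E-step
        nk = np.maximum(resp.sum(axis=0), 1e-12)                     # M-step
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = np.maximum((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk, 1e-6)
        if abs(loglik - prev) < tol:
            break
        prev = loglik
    return loglik

def n_params_1d(k):
    """Free parameters of a 1-D GMM: k means + k variances + (k - 1) weights."""
    return 3 * k - 1

def bic(loglik, n_params, n):
    return n_params * np.log(n) - 2.0 * loglik

def aic(loglik, n_params):
    return 2.0 * n_params - 2.0 * loglik

# Hypothetical data: two Gaussian clusters, so the "right" answer is K = 2
rng = np.random.default_rng(7)
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 200)])

scores = {}
for k in range(1, 11):  # K = 1 .. 10
    ll = fit_gmm_1d(x, k)
    p = n_params_1d(k)
    scores[k] = (aic(ll, p), bic(ll, p, x.size))
    print(f"K={k:2d}  AIC={scores[k][0]:9.1f}  BIC={scores[k][1]:9.1f}")

best_k = min(scores, key=lambda k: scores[k][1])  # lowest BIC wins
print("Best K by BIC:", best_k)
```

With scikit-learn, a fitted `GaussianMixture(n_components=k).fit(X)` exposes `.bic(X)` and `.aic(X)` methods. These compute the same criteria for multivariate data.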

GMM vs K-Means vs DBSCAN

Aspect	K-Means	GMM	DBSCAN
Soft clusters?	No	Yes	No
Assumes shape	Spherical	Gaussian (elliptical)	Any
Speed	Fast	Medium	Slow
K needed?	Yes	Yes	No
Interpretability	High	Medium	Low
Output	Labels	Probabilities	Labels+Noise
