
The Maximum Margin Principle

Support Vector Machines (SVM)

The Core Idea

SVM finds the line (or hyperplane) that separates two classes with the maximum margin, where the margin is the distance from the boundary to the nearest points of either class.

Intuition

Class A:  ●●●  ___________  ○○○  : Class B

The line position matters!
Too close to A? Will misclassify new A points.
Too close to B? Will misclassify new B points.
SVM finds the perfect balance (maximum margin).
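
The balance described above can be checked by hand on a one-dimensional version of the picture. This is a minimal sketch, assuming hypothetical coordinates (Class A at 1, 2, 3 and Class B at 7, 8, 9) and a helper named `margin` that is introduced here only for illustration:

```python
# Hypothetical 1D coordinates matching the sketch: A on the left, B on the right.
class_a = [1.0, 2.0, 3.0]
class_b = [7.0, 8.0, 9.0]

def margin(threshold):
    """Distance from a candidate boundary to the nearest point of either class."""
    return min(abs(x - threshold) for x in class_a + class_b)

# Three candidate boundaries: close to A, in the middle, close to B.
candidates = [3.5, 5.0, 6.5]
for t in candidates:
    print(f"boundary at {t}: margin {margin(t)}")

# The candidate with the widest margin is the one an SVM would prefer.
best = max(candidates, key=margin)
print("widest margin at", best)
```

The boundaries at 3.5 and 6.5 leave a margin of only 0.5, while the midpoint 5.0 leaves a margin of 2.0 on both sides.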

Linear vs Non-Linear

Linear SVM

For linearly separable data:

2x + 3y + 1 = 0  (decision boundary)

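In general the boundary has the form w · x + b = 0; in the example above, w = (2, 3) and b = 1. Training finds the weights (w) and bias (b) that maximize the margin.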
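
To make the roles of the weights and bias concrete, the sketch below plugs the example boundary 2x + 3y + 1 = 0 into the standard point-to-line distance formula. The sample points and the function names are hypothetical, chosen only for illustration; which side counts as which class is also an assumption.

```python
import math

# Weights and bias read off the example boundary 2x + 3y + 1 = 0.
w = (2.0, 3.0)
b = 1.0

def decision_value(point):
    """Return w . x + b; its sign says which side of the boundary the point is on."""
    return w[0] * point[0] + w[1] * point[1] + b

def distance_to_boundary(point):
    """Perpendicular distance from the point to the boundary: |w . x + b| / ||w||."""
    return abs(decision_value(point)) / math.hypot(w[0], w[1])

# Hypothetical points on either side of the boundary.
for point in [(1.0, 1.0), (-2.0, -1.0), (3.0, 2.0)]:
    side = "+1" if decision_value(point) > 0 else "-1"
    print(point, "side", side, "distance", round(distance_to_boundary(point), 3))
```

The margin of a given boundary is the smallest of these distances over the training points; an SVM searches for the w and b that make that smallest distance as large as possible.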

Non-Linear SVM (Kernel Trick)

For data that no straight line can separate (spirals, concentric circles), a kernel implicitly maps the points into a higher-dimensional space where a linear boundary works:

Original 2D data (not separable)
   ↓ (Kernel transformation)
Higher dimension (separable)
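
The transformation above can be illustrated with an explicit mapping. This is a minimal sketch, assuming hypothetical toy data (two concentric circles) and a hand-picked `lift` function that appends x² + y² as a third coordinate; a real kernel SVM does not build these coordinates explicitly.

```python
import math

def lift(point):
    """Map a 2D point (x, y) to 3D by appending its squared distance from the origin."""
    x, y = point
    return (x, y, x * x + y * y)

# Hypothetical toy data: one class on a circle of radius 1, the other on radius 3.
# No straight line in 2D can separate an inner ring from an outer ring.
angles = [k * math.pi / 4 for k in range(8)]
inner = [(math.cos(a), math.sin(a)) for a in angles]
outer = [(3 * math.cos(a), 3 * math.sin(a)) for a in angles]

# After lifting, the third coordinate is about 1 for inner points and about 9
# for outer points, so the flat plane z = 5 separates the two classes.
threshold = 5.0
inner_separated = all(lift(p)[2] < threshold for p in inner)
outer_separated = all(lift(p)[2] > threshold for p in outer)
print(inner_separated, outer_separated)
```

A kernel function returns the dot product two points would have in the lifted space, so the SVM gets the benefit of the extra dimension without ever computing the lifted coordinates.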

Common Kernels:

  • Linear: For linearly separable data
  • RBF (Radial Basis Function): The most common default; handles many non-linear boundaries
  • Polynomial: Useful for polynomial relationships
  • Sigmoid: Based on tanh, so it resembles a neural-network activation
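
Each kernel in the list is a function that scores how similar two points are. The sketch below writes out the commonly used textbook forms; the parameter names (`gamma`, `coef0`, `degree`) and their default values are assumptions borrowed from common convention, not something this page specifies.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def polynomial_kernel(u, v, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * dot(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma=1.0):
    # Similarity decays with squared Euclidean distance; identical points score 1.
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)

# Hypothetical sample points.
u, v = (1.0, 2.0), (2.0, 0.0)
print("linear:    ", linear_kernel(u, v))
print("polynomial:", polynomial_kernel(u, v, degree=2))
print("rbf:       ", rbf_kernel(u, v))
print("sigmoid:   ", sigmoid_kernel(u, v))
```

For these two points the dot product is 2, so the linear kernel gives 2.0 and the degree-2 polynomial kernel gives (2 + 1)² = 9.0.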

Support Vectors

Support vectors are the points closest to the decision boundary. Only these points determine where the boundary lies.

  • If you move a far-away point (and it stays outside the margin), the decision boundary doesn't change
  • If you move a support vector, the boundary shifts
  • The trained model stores only the support vectors, which are often a small fraction of the data
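
The two "move a point" claims above can be verified in one dimension, where the maximum-margin boundary is simply the midpoint between the nearest points of the two classes. This is a minimal sketch with hypothetical data and a hypothetical helper name; it assumes every Class A value lies below every Class B value.

```python
def max_margin_boundary(class_a, class_b):
    """Return (threshold, margin) for 1D classes where all A values lie below all B values."""
    nearest_a = max(class_a)  # support vector of class A
    nearest_b = min(class_b)  # support vector of class B
    if nearest_a >= nearest_b:
        raise ValueError("classes overlap; no separating threshold exists")
    threshold = (nearest_a + nearest_b) / 2
    margin = (nearest_b - nearest_a) / 2
    return threshold, margin

class_b = [7.0, 8.0, 9.0]

# Baseline: the support vectors are 3.0 and 7.0.
print(max_margin_boundary([1.0, 2.0, 3.0], class_b))    # (5.0, 2.0)

# Moving a far-away point (1.0 -> -10.0) leaves the boundary unchanged.
print(max_margin_boundary([-10.0, 2.0, 3.0], class_b))  # (5.0, 2.0)

# Moving a support vector (3.0 -> 4.0) shifts the boundary.
print(max_margin_boundary([1.0, 2.0, 4.0], class_b))    # (5.5, 1.5)
```

Only `max(class_a)` and `min(class_b)` enter the result, which is the one-dimensional version of "only the support vectors matter".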

Pros & Cons

Pros:

  • ✓ Works well on tabular data
  • ✓ Non-linear kernels handle complex boundaries
  • ✓ Memory efficient (only stores support vectors)
  • ✓ Works well in high-dimensional spaces

Cons:

  • ✗ Slow to train on large datasets (roughly O(n²) to O(n³) in the number of samples for kernel SVMs)
  • ✗ Hard to interpret which features matter
  • ✗ Sensitive to feature scaling (must normalize!)
  • ✗ Hyperparameter tuning is critical (C, gamma)
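
The scaling and gamma points can be seen directly in the RBF kernel, which depends on the distance between points. The sketch below uses hypothetical rows of (age in years, income in dollars) and hypothetical helper names to show how an unscaled large-valued feature swamps the distance, and how gamma changes the similarity once the features are standardized.

```python
import math
import statistics

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def rbf_kernel(u, v, gamma):
    return math.exp(-gamma * squared_distance(u, v))

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stdevs)) for row in rows]

def first_feature_share(u, v):
    """Fraction of the squared distance contributed by the first feature (age)."""
    return (u[0] - v[0]) ** 2 / squared_distance(u, v)

# Hypothetical rows: (age in years, income in dollars).
rows = [(25.0, 40_000.0), (60.0, 41_000.0), (26.0, 90_000.0)]
scaled = standardize(rows)

# Rows 0 and 1 differ by 35 years of age but only $1,000 of income.
raw_share = first_feature_share(rows[0], rows[1])
scaled_share = first_feature_share(scaled[0], scaled[1])
print(f"age share of distance, raw:    {raw_share:.4f}")
print(f"age share of distance, scaled: {scaled_share:.4f}")

# Larger gamma makes similarity fall off faster with distance.
for gamma in (0.1, 1.0, 10.0):
    print(f"gamma={gamma}: kernel={rbf_kernel(scaled[0], scaled[1], gamma):.6f}")
```

On the raw rows, the 35-year age gap accounts for well under 1% of the squared distance because income is measured in much larger numbers; after standardizing, it accounts for nearly all of it.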