Data Distributions & Normality

Common Distributions

Data Distributions

Why Distributions Matter

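Many algorithms assume, or work best when, the data (or the model's errors) are approximately normally distributed:

  • Linear Regression (assumes normally distributed residuals, which matters mainly for confidence intervals and p-values)
  • Gaussian Naive Bayes (assumes each feature is normal within each class)
  • PCA (built on variance, so heavy tails and outliers can dominate the components)
  • Gaussian Mixture Models (assume each cluster is normal)

Logistic Regression makes no normality assumption, but extreme values in skewed features can still have an outsized influence on the fit.

If your data is highly skewed, these algorithms can perform poorly, so check the shape of each feature before modeling.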
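One way to check whether a feature is skewed is to compute its skewness coefficient. The sketch below uses only the Python standard library and synthetic data; the helper name `sample_skewness` and the two sample lists are illustrative assumptions, not part of the lesson.

```python
import random
import statistics

def sample_skewness(values):
    """Fisher-Pearson skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic samples: one symmetric (normal), one with a long right tail (exponential).
symmetric = [rng.gauss(0, 1) for _ in range(10_000)]
right_skewed = [rng.expovariate(1.0) for _ in range(10_000)]

print(f"symmetric skew:    {sample_skewness(symmetric):+.2f}")
print(f"right-skewed skew: {sample_skewness(right_skewed):+.2f}")
```

A value near 0 suggests a roughly symmetric feature, a clearly positive value suggests a right tail, and a clearly negative value suggests a left tail.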

The Normal Distribution (Bell Curve)

μ = mean
σ = standard deviation
About 68% of data within ±1σ
About 95% of data within ±2σ
About 99.7% of data within ±3σ
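These percentages can be reproduced from the normal cumulative distribution function. A minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `fraction_within` is an assumption made for illustration.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def fraction_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k}σ: {fraction_within(k):.2%}")

# within ±1σ: 68.27%
# within ±2σ: 95.45%
# within ±3σ: 99.73%
```

The result does not depend on the particular μ and σ, because the rule is stated in units of σ around the mean.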

Skewed Distributions

Right-Skewed (Positive Skew)

  • Long tail on the right
  • Mean > Median
  • Common in: Income, house prices, wait times
  • Fix: Log transform (positive values only; use log(1 + x) when zeros occur), square root transform
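The effect of a log transform on right-skewed data can be sketched with a synthetic sample. The log-normal data below is made up for illustration (it is not real income or price data), and the variable names are assumptions.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic right-skewed sample: log-normal values are always positive.
raw = [rng.lognormvariate(0, 1) for _ in range(10_000)]

# Log transform: compresses the long right tail.
logged = [math.log(v) for v in raw]

for name, values in (("raw", raw), ("log", logged)):
    mean = statistics.fmean(values)
    median = statistics.median(values)
    print(f"{name}: mean={mean:.2f} median={median:.2f} gap={mean - median:+.2f}")
```

Before the transform the mean sits well above the median, which is the signature of a right tail; after it the two nearly coincide.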

Left-Skewed (Negative Skew)

  • Long tail on the left
  • Mean < Median
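  • Common in: Test scores on an easy exam (scores cluster towards 100%)
  • Fix: Square or cube transform, reflect the data and then apply a log transform, or Box-Cox transform (positive values only) with a power greater than 1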
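A Box-Cox transform raises each value to a power λ; a power greater than 1 stretches the upper end and pulls in a left tail. The sketch below is an illustration under assumptions: the scores are synthetic fractions drawn from a Beta(5, 1) distribution, the helper `box_cox` is written by hand, and λ = 5 is hand-picked to suit this particular sample rather than estimated from the data.

```python
import math
import random
import statistics

def box_cox(x, lam):
    """Box-Cox power transform of a positive value x with power lam."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic left-skewed sample: scores as fractions that cluster near 1.0.
scores = [rng.betavariate(5, 1) for _ in range(10_000)]

LAMBDA = 5  # hand-picked assumption for this synthetic sample
transformed = [box_cox(s, LAMBDA) for s in scores]

for name, values in (("scores", scores), ("box-cox", transformed)):
    mean = statistics.fmean(values)
    median = statistics.median(values)
    print(f"{name}: mean={mean:.3f} median={median:.3f} gap={mean - median:+.3f}")
```

In practice λ is usually estimated from the data; `scipy.stats.boxcox` does this by maximum likelihood when no power is supplied.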

Bimodal & Multimodal

Multiple peaks often indicate hidden groups in your data.

  • Example: "Height of humans" mixes two groups (men & women) with different means, although the two peaks overlap heavily in real data.
  • Solution: Stratify by group or add a categorical feature.
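Stratifying by group can be sketched as follows. The records are synthetic and the numbers are made up for illustration; the labels "a" and "b" stand in for whatever categorical feature identifies the hidden groups.

```python
import random
import statistics
from collections import defaultdict

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data: two hidden groups with well-separated means.
records = [("a", rng.gauss(160, 5)) for _ in range(5_000)]
records += [("b", rng.gauss(180, 5)) for _ in range(5_000)]

# Pooled view: one wide, two-peaked distribution.
pooled = [value for _, value in records]
print(f"pooled:  mean={statistics.fmean(pooled):.1f} sd={statistics.stdev(pooled):.1f}")

# Stratify: split the values by the categorical group label.
by_group = defaultdict(list)
for label, value in records:
    by_group[label].append(value)

for label, values in sorted(by_group.items()):
    print(f"group {label}: mean={statistics.fmean(values):.1f} sd={statistics.stdev(values):.1f}")
```

Each group on its own is roughly normal with a much smaller spread than the pooled data, which is why modeling the groups separately, or passing the label to the model as a feature, tends to help.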