Data Distributions & Normality
Common Distributions
Data Distributions
Why Distributions Matter
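Many algorithms assume, or work best when, the data (or part of the model, such as its residuals) is approximately normally distributed:
- Linear Regression (assumes normally distributed residuals for valid confidence intervals and p-values)
- Logistic Regression (no normality assumption, but extreme values in a skewed feature can have outsized influence on the fit)
- Gaussian Naive Bayes (models each feature as normal within each class)
- PCA (no formal normality assumption, but it maximizes variance, so a long tail or outliers can dominate the components)
- Gaussian Mixture Models (model the data as a mixture of normal components)
If your data is highly skewed, these algorithms can perform poorly or produce misleading results.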
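Before reaching for a transform, it helps to measure how skewed a feature actually is. The sketch below computes the moment-based sample skewness and applies the common rule of thumb that |skewness| > 1 counts as "highly skewed". The threshold, the helper names (`sample_skewness`, `describe_shape`) and the simulated data are illustrative assumptions, not a standard API.

```python
import random


def sample_skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


def describe_shape(values, threshold=1.0):
    """Hypothetical helper: label a feature using the |skew| > 1 rule of thumb."""
    g1 = sample_skewness(values)
    if g1 > threshold:
        return "highly right-skewed"
    if g1 < -threshold:
        return "highly left-skewed"
    return "roughly symmetric"


random.seed(0)
symmetric = [random.gauss(0, 1) for _ in range(10_000)]           # simulated normal data
wait_times = [random.expovariate(1 / 5) for _ in range(10_000)]  # simulated waits, mean 5 minutes

print(describe_shape(symmetric))   # roughly symmetric
print(describe_shape(wait_times))  # highly right-skewed (an exponential has skewness 2)
```

The rule of thumb is only a starting point: whether a given amount of skew matters depends on the algorithm and on how much the tail values drive the result.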
The Normal Distribution (Bell Curve)
- μ = mean
- σ = standard deviation
- ≈68% of data falls within μ ± 1σ
- ≈95% of data falls within μ ± 2σ
- ≈99.7% of data falls within μ ± 3σ
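The 68/95/99.7 figures are rounded values of erf(k/√2), the probability that a normal value lands within k standard deviations of the mean. The sketch below computes them exactly with Python's standard library and then checks them against simulated data; the mean of 50 and σ of 10 are arbitrary illustrative choices.

```python
import math
import random


def fraction_within(k):
    """Exact probability that a normal value lies within mu ± k*sigma: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k}σ: {fraction_within(k):.4f}")
# within ±1σ: 0.6827
# within ±2σ: 0.9545
# within ±3σ: 0.9973

# Empirical check on simulated normal data (mu and sigma are illustrative values).
random.seed(42)
mu, sigma = 50.0, 10.0
sample = [random.gauss(mu, sigma) for _ in range(100_000)]
for k in (1, 2, 3):
    share = sum(abs(x - mu) <= k * sigma for x in sample) / len(sample)
    print(f"simulated ±{k}σ: {share:.3f}")
```

The rule depends only on k, not on μ or σ, which is why it works as a quick sanity check for any roughly normal feature.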
Skewed Distributions
Right-Skewed (Positive Skew)
- Long tail on the right
- Mean > Median
- Common in: Income, house prices, wait times
- Fix: Log transform (values must be positive; log(1 + x) handles zeros) or square root transform (a milder correction)
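To see both fixes in action, the sketch below draws hypothetical right-skewed "house prices" from a log-normal distribution (an assumption chosen because its log is exactly normal), confirms that the mean exceeds the median, and compares skewness before and after each transform.

```python
import math
import random
import statistics


def sample_skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(7)
# Hypothetical house prices: log-normal, so log(price) is normal by construction.
prices = [random.lognormvariate(12, 1) for _ in range(10_000)]

print("mean > median:", statistics.fmean(prices) > statistics.median(prices))  # True

log_prices = [math.log(p) for p in prices]    # use math.log1p instead if the data can contain 0
sqrt_prices = [math.sqrt(p) for p in prices]  # milder: shrinks the right tail less than log

for name, data in [("raw", prices), ("sqrt", sqrt_prices), ("log", log_prices)]:
    print(f"{name:>4} skewness: {sample_skewness(data):.2f}")
```

Square root leaves a clear right tail here, while log removes the skew almost entirely only because the simulated data is log-normal; real data rarely transforms this cleanly, so it is worth re-checking skewness after any transform.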
Left-Skewed (Negative Skew)
- Long tail on the left
- Mean < Median
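- Common in: Test scores on an easy exam (most scores cluster near 100%)
- Fix: Box-Cox transform (data must be strictly positive; an estimated λ > 1 stretches the upper end), or reflect the data (e.g. max(x) + 1 - x) and apply a log or square root transform, remembering that the order is then reversed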
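Box-Cox picks a power λ and maps each value x to (x^λ - 1)/λ (or log x when λ = 0), choosing the λ that makes the result look most normal by maximum likelihood. The sketch below is a hand-rolled grid search using only the standard library; in practice `scipy.stats.boxcox` performs the same estimation. The simulated exam scores (100 × a Beta(5, 1) draw), the helper names, and the λ grid from -2 to 8 are illustrative assumptions.

```python
import math
import random


def sample_skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def boxcox(values, lam):
    """Box-Cox transform; every value must be strictly positive."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def fit_boxcox_lambda(values):
    """Grid-search the lambda that maximizes the Box-Cox profile log-likelihood."""
    if min(values) <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    n = len(values)
    log_sum = sum(math.log(v) for v in values)

    def log_likelihood(lam):
        y = boxcox(values, lam)
        mean = sum(y) / n
        var = sum((v - mean) ** 2 for v in y) / n
        # Normal log-likelihood of y (constants dropped) plus the log-Jacobian term.
        return -n / 2 * math.log(var) + (lam - 1) * log_sum

    grid = [i / 20 for i in range(-40, 161)]  # lambda from -2.0 to 8.0 in steps of 0.05
    return max(grid, key=log_likelihood)


random.seed(3)
# Hypothetical scores on an easy exam: most students land near 100.
scores = [100 * random.betavariate(5, 1) for _ in range(2_000)]

lam = fit_boxcox_lambda(scores)
transformed = boxcox(scores, lam)
print(f"lambda = {lam:.2f}")
print(f"skewness before: {sample_skewness(scores):.2f}, after: {sample_skewness(transformed):.2f}")
```

Because these scores are left-skewed, the estimated λ comes out above 1, which spreads the high scores apart and pulls the result toward symmetry. Reuse the fitted λ on new data rather than refitting, and note that Box-Cox cannot handle zeros; shifting the data or using a Yeo-Johnson transform are common workarounds.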
Bimodal & Multimodal
Multiple peaks indicate hidden groups in your data.
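- Example: Adult human height mixes two groups (men & women) with different averages, although the groups overlap so much that the combined curve may show only one broad peak rather than two.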
- Solution: Stratify by group or add a categorical feature.
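Whether hidden groups actually produce visible peaks depends on how far apart they are. For two equal-sized normal groups with the same σ, the combined curve has two peaks only when the means are more than 2σ apart. The sketch below checks this by counting local maxima of the mixture density on a fine grid; the equal weights, the shared σ, and distances measured in units of σ (rather than real height data) are illustrative assumptions.

```python
from statistics import NormalDist


def mixture_pdf(x, mu1, mu2, sigma):
    """Density of an equal-weight mix of two normal groups with the same spread."""
    return 0.5 * NormalDist(mu1, sigma).pdf(x) + 0.5 * NormalDist(mu2, sigma).pdf(x)


def count_peaks(mu1, mu2, sigma=1.0, step=0.01, half_width=6.0):
    """Count local maxima of the mixture density on a grid centred between the means."""
    mid = (mu1 + mu2) / 2
    n = round(half_width / step)
    ys = [mixture_pdf(mid + k * step, mu1, mu2, sigma) for k in range(-n, n + 1)]
    # A peak rises from the left and does not rise further on the right.
    return sum(1 for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] >= ys[i + 1])


for gap in (1.5, 3.0):
    print(f"means {gap} sigma apart -> {count_peaks(-gap / 2, gap / 2)} peak(s)")
# means 1.5 sigma apart -> 1 peak(s)
# means 3.0 sigma apart -> 2 peak(s)
```

A single peak is therefore not proof of a single group: closely spaced groups show up only as a wider, flatter bump, which is one more reason to check for a grouping variable before pooling the data.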