Advanced ML & Model Interpretability

Feature Scaling & Normalization

Why Scale Features?

Feature Scaling

The Problem

Some features have large ranges:

Age: 0 to 100
Income: $0 to $1,000,000
Temperature: -50 to 50°C

Others have small ranges:

Rating: 0-5

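Distance-based algorithms (KNN, K-Means) treat large-range features as "more important" purely because of their scale, not their signal: a $1,000 income difference swamps a 4-point rating difference in Euclidean distance. Gradient descent suffers in a different way: features on very different scales stretch the loss surface, so a single learning rate is too large for some weights and too small for others, and convergence slows down.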
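A quick way to see the distance problem is to compute the Euclidean distance between two rows and check how much each feature contributes to it. This is a minimal sketch with NumPy; the two customers and their values are made up for illustration.

```python
import numpy as np

# Two hypothetical customers described by [age, income, rating]
a = np.array([25.0, 50_000.0, 1.0])
b = np.array([60.0, 51_000.0, 5.0])

diff_sq = (a - b) ** 2               # per-feature squared differences
distance = np.sqrt(diff_sq.sum())    # Euclidean distance, as KNN / K-Means use it
share = diff_sq / diff_sq.sum()      # fraction of the squared distance per feature

for name, s in zip(["age", "income", "rating"], share):
    print(f"{name:>6}: {s:.2%} of squared distance")
print(f"distance = {distance:.2f}")
```

With these hypothetical numbers, the $1,000 income gap accounts for over 99% of the squared distance, while a 35-year age gap and a jump from 1 to 5 stars barely register.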

Solution: Normalize Features

StandardScaler (Standardization)

z = (x - mean) / std_dev

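Centers each feature at mean 0 and scales it to standard deviation 1
Result: the shape of the distribution is unchanged (skewed data stays skewed) and values are not bounded; only for roughly normal data do about 99.7% of values fall between -3 and 3
Use for: Linear/Logistic Regression (especially with gradient-based solvers or regularization), Neural Networks, SVM, PCA. These models do not require normally distributed features, and standardization does not produce them.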
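A minimal sketch, assuming scikit-learn's `StandardScaler` and a small made-up [age, income] array, that checks the fitted scaler against the formula above. `StandardScaler` divides by the population standard deviation (ddof=0), which is also NumPy's default for `std`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical [age, income] rows; the last income is an outlier
X = np.array([[20.0, 30_000.0],
              [35.0, 60_000.0],
              [50.0, 90_000.0],
              [65.0, 400_000.0]])

scaler = StandardScaler()
Z = scaler.fit_transform(X)

# Same formula by hand: z = (x - mean) / std_dev, using the population std (ddof=0)
Z_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(Z, Z_manual))     # True
print(scaler.mean_, scaler.scale_)  # learned per-feature mean and std
print(Z.round(2))                   # each column now has mean 0 and std 1
```

The income outlier still stands apart after scaling (its z-score is about 1.71 while the other three are negative), because standardization shifts and rescales values without changing the distribution's shape.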

MinMaxScaler (Normalization)

scaled = (x - min) / (max - min)

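Scales each feature to [0, 1] using the min and max seen during fitting; new values outside that range map outside [0, 1]
Sensitive to outliers: a single extreme value squeezes all other values into a narrow band
Use for: Neural network inputs or other models that expect bounded inputs in [0, 1]. Tree-based models don't need it (though it doesn't hurt).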
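A minimal sketch using scikit-learn's `MinMaxScaler` on a hypothetical income column with one extreme value. It checks the formula above, shows how the outlier compresses the typical values, and shows that a value larger than the training max lands above 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical income column with one extreme value
income = np.array([[30_000.0], [45_000.0], [60_000.0], [1_000_000.0]])

scaler = MinMaxScaler()              # default feature_range=(0, 1)
scaled = scaler.fit_transform(income)

# Same formula by hand: (x - min) / (max - min)
manual = (income - income.min()) / (income.max() - income.min())
print(np.allclose(scaled, manual))   # True

# The three typical incomes are squeezed into roughly 0 to 0.03
print(scaled.ravel().round(3))

# min and max are fixed at fit time, so an unseen larger value exceeds 1
print(scaler.transform(np.array([[2_000_000.0]])))
```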

RobustScaler

scaled = (x - median) / IQR

Uses median and IQR instead of mean/std
Use for: Data with extreme outliers
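For comparison, a minimal sketch applying scikit-learn's `RobustScaler` (default `quantile_range=(25.0, 75.0)`) to a similar hypothetical income column, checked against the median/IQR formula computed with `np.percentile`.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical income column with one extreme value
income = np.array([[30_000.0], [45_000.0], [60_000.0], [75_000.0], [1_000_000.0]])

scaler = RobustScaler()              # centers on the median, scales by the IQR
scaled = scaler.fit_transform(income)

# Same formula by hand: (x - median) / (Q3 - Q1)
q1, median, q3 = np.percentile(income, [25, 50, 75])
manual = (income - median) / (q3 - q1)
print(np.allclose(scaled, manual))   # True

# Typical incomes stay spread out; the outlier is still clearly an outlier
print(scaled.ravel().round(2))       # about -1, -0.5, 0, 0.5 and 31.33
```

Here the median is 60,000 and the IQR is 30,000, so the four typical incomes keep a usable spread from -1 to 0.5 instead of being crushed toward 0 as with min-max scaling.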

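When NOT to Scale

✗ Tree-based models (Decision Trees, Random Forest, XGBoost): splits depend only on the ordering of values within each feature, so they are scale-invariant

When to Scale

✓ Distance-based algorithms (KNN, K-Means)
✓ Gradient Descent (Linear/Logistic Regression, Neural Networks)
✓ PCA, SVM (PCA picks directions of maximum variance, and SVM margins and RBF kernels depend on distances)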
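A sketch of this checklist in action, assuming scikit-learn and a synthetic dataset from `make_classification` in which one feature is artificially stretched (the 10,000 factor is a hypothetical stand-in for an income-like column). The scaler is fit on the training split only and then applied to both splits. The decision tree makes the same test predictions with or without scaling, while KNN's result depends on whether the features were scaled.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X[:, 0] *= 10_000  # hypothetical: stretch one feature's range, like income
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)  # fit on training data only
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Tree: splits depend only on the ordering of values, so scaling changes nothing
tree_raw = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(X_train_s, y_train)
same = np.array_equal(tree_raw.predict(X_test), tree_scaled.predict(X_test_s))
print("tree predictions identical:", same)

# KNN: distances are dominated by the stretched feature unless features are scaled
knn_raw = KNeighborsClassifier().fit(X_train, y_train).score(X_test, y_test)
knn_scaled = KNeighborsClassifier().fit(X_train_s, y_train).score(X_test_s, y_test)
print(f"KNN test accuracy  raw: {knn_raw:.3f}  scaled: {knn_scaled:.3f}")
```

The exact KNN accuracies depend on the synthetic data; compare the two scores rather than reading anything into a specific value.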
