Module: Advanced ML & Model Interpretability


Lesson 1: Advanced Evaluation Metrics

Lesson 2: Stratified K-Fold Cross-Validation

Lesson 3: SHAP (SHapley Additive exPlanations)

Lesson 4: LIME (Local Interpretable Model-agnostic Explanations)

Lesson 5: Data Distributions & Normality

Lesson 6: Feature Scaling & Normalization

Lesson 7: Handling Class Imbalance

Lesson 8: Hyperparameter Tuning (Grid & Random Search)

Lesson 9: Feature Engineering — Create Better Features

Lesson 10: XGBoost — The Best Algorithm

Lesson 11: Advanced Ensemble Methods

Lesson 12: Introduction to Neural Networks

Lesson 13: Model Deployment & Production

Lesson 14: Model Monitoring & Drift Detection

Lesson 15: ML Ethics & Fairness

Lesson 16: Time Series Basics

Lesson 17: Causal Inference & A/B Testing

Lesson 18: Model Calibration & Probability Estimates


Hyperparameter Tuning (Grid & Random Search)

What are Hyperparameters?

30 min · Advanced

Hyperparameter Tuning

Hyperparameters vs Parameters

Parameters

Learned during training.

Linear Regression weights (w, b)
Neural Network weights
Decision Tree split thresholds

Hyperparameters

Set before training. You choose them.

Learning rate
Number of trees in Random Forest
Max depth of decision tree
K in K-Nearest Neighbors
Regularization strength (λ)
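A minimal scikit-learn sketch of the distinction, assuming a tiny synthetic dataset: the hyperparameter `C` (in `LogisticRegression`, the *inverse* of regularization strength) is fixed in the constructor before training, while the weights `coef_` and bias `intercept_` only exist after `fit()` has learned them from the data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny synthetic dataset (an assumption for illustration): 100 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Hyperparameter: chosen by you before training (C = inverse regularization strength)
model = LogisticRegression(C=1.0, max_iter=1000)
print(model.get_params()["C"])  # already set, no training has happened yet

# Parameters: learned from the data during fit()
model.fit(X, y)
print(model.coef_.shape)  # one learned weight per feature
print(model.intercept_)   # the learned bias term b
```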

Manual vs Automated Tuning

Manual (Bad)

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=10)  # Guess 10
# Train, evaluate... maybe 10 isn't optimal

model = RandomForestClassifier(n_estimators=50)  # Try 50
# Train, evaluate... still not sure it's optimal

Slow and tedious, and you rarely know whether you have found the best value.
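Written out honestly, each manual guess is one more cross-validated run. This sketch assumes a synthetic `make_classification` dataset and scores two hand-picked values of `n_estimators` with 5-fold CV; adding a new guess means editing the list and rerunning, which is exactly the loop GridSearchCV automates.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data stands in for a real training set (an assumption for this sketch)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Manual tuning: try each guess one at a time and record its mean 5-fold CV accuracy
scores = {}
for n in [10, 50]:
    model = RandomForestClassifier(n_estimators=n, random_state=0)
    scores[n] = cross_val_score(model, X, y, cv=5).mean()

for n, s in scores.items():
    print(f"n_estimators={n}: mean CV accuracy {s:.3f}")
```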

Automated: GridSearchCV (Exhaustive)

Try every combination in a predefined grid:

param_grid = {
    'n_estimators': [10, 50, 100],
    'max_depth': [5, 10, 20],
    'min_samples_split': [2, 5, 10]
}
# Total: 3 * 3 * 3 = 27 combinations tested
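The combination count can be checked with scikit-learn's `ParameterGrid`, which expands a grid dict into every combination, the same expansion GridSearchCV performs internally. Multiplying by the number of folds (5 is assumed here) gives the number of model fits.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'n_estimators': [10, 50, 100],
    'max_depth': [5, 10, 20],
    'min_samples_split': [2, 5, 10],
}

grid = ParameterGrid(param_grid)
print(len(grid))      # number of combinations
print(len(grid) * 5)  # number of model fits with 5-fold CV
print(next(iter(grid)))  # the first combination that would be tried
```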

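Pros: Guaranteed to find the best combination within the grid (as judged by the CV score); values outside the grid are never tried.
Cons: Slow for large grids, because the number of combinations grows exponentially: 10 hyperparameters with 3 values each gives 3^10 = 59,049 combinations.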
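An end-to-end sketch of grid search with the grid above, assuming synthetic data from `make_classification`, an 80/20 stratified train/test split, and accuracy as the metric. `GridSearchCV` scores all 27 combinations with 5-fold CV on the training set only, refits the winner on the full training set (`refit=True` is the default), and the held-out test set is used once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data is an assumption; substitute your own X, y
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

param_grid = {
    'n_estimators': [10, 50, 100],
    'max_depth': [5, 10, 20],
    'min_samples_split': [2, 5, 10],
}

# 27 combinations x 5 folds = 135 fits; the best combination is then refit on X_train
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring='accuracy',
)
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.3f}")
print(f"Held-out test accuracy: {search.score(X_test, y_test):.3f}")
```

Passing `n_jobs=-1` to `GridSearchCV` would run the fits in parallel; it is left out here to keep the sketch single-process.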

Automated: RandomizedSearchCV (Sampling)

Sample a fixed number of random combinations:

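from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

model = RandomForestClassifier()
param_dist = {
    'n_estimators': range(10, 200),  # Values are sampled from 10..199
    'max_depth': range(5, 50),
}
# n_iter=20: 20 random (n_estimators, max_depth) combinations, each scored with 5-fold CV
search = RandomizedSearchCV(model, param_dist, n_iter=20, cv=5)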

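Pros: Much cheaper than Grid Search on large search spaces: the cost is fixed by n_iter (here 20 combinations × 5 folds = 100 fits), however many values each hyperparameter has.
Cons: Might miss the optimal combination, since only a random sample of the space is evaluated.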
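A runnable sketch of random search, assuming synthetic data and swapping the `range` objects for `scipy.stats.randint` distributions (`randint(a, b)` draws integers uniformly from a to b-1, the same values as `range(a, b)`). Setting `random_state` on the search makes the 20 sampled combinations reproducible.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data is an assumption; substitute your own X, y
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Distributions to sample from instead of a fixed grid
param_dist = {
    'n_estimators': randint(10, 200),
    'max_depth': randint(5, 50),
}

# Exactly n_iter=20 combinations are sampled, each scored with 5-fold CV (100 fits)
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_dist,
    n_iter=20,
    cv=5,
    random_state=0,  # makes the sampled combinations reproducible
)
search.fit(X, y)

print("Best params:", search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.3f}")
```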

K-Fold During Tuning

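GridSearchCV (and RandomizedSearchCV) evaluates each combination with K-fold CV on the training data. With cv=5 and a classification target, scikit-learn uses stratified 5-fold splits:

Split the training data into 5 folds (the test set is held out)
For each hyperparameter combination:
- Train on folds 1-4, validate on fold 5
- Train on folds 1,2,3,5, validate on fold 4
- ... (repeat until every fold has been the validation fold once: 5 times)
- Average the 5 validation scores
Pick the combination with the best average score
Retrain it on the entire training set (refit=True, the default)
Evaluate once on the test set (never used during tuning!)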
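The procedure above, written by hand to show what GridSearchCV does internally. This is a sketch under stated assumptions: synthetic data, a small 2 × 2 grid to keep it quick, and `StratifiedKFold` with shuffling (GridSearchCV's own default for classifiers is stratified folds without shuffling).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid, StratifiedKFold, train_test_split

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
# Hold out a test set first; it is never touched during tuning
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

param_grid = {'n_estimators': [10, 50], 'max_depth': [5, 10]}
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

best_score, best_params = -np.inf, None
for params in ParameterGrid(param_grid):
    fold_scores = []
    # Each fold takes one turn as the validation fold; the other 4 train the model
    for train_idx, val_idx in skf.split(X_train, y_train):
        model = RandomForestClassifier(random_state=0, **params)
        model.fit(X_train[train_idx], y_train[train_idx])
        fold_scores.append(model.score(X_train[val_idx], y_train[val_idx]))
    mean_score = np.mean(fold_scores)  # average of the 5 validation scores
    if mean_score > best_score:
        best_score, best_params = mean_score, params

# Retrain the winning combination on the entire training set
final_model = RandomForestClassifier(random_state=0, **best_params)
final_model.fit(X_train, y_train)

print("Best params:", best_params)
print(f"Mean CV accuracy: {best_score:.3f}")
print(f"Test accuracy: {final_model.score(X_test, y_test):.3f}")
```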
