Module: Machine Learning Fundamentals

Boosting Philosophy

28 min Advanced

Gradient Boosting & AdaBoost

Boosting vs Bagging (Recap)

Bagging (Random Forest)

Train N trees independently on bootstrap samples (random subsets drawn with replacement)
Predictions averaged (or majority-voted for classification)
Reduces variance

Boosting (AdaBoost, Gradient Boosting)

Train trees sequentially (each one corrects the errors of the ensemble so far)
Later trees focus on the samples earlier trees got wrong
Reduces bias

AdaBoost (Adaptive Boosting)

How it works:

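Give every training sample the same weight and train a weak learner (a shallow tree, often a one-split "stump")
Compute the learner's weighted error ε and its vote weight α = ½·ln((1 − ε)/ε): the lower the error, the bigger its say
Increase the weights of misclassified samples, decrease the others, then renormalize
Train the next learner on the reweighted data (hard samples matter more)
Repeat N times
Combine: α-weighted vote of all N learners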
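The steps above can be sketched from scratch in NumPy. This is a minimal sketch of discrete (binary) AdaBoost. It assumes labels in {-1, +1}, decision stumps as the weak learners, and the standard vote weight α = ½·ln((1 − ε)/ε). The function names (`fit_weighted_stump`, `adaboost_fit`, and so on) and the toy dataset are hypothetical, chosen for illustration rather than taken from any library.

```python
import numpy as np

# Hypothetical helper names: a from-scratch sketch, not a library API.


def fit_weighted_stump(X, y, w):
    """Search every feature, threshold, and polarity for the decision stump
    with the lowest weighted error. Labels y must be -1 or +1."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            base = np.where(X[:, j] <= t, -1, 1)
            for polarity in (1, -1):
                err = w[polarity * base != y].sum()
                if err < best[0]:
                    best = (err, j, t, polarity)
    return best


def stump_predict(X, j, t, polarity):
    return polarity * np.where(X[:, j] <= t, -1, 1)


def adaboost_fit(X, y, n_rounds=30):
    n = len(y)
    w = np.full(n, 1.0 / n)  # step 1: equal sample weights
    learners, normalizers = [], []
    for _ in range(n_rounds):
        err, j, t, pol = fit_weighted_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)    # guard against log(0) for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)   # vote weight: lower error -> bigger say
        pred = stump_predict(X, j, t, pol)
        # y * pred is -1 for misclassified samples, so their weight grows by e^alpha
        w = w * np.exp(-alpha * y * pred)
        z = w.sum()
        w = w / z                               # renormalize to a distribution
        learners.append((alpha, j, t, pol))
        normalizers.append(z)
    return learners, normalizers


def adaboost_predict(learners, X):
    # Final model: sign of the alpha-weighted vote of all stumps
    score = sum(alpha * stump_predict(X, j, t, pol) for alpha, j, t, pol in learners)
    return np.where(score >= 0, 1, -1)


# Toy data (an illustrative assumption): a diagonal boundary that no single
# axis-aligned stump can capture on its own
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

learners, normalizers = adaboost_fit(X, y)
train_acc = np.mean(adaboost_predict(learners, X) == y)
print(f"training accuracy after {len(learners)} rounds: {train_acc:.3f}")
```

The `normalizers` are kept for a reason. Their product equals the ensemble's average exponential loss on the training set, and that loss is an upper bound on the training error. This makes a handy sanity check for an implementation like this.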

Why it works:

Early learners catch the obvious patterns; later learners focus on the edge cases. The final model is a weighted vote of these specialists, with more accurate learners getting a bigger say.

Gradient Boosting

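A more general approach than AdaBoost: it can minimize any differentiable loss function.

Start with a constant initial prediction (for squared error, the mean of y)
Calculate the residuals y − pred (in general, the negative gradient of the loss, called pseudo-residuals)
Train the next weak learner (a shallow tree) to predict those residuals
Update predictions: pred = pred + learning_rate × (tree's predicted residuals)
Repeat for n_estimators rounds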
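The loop above can be sketched from scratch for regression with squared error. In that setting the residuals y − pred are exactly what each new tree fits. This is a minimal sketch. It assumes one-split regression stumps as the weak learners and a constant (mean) starting prediction. The helper names and the noisy-sine dataset are hypothetical, for illustration only.

```python
import numpy as np

# Hypothetical helper names: a from-scratch sketch, not a library API.


def fit_regression_stump(X, r):
    """Least-squares stump: choose the (feature, threshold) split with the
    smallest squared error; each leaf predicts the mean residual of its samples."""
    best_sse, best = np.inf, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # skip the max so both leaves are non-empty
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best_sse:
                best_sse, best = sse, (j, t, lv, rv)
    return best


def stump_predict(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)


def gradient_boost_fit(X, y, n_rounds=100, learning_rate=0.1):
    f0 = y.mean()                          # constant initial prediction
    pred = np.full(len(y), f0)
    stumps, losses = [], [np.mean((y - pred) ** 2)]
    for _ in range(n_rounds):
        residuals = y - pred               # negative gradient of 0.5 * (y - pred)**2
        stump = fit_regression_stump(X, residuals)
        pred = pred + learning_rate * stump_predict(stump, X)  # shrunken update
        stumps.append(stump)
        losses.append(np.mean((y - pred) ** 2))
    return (f0, learning_rate, stumps), losses


def gradient_boost_predict(model, X):
    f0, learning_rate, stumps = model
    pred = np.full(X.shape[0], f0)
    for stump in stumps:
        pred = pred + learning_rate * stump_predict(stump, X)
    return pred


# Toy data (an illustrative assumption): a noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

model, losses = gradient_boost_fit(X, y)
print(f"training MSE: {losses[0]:.4f} (constant) -> {losses[-1]:.4f} (100 stumps)")
```

Each stump's leaf values are the mean residuals of its samples. As a result, a round with `learning_rate` between 0 and 1 can only lower (or keep) the training MSE. A smaller `learning_rate` takes smaller steps, so it typically needs more rounds to reach the same training loss.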

Key difference from AdaBoost:

AdaBoost: Reweight samples
Gradient Boosting: Fit residuals
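Gradient boosting gets its name because the residuals it fits are a gradient. The short derivation below assumes the common squared-error convention with a ½ factor, which only rescales the gradient. Here $F_{m-1}$ is the ensemble after $m-1$ rounds, $h_m$ is the tree fit in round $m$, and $\nu$ is `learning_rate`:

```latex
% Squared-error loss for sample i, given the current prediction F(x_i)
L\bigl(y_i, F(x_i)\bigr) = \tfrac{1}{2}\bigl(y_i - F(x_i)\bigr)^2

% Pseudo-residual in round m: negative gradient of the loss, evaluated at F_{m-1}
r_{im} = -\left.\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right|_{F = F_{m-1}} = y_i - F_{m-1}(x_i)

% Fit a tree h_m to the pairs (x_i, r_{im}), then take a shrunken step
F_m(x) = F_{m-1}(x) + \nu \, h_m(x)
```

Switching to a different differentiable loss changes only the formula for $r_{im}$, which is what makes the method general. For comparison, AdaBoost can be derived as stagewise fitting of the exponential loss $L(y, F) = e^{-yF}$ with $y \in \{-1, +1\}$.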

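In practice, gradient boosting usually outperforms AdaBoost. It can optimize any differentiable loss, while AdaBoost's exponential loss makes it sensitive to outliers and mislabeled samples.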

Hyperparameters

Param	Effect	Typical Range
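`n_estimators`	Number of boosting rounds (trees)	50-500
`learning_rate`	Shrinks each tree's contribution	0.01-0.3 (smaller = more stable, but needs more trees)
`max_depth`	Depth of each tree	3-8 (keep trees shallow)
`subsample`	Fraction of rows sampled for each tree	0.5-1.0
`colsample_bytree`	Fraction of features sampled for each tree (XGBoost/LightGBM name; scikit-learn's `max_features` samples per split instead)	0.5-1.0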
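Most of these hyperparameters map directly onto scikit-learn's `GradientBoostingClassifier`. There, `n_estimators`, `learning_rate`, `max_depth`, and `subsample` are constructor arguments with exactly these names. Column sampling is the exception: there is no `colsample_bytree`, and the closest analogue is `max_features`, which samples features at each split rather than once per tree. This is a minimal sketch, assuming scikit-learn is installed. The synthetic dataset and the specific values are illustrative, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = GradientBoostingClassifier(
    n_estimators=300,    # number of boosting rounds (trees)
    learning_rate=0.05,  # shrinks each tree's contribution
    max_depth=3,         # shallow trees
    subsample=0.8,       # each tree is fit on a random 80% of the rows
    max_features=0.8,    # each split considers a random 80% of the features
    random_state=0,
)
clf.fit(X_train, y_train)

# staged_predict yields predictions after 1, 2, ..., n_estimators trees,
# showing how accuracy changes as rounds are added
staged_acc = [(pred == y_test).mean() for pred in clf.staged_predict(X_test)]
for n_trees in (50, 100, 300):
    print(f"{n_trees:>3} trees: test accuracy {staged_acc[n_trees - 1]:.3f}")
```

Reading accuracies off the test set here is only for illustration. To choose `n_estimators` properly, scikit-learn's `n_iter_no_change` and `validation_fraction` arguments stop training early based on a held-out slice of the training data.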

Comparison: Boosting vs Bagging

Aspect	Bagging (RF)	Boosting (GB)
Training	Parallel	Sequential
Speed	Fast	Slower
Overfitting	Lower risk	Higher risk
Bias	Higher	Lower
Variance	Lower (main benefit)	Can grow as rounds are added
Best for	Strong baseline with little tuning	Highest accuracy, with careful tuning
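The Training row of the table shows up directly in code. A random forest's trees can be fit in parallel (`n_jobs=-1`). `GradientBoostingClassifier` has no such option because each tree needs the previous ones. This is a minimal comparison sketch, assuming scikit-learn is installed. The synthetic data and settings are illustrative, and which model scores higher depends on the dataset and tuning.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    # Bagging: trees are independent, so n_jobs=-1 can fit them in parallel
    "Random Forest": RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0),
    # Boosting: every tree depends on the ones before it, so fitting is sequential
    "Gradient Boosting": GradientBoostingClassifier(
        n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
    ),
}

# 5-fold cross-validated accuracy for each model
scores = {name: cross_val_score(model, X, y, cv=5) for name, model in models.items()}
for name, s in scores.items():
    print(f"{name}: {s.mean():.3f} ± {s.std():.3f}")
```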

Popular Boosting Libraries

scikit-learn: GradientBoostingClassifier, AdaBoostClassifier
XGBoost: Typically faster than scikit-learn's GradientBoostingClassifier, handles missing values natively (Lesson 5, Module 6)
LightGBM: Even faster, memory efficient
CatBoost: Handles categorical features automatically
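The scikit-learn classes above give a quick way to see boosting turn a weak learner into a strong one. The sketch compares a single stump with `AdaBoostClassifier`, whose default weak learner is a depth-1 `DecisionTreeClassifier`. This is a minimal sketch, assuming scikit-learn is installed. The dataset and the `n_estimators`/`learning_rate` values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The weak learner on its own: a single depth-1 tree (stump)
stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)

# AdaBoostClassifier boosts depth-1 trees by default
ada = AdaBoostClassifier(n_estimators=200, learning_rate=0.5, random_state=0)
ada.fit(X_train, y_train)

print(f"single stump : {stump.score(X_test, y_test):.3f}")
print(f"AdaBoost x200: {ada.score(X_test, y_test):.3f}")
```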
