Model Calibration & Probability Estimates

Why Calibration Matters

Model Calibration

The Problem: Overconfident Predictions

Suppose a spam classifier predicts:

  • "This email is 95% likely spam"
  • But are 95% of the emails it scores this way actually spam? Not necessarily. Perhaps only 80% are.

This model is miscalibrated (overconfident).

Well-Calibrated vs Miscalibrated

Well-Calibrated Model

  • Predicts 0.7 probability → 70% of those samples are actually positive
  • Predicts 0.5 probability → 50% of those samples are actually positive
  • Reliability diagram: Points lie on diagonal

Miscalibrated Model (Overconfident)

  • Predicts 0.9 probability → Only 70% are actually positive
  • Predictions too extreme (too close to 0 or 1)

Reliability Diagram

Plot predicted probability vs actual frequency:

  1. Bin predictions into 10 buckets (0-10%, 10-20%, ... 90-100%)
  2. For each bucket, calculate actual positive rate
  3. Plot predicted vs actual
  4. If on diagonal (y=x) → Well-calibrated
  5. If above diagonal → Underconfident (actual positive rate is higher than predicted)
  6. If below diagonal → Overconfident (actual positive rate is lower than predicted)
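
The binning steps above can be sketched in plain Python. This is a minimal illustration, not a library API: the function name `reliability_bins` and the toy data are made up for the example, and the bins are equal-width as in step 1.

```python
def reliability_bins(y_true, y_prob, n_bins=10):
    """Return (mean_predicted, actual_rate, count) for each non-empty bin."""
    sum_pred = [0.0] * n_bins
    sum_true = [0] * n_bins
    counts = [0] * n_bins
    for y, p in zip(y_true, y_prob):
        # p == 1.0 would index past the end, so clamp it into the last bin
        b = min(int(p * n_bins), n_bins - 1)
        sum_pred[b] += p
        sum_true[b] += y
        counts[b] += 1
    return [
        (sum_pred[b] / counts[b], sum_true[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


# Toy data (invented): five predictions at 0.2 with one positive,
# ten predictions at 0.9 with seven positives.
y_prob = [0.2] * 5 + [0.9] * 10
y_true = [1, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

bins = reliability_bins(y_true, y_prob)
for predicted, actual, count in bins:
    # A negative gap means the point lies below the diagonal (overconfident)
    print(f"predicted={predicted:.2f} actual={actual:.2f} gap={actual - predicted:+.2f} n={count}")
```

Plotting `predicted` against `actual` for each tuple gives the reliability diagram. In the toy data the 0.2 bin sits on the diagonal, while the 0.9 bin sits below it.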

Calibration Methods

1. Platt Scaling (Simple)

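Fit a logistic regression (a sigmoid) that maps the model's raw scores to probabilities, using data the model was not trained on:

from sklearn.calibration import CalibratedClassifierCV

# method='sigmoid' selects Platt scaling; with cv=5 the model is trained on
# four folds and the calibrator is fit on the held-out fold, five times
calibrated = CalibratedClassifierCV(model, method='sigmoid', cv=5)
calibrated.fit(X_train, y_train)
# One column per class, in the order given by calibrated.classes_
proba_calibrated = calibrated.predict_proba(X_test)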
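
To make the idea concrete, here is a rough sketch of what Platt scaling does under the hood: it fits two numbers, a slope `a` and an intercept `b`, so that `sigmoid(a * score + b)` matches the held-out labels. The helper `fit_platt`, the learning rate, and the toy scores are assumptions made for illustration. Platt's original method also smooths the targets, which this sketch omits, and scikit-learn's internals differ.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, lr=0.1, n_iter=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # gradient of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Toy held-out data (invented): raw decision scores and true labels
scores = [-3.0, -2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, 3.0]
labels = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_platt(scores, labels)
calibrated_probs = [sigmoid(a * s + b) for s in scores]
```

Because only two parameters are learned, the mapping is always a smooth, monotone sigmoid. That makes it hard to overfit but unable to correct distortions of other shapes.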

2. Isotonic Regression (Flexible)

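Fit a non-decreasing step function that maps the model's scores to calibrated probabilities. It is more flexible than Platt scaling because it does not assume a sigmoid shape, but it can overfit when the calibration set is small.

calibrated = CalibratedClassifierCV(model, method='isotonic', cv=5)
calibrated.fit(X_train, y_train)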
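
Isotonic regression is commonly solved with the pool-adjacent-violators algorithm. The sketch below is a simplified, unweighted version written for illustration, not scikit-learn's implementation. The function name and toy data are hypothetical. It sorts the samples by raw score, then merges neighbouring groups whenever their positive rates would otherwise decrease.

```python
def pool_adjacent_violators(labels_sorted_by_score):
    """Least-squares non-decreasing fit to a sequence of 0/1 labels."""
    blocks = []  # each block is [sum_of_labels, count]
    for y in labels_sorted_by_score:
        blocks.append([float(y), 1])
        # Merge while the previous block's mean exceeds the newest block's mean
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Toy held-out data (invented): raw model probabilities and true labels
raw_probs = [0.7, 0.1, 0.9, 0.3, 0.6, 0.2, 0.8, 0.4]
labels = [0, 0, 1, 1, 1, 0, 1, 0]

pairs = sorted(zip(raw_probs, labels))  # order samples by raw score
fitted = pool_adjacent_violators([y for _, y in pairs])
calibration_map = list(zip([p for p, _ in pairs], fitted))
```

Each entry of `calibration_map` pairs a raw score with its calibrated value. The result is a staircase, which is why the method needs enough calibration data for every step to be estimated reliably.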

3. Temperature Scaling (Neural Networks)

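Divide the logits by a single learned positive scalar, the temperature, before applying softmax. The temperature is fit on a held-out validation set; it changes the confidence but not which class is predicted.

from scipy.special import softmax

# Assumes logits is a NumPy array of shape (n_samples, n_classes)
proba_scaled = softmax(logits / temperature, axis=-1)
# temperature < 1: More confident
# temperature > 1: Less confident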
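
One way to learn the temperature is to pick the value that minimizes negative log-likelihood on validation data. The sketch below uses a simple grid search in plain Python. In practice the temperature is usually optimized with a gradient-based method in the training framework. The helper names, grid range, and toy logits are all assumptions for the example.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(all_logits, labels, temperature):
    """Mean negative log-likelihood of the true classes at a given temperature."""
    return -sum(
        math.log(softmax(z, temperature)[y]) for z, y in zip(all_logits, labels)
    ) / len(labels)


def fit_temperature(all_logits, labels):
    """Grid-search the temperature in [0.05, 10.0] for the lowest NLL."""
    grid = [0.05 * k for k in range(1, 201)]
    return min(grid, key=lambda t: nll(all_logits, labels, t))


# Toy validation set (invented): confident logits, one of five is wrong
val_logits = [[4.0, 0.0], [5.0, 0.0], [0.0, 4.0], [0.0, 6.0], [4.0, 0.0]]
val_labels = [0, 0, 1, 1, 1]

temperature = fit_temperature(val_logits, val_labels)
before = softmax(val_logits[0], 1.0)
after = softmax(val_logits[0], temperature)
```

On this toy set the fitted temperature comes out above 1, so `after` is less extreme than `before` while the predicted class stays the same.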

Why Some Models are Miscalibrated

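Model                        Calibration
Logistic Regression          Good (by design)
Neural Networks              Poor (overconfident)
Tree Models (RF, XGBoost)    Often poor (RF rarely predicts near 0 or 1; boosting can distort probabilities)
SVM                          Poor (outputs margin scores, not probabilities)
Naive Bayes                  Often poor (pushes probabilities toward 0 or 1)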

Why NNs are Overconfident:

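Deep networks are trained to minimize loss on the training set, not to be calibrated on new data. With enough capacity they can keep lowering training loss by pushing outputs toward extreme probabilities (0.01, 0.99) even after the predicted class is already correct, so the confidence they report on unseen data is higher than their actual accuracy.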
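
A little arithmetic shows how confidence interacts with log loss. The sketch below assumes a simplified model that always assigns the same probability to whichever class it predicts. The function name and the accuracy figures are invented for illustration. On data the model fits almost perfectly, as training data often is, more extreme confidence lowers the loss. On data where it is right only 80% of the time, the same confidence is penalized.

```python
import math


def expected_log_loss(accuracy, confidence):
    """Mean log loss when the model is right `accuracy` of the time and always
    assigns probability `confidence` to the class it predicts."""
    return -(
        accuracy * math.log(confidence)
        + (1 - accuracy) * math.log(1 - confidence)
    )


# Training-like data: the model is almost always right
train_loss_at_080 = expected_log_loss(0.999, 0.80)
train_loss_at_099 = expected_log_loss(0.999, 0.99)

# Held-out data: the model is right 80% of the time
heldout_loss_at_080 = expected_log_loss(0.80, 0.80)
heldout_loss_at_099 = expected_log_loss(0.80, 0.99)
```

Comparing the four values shows the pattern: 0.99 beats 0.80 on the training-like data, but 0.80 beats 0.99 on the held-out data, where it matches the true accuracy.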
