Model Monitoring & Drift Detection
The Problem: Concept Drift
A model trained on 2022 data can perform poorly on 2024 data for several reasons:
- Data Distribution Shift: The distribution of the input features changes (e.g., customer incomes rise over time)
- Concept Drift: The relationship between features and the target changes (e.g., spam emails adopt new patterns)
- Feature Drift: The predictive value of individual features changes; new features become important and old ones lose relevance
Real-World Example: Credit Risk
- 2019 Model: "People with income >$50K are low risk"
- 2024 Reality: Inflation made $50K less meaningful; model accuracy drops from 92% to 78%
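The effect can be reproduced in miniature. The sketch below is illustrative only: the incomes are synthetic, and the shifted cutoff of $62K is a hypothetical stand-in for the effect of inflation, not a figure from the example above. It does not attempt to reproduce the 92% and 78% numbers.

```python
# Illustrative sketch: a frozen decision rule evaluated after the true
# relationship has moved. All numbers here are synthetic assumptions.

def model_predicts_low_risk(income_k):
    # The rule the model learned in 2019: income above $50K means low risk.
    return income_k > 50

def accuracy(incomes_k, true_cutoff_k):
    # Ground truth is "low risk" when income exceeds the cutoff in effect.
    correct = sum(
        model_predicts_low_risk(x) == (x > true_cutoff_k) for x in incomes_k
    )
    return correct / len(incomes_k)

incomes_k = list(range(30, 100))  # synthetic incomes, $30K to $99K

acc_2019 = accuracy(incomes_k, true_cutoff_k=50)  # rule matches reality
acc_2024 = accuracy(incomes_k, true_cutoff_k=62)  # hypothetical new cutoff

print(f"2019 accuracy: {acc_2019:.2f}")
print(f"2024 accuracy: {acc_2024:.2f}")
```

The model itself never changes; only the ground truth moves, which is why the drop cannot be seen without fresh labels.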
Monitoring Metrics
1. Prediction Distribution
Compare training vs production predictions:
- Training: 60% Class 0, 40% Class 1
- Production (Month 6): 20% Class 0, 80% Class 1 → Alert! Distribution changed drastically
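A minimal sketch of this comparison, using total variation distance as the drift score. The choice of distance and the `TVD_ALERT_THRESHOLD` value are assumptions made for illustration; other measures (such as a chi-square test) would serve the same purpose.

```python
from collections import Counter

def class_shares(predictions):
    # Fraction of predictions that fall in each class.
    counts = Counter(predictions)
    total = len(predictions)
    return {label: n / total for label, n in counts.items()}

def total_variation_distance(p, q):
    # Half the sum of absolute differences between two distributions.
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in labels)

# Class shares taken from the example above.
training_shares = {0: 0.60, 1: 0.40}
production_preds = [0] * 20 + [1] * 80  # stand-in for logged predictions
production_shares = class_shares(production_preds)

TVD_ALERT_THRESHOLD = 0.10  # hypothetical value; tune per model

tvd = total_variation_distance(training_shares, production_shares)
prediction_drift_alert = tvd > TVD_ALERT_THRESHOLD
print(f"TVD = {tvd:.2f}, alert = {prediction_drift_alert}")
```

This check needs no ground-truth labels, so it can run as soon as predictions are logged.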
2. Feature Distributions
Monitor each feature's distribution:
- If mean age shifts from 35 to 45, alert
- If credit score variance increases 10x, alert
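The two checks above can be sketched with the standard library alone. The function name `feature_drift_report`, the sample ages, and the default thresholds are illustrative assumptions; expressing the mean shift in training standard deviations is one possible design choice, not the only one.

```python
import statistics

def feature_drift_report(train_values, prod_values,
                         max_mean_shift_sd=0.5, max_variance_ratio=10.0):
    # Mean shift is measured in training standard deviations so the same
    # threshold can be reused across features with different units.
    train_mean = statistics.fmean(train_values)
    train_sd = statistics.stdev(train_values)
    mean_shift_sd = abs(statistics.fmean(prod_values) - train_mean) / train_sd
    variance_ratio = (
        statistics.variance(prod_values) / statistics.variance(train_values)
    )
    alert = (mean_shift_sd > max_mean_shift_sd
             or variance_ratio >= max_variance_ratio)
    return {
        "mean_shift_sd": mean_shift_sd,
        "variance_ratio": variance_ratio,
        "alert": alert,
    }

# Synthetic ages: training mean is 35, production mean is 45.
train_ages = [25, 30, 35, 40, 45]
prod_ages = [35, 40, 45, 50, 55]

report = feature_drift_report(train_ages, prod_ages)
print(report)
```

Running the same function once per feature gives a per-feature drift table that can be logged alongside the model metrics.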
3. Model Accuracy (Ground Truth Required)
Keep collecting true labels in production:
- If accuracy stays more than 5% below its baseline for a full week, retrain
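One way to encode this rule is as a sustained-drop check over daily accuracy values. The sketch assumes that "5%" means 5 percentage points below a baseline accuracy and that accuracy is computed once per day; `needs_retrain` and its parameters are hypothetical names.

```python
def needs_retrain(daily_accuracy, baseline, max_drop=0.05, window_days=7):
    # Retrain only when every one of the last `window_days` days is more
    # than `max_drop` below the baseline, so a single bad day is ignored.
    if len(daily_accuracy) < window_days:
        return False
    recent = daily_accuracy[-window_days:]
    return all(baseline - acc > max_drop for acc in recent)

baseline_accuracy = 0.92

# Two normal days followed by a full week well below the baseline.
sustained_drop = [0.91, 0.90, 0.85, 0.84, 0.86, 0.85, 0.83, 0.84, 0.85]
# Six normal days and one bad day.
single_blip = [0.92, 0.92, 0.92, 0.92, 0.92, 0.92, 0.80]

print(needs_retrain(sustained_drop, baseline_accuracy))
print(needs_retrain(single_blip, baseline_accuracy))
```

Requiring the whole window to be below the line trades reaction speed for fewer false alarms; a shorter window reacts faster but retrains more often.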
4. Prediction Latency
- If API response time increases, investigate: the cause may be a data quality issue, such as malformed or unusually large inputs
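A latency check can be sketched by comparing a high percentile of recent response times against a baseline. The nearest-rank percentile, the 95th-percentile choice, and the `max_ratio` of 1.5 are all assumptions for illustration, and the timings are synthetic.

```python
import math

def percentile(values, pct):
    # Nearest-rank percentile: the smallest value with at least pct% of
    # the observations at or below it.
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def latency_alert(baseline_ms, recent_ms, max_ratio=1.5):
    # Alert when the recent p95 exceeds the baseline p95 by max_ratio.
    return percentile(recent_ms, 95) > max_ratio * percentile(baseline_ms, 95)

# Synthetic response times in milliseconds.
baseline_ms = list(range(10, 110))
recent_ms = [2 * t for t in baseline_ms]  # every request twice as slow

print(latency_alert(baseline_ms, recent_ms))
```

A latency alert only says that something changed; it still has to be traced back to the inputs, the model, or the serving infrastructure.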
Implementation
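import json
from datetime import datetime, timezone

from sklearn.metrics import accuracy_score

ACCURACY_THRESHOLD = 0.85  # example value; tune for your model

def alert(message):
    # Placeholder: replace with your notification hook
    print(f"ALERT: {message}")

def log_event(name):
    # Placeholder: replace with your event logger
    print(f"EVENT: {name}")

# In production
def monitor_model(y_true, y_pred, threshold=ACCURACY_THRESHOLD):
    acc = float(accuracy_score(y_true, y_pred))
    if acc < threshold:
        alert("Model accuracy below threshold!")
        log_event("retrain_needed")
    # Log metrics as one JSON object per line so the file stays parseable
    now = datetime.now(timezone.utc).isoformat()
    with open('metrics.json', 'a') as f:
        f.write(json.dumps({'date': now, 'accuracy': acc}) + '\n')
    return acc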
Retraining Strategies
Strategy 1: Fixed Schedule
Retrain at a fixed interval, such as every week or every month
- Pros: Simple, predictable
- Cons: Might retrain too often or too late
Strategy 2: Trigger-Based
Retrain when accuracy drops below a threshold
- Pros: Responds to actual drift
- Cons: Requires ground truth labels
Strategy 3: Hybrid
Run a scheduled check every week. If accuracy has dropped by more than 5%, retrain immediately
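A minimal sketch of the weekly check, intended to be called by whatever scheduler is in use. Two things here are assumptions rather than part of the strategy as stated: "5%" is read as 5 percentage points below a baseline, and a `max_age_days` fallback is added so that a model is still refreshed when accuracy holds steady. The function name `should_retrain` is hypothetical.

```python
from datetime import date, timedelta

def should_retrain(today, last_retrain, baseline_acc, current_acc,
                   max_age_days=30, max_drop=0.05):
    # Trigger: accuracy has fallen too far below the baseline.
    if baseline_acc - current_acc > max_drop:
        return "trigger"
    # Fallback schedule (assumption): the model is older than allowed.
    if today - last_retrain >= timedelta(days=max_age_days):
        return "schedule"
    return None

# Weekly check one week after the last retrain, with a large drop.
print(should_retrain(date(2024, 6, 8), date(2024, 6, 1), 0.92, 0.85))
# Accuracy is stable, but the model is more than 30 days old.
print(should_retrain(date(2024, 7, 15), date(2024, 6, 1), 0.92, 0.91))
# Accuracy is stable and the model is recent: nothing to do.
print(should_retrain(date(2024, 6, 8), date(2024, 6, 1), 0.92, 0.91))
```

Returning the reason rather than a bare boolean makes it easier to log why each retraining run was started.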