Transfer Learning & Model Deployment · Page 2 of 2
Deployment & Production Considerations
Deploying Models in Production
Model Export Formats
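PyTorch → ONNX → Portable across many runtimes and hardware backends (e.g. ONNX Runtime)
TensorFlow → SavedModel → TensorFlow Serving, TensorFlow Lite
Keras → .keras (or legacy .h5) / SavedModel → TensorFlow ecosystem; convert to ONNX for other runtimes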
Quantization (Make Models Smaller)
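Problem: Neural network weights are typically stored as 32-bit floats (4 bytes each).
Solution: Store them as 8-bit integers (1 byte each).
Original: 100M parameters × 4 bytes = 400 MB
Quantized: 100M parameters × 1 byte = 100 MB (4x smaller)
Speed: Often 2-4x faster on mobile and CPU hardware with int8 support
Accuracy: Usually a small drop (often around 0.5-1%), but this varies by model and should be measured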
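To make the 4x size reduction concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It assumes the simplest scheme (one scale per tensor, `scale = max|w| / 127`); the function names `quantize_int8` and `dequantize` are hypothetical, and real toolkits (e.g. PyTorch or TensorFlow Lite quantization) add calibration, zero points, and per-channel scales on top of this idea.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to the int8 range.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return [scale * v for v in q]


weights = array("f", [0.42, -1.37, 0.05, 0.9, -0.003, 1.2])  # float32 storage
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

fp32_bytes = len(weights) * weights.itemsize  # 4 bytes per weight
int8_bytes = len(q) * q.itemsize  # 1 byte per weight
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"{fp32_bytes} B -> {int8_bytes} B, scale={scale:.5f}, max error={max_err:.5f}")
```

The rounding error per weight is at most half a quantization step (`scale / 2`), which is why accuracy usually drops only slightly. The single `scale` value must be stored alongside the int8 weights, a negligible overhead for large tensors.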
Batch Inference
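Single prediction:
Input: One image → Model → Output
Latency: 100 ms (throughput: 10 images/sec)
Batch prediction:
Input: 32 images → Model → 32 outputs
Latency: 150 ms (only 50% slower)
Throughput: 32 / 0.150 s ≈ 213 images/sec
That is roughly 21x the throughput of one-at-a-time inference, at the cost of each request waiting for its whole batch. Actual gains depend on the hardware and model.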
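The throughput arithmetic above, plus the grouping step a server performs before calling the model, can be sketched as follows. The helper names are hypothetical; on Python 3.12+ `itertools.batched` offers a built-in alternative that yields tuples instead of lists.

```python
from itertools import islice


def throughput_per_sec(batch_size, latency_ms):
    """Items processed per second when one model call handles batch_size items."""
    return batch_size / (latency_ms / 1000.0)


def batches(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    it = iter(items)
    while chunk := list(islice(it, batch_size)):
        yield chunk


print(throughput_per_sec(1, 100))             # 10.0 images/sec
print(round(throughput_per_sec(32, 150), 1))  # 213.3 images/sec

# 70 queued requests become batches of 32, 32 and 6
print([len(b) for b in batches(range(70), 32)])  # [32, 32, 6]
```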
Monitoring in Production
Track:
- Accuracy on real data
- Inference latency
- Memory usage
- Error rates
Alert if:
- Accuracy drops (model drift)
- Latency spikes (resource issue)
- Error rate increases
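The "alert if" rules can be expressed as a simple check run over each monitoring window. This is a minimal sketch: the `AlertThresholds` values are hypothetical and should come from your own service-level targets, and measuring accuracy in production assumes ground-truth labels eventually arrive for at least a sample of requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertThresholds:
    # Hypothetical limits; tune these to your own service-level targets.
    min_accuracy: float = 0.90
    max_p95_latency_ms: float = 200.0
    max_error_rate: float = 0.01


def check_alerts(accuracy, p95_latency_ms, error_rate, limits=AlertThresholds()):
    """Return human-readable alerts for one monitoring window (empty list = healthy)."""
    alerts = []
    if accuracy < limits.min_accuracy:
        alerts.append(f"accuracy {accuracy:.2f} below {limits.min_accuracy:.2f} (possible model drift)")
    if p95_latency_ms > limits.max_p95_latency_ms:
        alerts.append(f"p95 latency {p95_latency_ms:.0f} ms above {limits.max_p95_latency_ms:.0f} ms (resource issue?)")
    if error_rate > limits.max_error_rate:
        alerts.append(f"error rate {error_rate:.1%} above {limits.max_error_rate:.1%}")
    return alerts


print(check_alerts(accuracy=0.95, p95_latency_ms=120, error_rate=0.002))  # []
print(check_alerts(accuracy=0.84, p95_latency_ms=350, error_rate=0.05))
```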
Data Drift
Problem: Production data differs from the training data.
Training: 2020 data
Production (2024): The input distribution has shifted.
As a result, the model's accuracy degrades over time.
Solution: Monitor for drift and retrain periodically on recent data
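One common way to detect this kind of shift for a single numeric feature is the Population Stability Index (PSI), which compares binned proportions of training data and recent production data. This is a sketch under assumptions: equal-width bins over the training range, a small `eps` to avoid `log(0)`, and synthetic Gaussian data standing in for real features. A frequently quoted rule of thumb (a heuristic, not a standard) treats PSI below 0.1 as little shift and above 0.25 as significant.

```python
import math
import random
from bisect import bisect_right


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a new sample.

    Bins are equal-width over the range of the reference (training) data;
    values outside that range fall into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has zero range")
    width = (hi - lo) / n_bins
    edges = [lo + width * i for i in range(1, n_bins)]  # interior bin edges

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # eps keeps empty bins from producing log(0)
        return [max(c / len(values), eps) for c in counts]

    p_ref, p_new = proportions(expected), proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(p_ref, p_new))


rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]    # stand-in for "2020" data
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]     # production, no drift
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]  # production, mean shifted

print(f"PSI, same distribution: {psi(train, same):.3f}")
print(f"PSI, mean shifted by 1: {psi(train, shifted):.3f}")
```

A PSI above the chosen threshold on key features is a reasonable trigger for the retraining pipeline, rather than retraining on a fixed calendar alone.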
Deployment Platforms
| Platform | Best for | Notes |
|---|---|---|
| TensorFlow Serving | TensorFlow models, high throughput | Google-maintained |
| TorchServe | PyTorch models | Easy setup |
| ONNX Runtime | Models exported to ONNX from any framework | Lightweight |
| AWS SageMaker | Managed service | Auto-scaling |
| Hugging Face (Inference Endpoints) | Transformer models (NLP and beyond) | One-click deploy |
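As an example of what calling one of these platforms looks like, here is a sketch of a client request to TensorFlow Serving's REST predict endpoint (`/v1/models/<name>:predict`, default REST port 8501), built with only the standard library. The model name `my_classifier` and the input values are hypothetical; the request is only constructed here, and sending it requires a running server.

```python
import json
import urllib.request


def build_predict_request(model_name, instances, host="localhost", port=8501):
    """Build a POST request for TensorFlow Serving's REST predict endpoint."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


req = build_predict_request("my_classifier", [[0.1, 0.2, 0.3]])  # hypothetical model
print(req.get_method(), req.full_url)

# With a server running, send it and read the "predictions" field:
# with urllib.request.urlopen(req, timeout=5) as resp:
#     predictions = json.load(resp)["predictions"]
```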
Production Checklist
✓ Model accuracy validated against a task-specific target (e.g. >95%?)
✓ Tested on diverse data, including edge cases
✓ Latency acceptable (e.g. <100 ms?)
✓ Memory footprint reasonable for the target device (e.g. <100 MB?)
✓ Quantized for mobile (if needed)
✓ Error handling implemented
✓ Monitoring set up
✓ Retraining pipeline ready
✓ Documentation complete
✓ Ethics/bias reviewed
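The latency item on the checklist is easiest to verify by timing repeated calls and checking a high percentile against a budget. This sketch assumes a single-sample CPU `predict` callable; `dummy_model` is a hypothetical stand-in for a real model, and GPU models would also need device synchronization before reading the timer.

```python
import statistics
import time


def measure_latency_ms(predict, sample, runs=200, warmup=10):
    """Time repeated single-sample calls and return latency percentiles in ms."""
    for _ in range(warmup):  # warm-up calls (caches, lazy init) are not timed
        predict(sample)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    cuts = statistics.quantiles(timings, n=100)  # 99 cut points; cuts[94] is the 95th percentile
    return {"p50": statistics.median(timings), "p95": cuts[94], "max": max(timings)}


def meets_latency_budget(stats, budget_ms=100.0):
    # Gate on the 95th percentile rather than the mean so slow outliers count.
    return stats["p95"] < budget_ms


def dummy_model(features):  # hypothetical stand-in for a real model's predict()
    return sum(w * x for w, x in zip([0.2, -0.5, 0.3], features))


stats = measure_latency_ms(dummy_model, [1.0, 2.0, 3.0])
print(stats, meets_latency_budget(stats))
```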
Common Pitfalls
| Pitfall | How to Avoid |
|---|---|
| Deploying without testing | Comprehensive test suite |
| Not handling edge cases | Input validation and anomaly detection |
| Forgetting to log decisions | Log inputs, predictions, and model version; add explainability where decisions must be justified |
| Ignoring model drift | Monitor metrics continuously |
| Brittle preprocessing | Robust, versioned preprocessing pipeline |
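Two of these fixes, validating inputs and logging each decision, can be combined in a thin wrapper around the model. This is a minimal sketch: `MODEL_VERSION`, `N_FEATURES`, `validate`, and `guarded_predict` are hypothetical names, and the built-in `sum` stands in for a real model.

```python
import logging
import math

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("inference")

MODEL_VERSION = "v1.3.0"  # hypothetical version tag shipped with the model
N_FEATURES = 4  # hypothetical expected input width


def validate(features):
    """Reject malformed inputs before they reach the model."""
    if len(features) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
    for i, x in enumerate(features):
        if not isinstance(x, (int, float)) or not math.isfinite(x):
            raise ValueError(f"feature {i} is not a finite number: {x!r}")


def guarded_predict(model_fn, features):
    validate(features)
    prediction = model_fn(features)
    # Log enough to reconstruct the decision later: model version, input, output.
    logger.info("model=%s features=%s prediction=%s", MODEL_VERSION, features, prediction)
    return prediction


print(guarded_predict(sum, [1.0, 2.0, 3.0, 4.0]))  # 10.0
try:
    guarded_predict(sum, [1.0, float("nan"), 3.0, 4.0])
except ValueError as err:
    print("rejected:", err)
```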
Ethics & Fairness
Before deployment, ask:
- Does the model work equally well for all groups?
- Are its predictions explainable?
- Are there unintended biases?
- Is the data used ethically?
- Can users understand why they were rejected or approved?
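A first, rough answer to "does the model work equally well for all groups?" is to compute a metric separately per group and look at the gap. This sketch uses per-group accuracy on hypothetical toy data; accuracy parity is only one of several fairness criteria (others compare false-positive or approval rates), so the right metric depends on the application.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately for each value of a sensitive attribute."""
    correct, total = defaultdict(int), defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


# Hypothetical toy labels, predictions, and group membership
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = accuracy_by_group(y_true, y_pred, groups)
gap = max(per_group.values()) - min(per_group.values())
print(per_group, f"gap={gap:.2f}")  # {'A': 1.0, 'B': 0.25} gap=0.75
```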