One widely cited study (Vela et al., Scientific Reports, 2022) observed quality degradation over time in 91% of the ML models it tested. DriftGuard catches that degradation before your revenue does.

DriftGuard is a production-grade monitoring system that watches deployed ML models and raises alerts when data drift, concept drift, or performance degradation occurs. It includes automatic root cause analysis, drift severity scoring, and retraining triggers.

```
┌─────────────────────────────────────────────────────────┐
│                    DriftGuard System                    │
├─────────────┬─────────────┬─────────────┬───────────────┤
│ Data Drift  │Concept Drift│ Performance │ Root Cause    │
│ Detection   │ Detection   │ Monitoring  │ Analysis      │
├─────────────┼─────────────┼─────────────┼───────────────┤
│ • KS Test   │ • ADWIN     │ • Accuracy  │ • Feature     │
│ • PSI       │ • Page-     │ • Latency   │   Ranking     │
│ • MMD       │   Hinkley   │ • Error     │ • Severity    │
│ • Adversar- │ • DDM       │   Rate      │   Scoring     │
│   ial       │             │ • Custom    │ • Co-drift    │
│             │             │   Metrics   │               │
├─────────────┴─────────────┴─────────────┴───────────────┤
│                Retraining Trigger Engine                │
│   Multi-signal composite scoring + cooldown + urgency   │
├─────────────────────────────────────────────────────────┤
│                Drift Scenario Simulator                 │
│       Sudden | Gradual | Recurring | Incremental        │
└─────────────────────────────────────────────────────────┘
```
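
Install the dependencies:

```bash
cd driftguard
pip install -r requirements.txt
```

Launch the dashboard:

```bash
streamlit run app.py
```

Run the tests:

```bash
cd driftguard
python -m pytest tests/ -v
```

Data drift detection identifies when input feature distributions shift away from the training data:
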
| Detector | Method | Best For |
|---|---|---|
| KS Test | Kolmogorov-Smirnov two-sample test | Per-feature distribution comparison |
| PSI | Population Stability Index | Binned distribution shift magnitude |
| MMD | Maximum Mean Discrepancy (RBF kernel) | Multivariate distribution comparison |
| Adversarial | Train classifier to distinguish datasets | Complex non-linear drift patterns |
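
To make the first two rows of the table concrete, here is a minimal pure-Python sketch of a two-sample KS statistic and a PSI calculation for a single feature. It is not DriftGuard's implementation: the function names `ks_statistic` and `psi`, the equal-width binning, and the `eps` floor are all assumptions chosen for illustration.

```python
import math
import random


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    i = j = 0
    d_max = 0.0
    while i < len(ref) and j < len(cur):
        x = min(ref[i], cur[j])
        # Step past every value equal to x in both samples before comparing CDFs
        while i < len(ref) and ref[i] == x:
            i += 1
        while j < len(cur) and cur[j] == x:
            j += 1
        d_max = max(d_max, abs(i / len(ref) - j / len(cur)))
    return d_max


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index over equal-width bins spanning the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def fractions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each fraction at eps so the logarithm below is always defined
        return [max(c / len(values), eps) for c in counts]

    ref_frac, cur_frac = fractions(reference), fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    shifted = [rng.gauss(1.5, 1.0) for _ in range(2000)]
    print("KS :", ks_statistic(reference, shifted))
    print("PSI:", psi(reference, shifted))
```

Both scores are zero for identical samples and grow as the current sample moves away from the reference. A real detector would also turn the KS statistic into a p-value and compare it against the chosen significance level.
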

```python
from src.detectors.data_drift import KSTestDetector, PSIDetector

# reference_data: feature sample from training time
# current_data: recent production sample with the same features
ks = KSTestDetector(significance=0.05)
result = ks.detect(reference_data, current_data, feature_names)

if result.is_drift:
    print(f"Drift detected! Score: {result.score:.4f}")
    print(f"Drifted features: {result.details['drifted_features']}")
```

Concept drift detection identifies when the relationship between features and target changes:

| Detector | Method | Characteristics |
|---|---|---|
| ADWIN | Adaptive Windowing | Variable window, Hoeffding bound |
| Page-Hinkley | Cumulative sum test | Detects mean shifts in streams |
| DDM | Drift Detection Method | Error rate + std dev monitoring |
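
As an illustration of the cumulative-sum idea behind the Page-Hinkley row, the sketch below tracks how far a stream has risen above its running mean and signals drift once that rise exceeds a threshold. The class name `PageHinkley`, its `update` method, and the default `delta` and `threshold` values are assumptions for this example, not DriftGuard's API.

```python
class PageHinkley:
    """Minimal Page-Hinkley test for an upward shift in a stream's mean."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # magnitude of change that is tolerated
        self.threshold = threshold  # alarm level (often written as lambda)
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # cumulative deviation from the running mean
        self.cum_min = 0.0  # smallest cumulative deviation seen so far

    def update(self, x):
        """Add one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


if __name__ == "__main__":
    # Error stream: the model is always right for 200 steps, then always wrong
    stream = [0.0] * 200 + [1.0] * 200
    detector = PageHinkley()
    for step, error in enumerate(stream):
        if detector.update(error):
            print(f"Drift signalled at step {step}")
            break
```

A larger `threshold` reduces false alarms at the cost of slower detection; `delta` controls how small a shift is ignored.
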

```python
from src.detectors.concept_drift import ADWINDetector

adwin = ADWINDetector(delta=0.002)

# Feed the detector one prediction error at a time
for error in prediction_errors:
    result = adwin.add_element(error)
    if result.is_drift:
        print("Concept drift detected!")
```

Performance monitoring tracks model metrics over sliding windows:

```python
from src.detectors.performance_drift import PerformanceDriftDetector

monitor = PerformanceDriftDetector(window_size=200)
monitor.set_baselines({"accuracy": 0.95, "latency_ms": 10})

# Record the latest observed metrics and compare them against the baselines
snapshot = monitor.record(accuracy=0.89, latency_ms=15)
if snapshot.is_degraded:
    print("Performance degradation detected!")
```

Root cause analysis automatically identifies which features drove the drift:

```python
from src.analysis.root_cause import RootCauseAnalyzer

analyzer = RootCauseAnalyzer()
report = analyzer.analyze(reference_data, current_data, feature_names)

# Shows: severity ranking, co-drifted groups, actionable recommendations
print(report.summary())
```

The retraining trigger is a multi-signal decision engine with cooldown and urgency scoring:

```python
from src.monitors.retraining_trigger import RetrainingTrigger

trigger = RetrainingTrigger(retrain_threshold=0.40)

# Inputs are the outputs of the data drift, concept drift, and performance detectors
decision = trigger.evaluate(
    data_drift_result=ks_result,
    concept_drift_results=adwin_results,
    performance_snapshot=perf_snapshot,
)

if decision.should_retrain:
    print(f"Urgency: {decision.urgency.value}")
    print(f"Reasons: {decision.reasons}")
```

The drift simulator generates realistic drift scenarios for testing:

```python
from src.simulator.drift_simulator import DriftSimulator, get_preset_scenarios

simulator = DriftSimulator()
scenarios = get_preset_scenarios()

# Simulate a sudden drift (new product category)
stream = simulator.simulate(scenarios["sudden_category"])

# Train a model and monitor it
model = simulator.create_model(stream.reference_X, stream.reference_y)
```

Built-in scenarios:

- Sudden: New Product Category — Abrupt distribution shift
- Gradual: Seasonal Change — Slow linear shift over time
- Recurring: Weekday/Weekend — Periodic oscillating patterns
- Incremental: Slow Creep — Constant-rate accumulating drift
- Sudden: Major Pipeline Break — Severe multi-feature drift
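
The four drift shapes in the list above can be sketched as schedules for the mean of a single Gaussian feature. This is only an illustration of the shapes as described here; `drift_mean`, `simulate_stream`, and their parameters are hypothetical names and do not reflect how `DriftSimulator` is implemented.

```python
import random


def drift_mean(t, n_steps, kind, shift=2.0, period=50):
    """Mean of the feature at step t under the given drift shape."""
    half = n_steps // 2
    if kind == "sudden":
        return shift if t >= half else 0.0  # abrupt jump at the midpoint
    if kind == "gradual":
        # Flat at first, then a linear ramp over the second half
        return shift * max(0, t - half) / max(1, n_steps - half)
    if kind == "recurring":
        return shift if (t // period) % 2 else 0.0  # alternate every `period` steps
    if kind == "incremental":
        return shift * t / n_steps  # constant-rate creep from the first step
    raise ValueError(f"unknown drift kind: {kind}")


def simulate_stream(n_steps, kind, seed=0, **kwargs):
    """Return a list of unit-variance Gaussian samples whose mean follows `kind`."""
    rng = random.Random(seed)
    return [rng.gauss(drift_mean(t, n_steps, kind, **kwargs), 1.0) for t in range(n_steps)]


if __name__ == "__main__":
    for kind in ("sudden", "gradual", "recurring", "incremental"):
        stream = simulate_stream(200, kind)
        first, last = sum(stream[:50]) / 50, sum(stream[-50:]) / 50
        print(f"{kind:12s} mean of first 50: {first:+.2f}, mean of last 50: {last:+.2f}")
```

Streams like these are useful for checking that a detector fires on the sudden case quickly and does not fire repeatedly on the recurring one.
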

```
driftguard/
├── app.py                          # Streamlit dashboard
├── requirements.txt
├── README.md
├── src/
│   ├── __init__.py
│   ├── detectors/
│   │   ├── __init__.py
│   │   ├── data_drift.py           # KS, PSI, MMD, Adversarial
│   │   ├── concept_drift.py        # ADWIN, Page-Hinkley, DDM
│   │   └── performance_drift.py    # Sliding window metrics
│   ├── analysis/
│   │   ├── __init__.py
│   │   └── root_cause.py           # Feature ranking & recommendations
│   ├── monitors/
│   │   ├── __init__.py
│   │   └── retraining_trigger.py   # Auto-retrain decision engine
│   └── simulator/
│       ├── __init__.py
│       └── drift_simulator.py      # Scenario generation
├── tests/
│   ├── __init__.py
│   ├── test_data_drift.py
│   ├── test_concept_drift.py
│   └── test_pipeline.py            # Integration + E2E tests
├── models/                         # Saved model artifacts
└── data/                           # Reference datasets
```

Root cause analysis assigns each feature a composite severity score (0–1), weighted as follows:

- KS statistic (30%): Non-parametric distribution distance
- PSI, normalized (30%): Binned distribution divergence
- Mean shift (25%): Standardized location change
- Std shift (15%): Scale change magnitude
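
A minimal sketch of how the four weights above might combine into one score and a feature ranking. It assumes each component has already been scaled to the range 0 to 1; how DriftGuard normalizes PSI and the mean and standard deviation shifts is not stated here, so `severity_score`, `rank_features`, and the dictionary keys are hypothetical.

```python
# Weights taken from the list above; they sum to 1.0
WEIGHTS = {"ks": 0.30, "psi": 0.30, "mean_shift": 0.25, "std_shift": 0.15}


def severity_score(components):
    """Weighted sum of the four components, each clamped to [0, 1] first."""
    return sum(
        weight * min(max(components[name], 0.0), 1.0)
        for name, weight in WEIGHTS.items()
    )


def rank_features(per_feature):
    """Return (feature, score) pairs sorted from most to least severe."""
    scored = {name: severity_score(parts) for name, parts in per_feature.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Illustrative component values, not measurements from a real dataset
    per_feature = {
        "price": {"ks": 0.5, "psi": 0.2, "mean_shift": 0.4, "std_shift": 0.0},
        "age": {"ks": 0.1, "psi": 0.05, "mean_shift": 0.1, "std_shift": 0.1},
    }
    for name, score in rank_features(per_feature):
        print(f"{name}: {score:.3f}")
```

Because the weights sum to 1 and every component is clamped, the composite also stays within 0 to 1.
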

The retraining trigger combines three signals into a weighted composite:

`composite = 0.35 × data_drift + 0.35 × concept_drift + 0.30 × performance`

Urgency levels:
| Composite Score | Urgency | Action |
|---|---|---|
| < 0.28 | None | Continue monitoring |
| 0.28 – 0.40 | Low | Schedule review |
| 0.40 – 0.55 | Medium | Plan retraining |
| 0.55 – 0.70 | High | Retrain soon |
| > 0.70 | Critical | Immediate retraining |
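
The composite formula and urgency table above could be sketched as follows. The table does not say which side of each boundary is inclusive, so this example assumes lower bounds are inclusive and that a score of exactly 0.70 still counts as High. The cooldown, expressed here in hypothetical "steps since the last retrain", is likewise an assumption about how such a guard might work, not DriftGuard's implementation.

```python
def composite_score(data_drift, concept_drift, performance):
    """Weighted composite of the three signals, each expected in [0, 1]."""
    return 0.35 * data_drift + 0.35 * concept_drift + 0.30 * performance


def urgency(score):
    """Map a composite score to an urgency level (boundary handling is assumed)."""
    if score < 0.28:
        return "none"
    if score < 0.40:
        return "low"
    if score < 0.55:
        return "medium"
    if score <= 0.70:
        return "high"
    return "critical"


def should_retrain(score, steps_since_last_retrain, threshold=0.40, cooldown=100):
    """Retrain only if the score reaches the threshold and the cooldown has elapsed."""
    return score >= threshold and steps_since_last_retrain >= cooldown


if __name__ == "__main__":
    score = composite_score(data_drift=0.6, concept_drift=0.5, performance=0.4)
    print(f"composite={score:.3f} urgency={urgency(score)}")
    print("retrain:", should_retrain(score, steps_since_last_retrain=250))
```

The cooldown keeps a persistently high score from triggering back-to-back retraining runs before a freshly trained model has had time to show its effect.
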

This project demonstrates a production ML mindset and addresses the gap between training a model and maintaining one in production:

- Data pipelines break silently — DriftGuard catches distribution shifts before they impact users
- Models degrade gradually — Concept drift detection finds when the world changes under your model
- Root cause analysis saves hours — Instead of "something is wrong," you get "feature X shifted by 2.3σ because of Y"
- Automated retraining — No more manual checks; the system decides when to retrain

Companies such as Netflix, Uber, and Stripe maintain dedicated teams for this kind of model monitoring.