Model Monitoring
The practice of continuously tracking ML model performance, data quality, and system health in production to detect degradation, drift, and anomalies before they significantly impact users.
Model monitoring goes beyond infrastructure monitoring to track the quality of model outputs. While infrastructure metrics tell you if the model is serving responses, model monitoring tells you if those responses are any good. Key metrics include prediction accuracy (compared against delayed ground truth), output distribution (are predictions shifting?), feature drift (are inputs changing?), and business impact (are model-driven features achieving their goals?).
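As a minimal sketch of the "compared against delayed ground truth" idea above: in production, labels often arrive hours or days after predictions are served, so accuracy can only be computed over the subset of predictions whose labels have landed. The function name and the None-for-pending convention here are illustrative, not from any particular library.

```python
def delayed_accuracy(predictions, labels):
    """Accuracy over only those predictions whose delayed ground-truth
    label has arrived; None marks a label that is still pending."""
    matched = [(p, l) for p, l in zip(predictions, labels) if l is not None]
    if not matched:
        return None  # no ground truth available yet
    return sum(p == l for p, l in matched) / len(matched)
```

In practice this runs on a schedule as labels trickle in, and the result is logged as a time series so trends (not just point values) can be monitored.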
A comprehensive monitoring setup includes real-time dashboards for prediction distributions, automated alerts when metrics cross thresholds, data drift detection comparing current inputs to training distributions, and feedback loop tracking that connects model predictions to business outcomes.
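One common way to implement the drift detection described above is the Population Stability Index (PSI), which compares the binned distribution of a feature (or prediction score) at training time against its current distribution. This is a bare-bones sketch; the binning scheme, the 1e-6 floor, and the alert threshold are illustrative choices.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline (e.g. training-time)
    sample and a current sample of one numeric feature or score."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def proportions(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1
        # floor proportions so the log term below stays defined
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A frequently cited rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as worth investigating, and above 0.25 as significant drift; an automated alert would fire when the value crosses whichever threshold the team has chosen.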
For AI products in production, model monitoring is essential because model degradation is often silent. A recommendation model might gradually serve less relevant suggestions as user preferences evolve, with engagement declining slowly enough that no single alert fires. Continuous monitoring with trend analysis catches this gradual degradation, triggering retraining or investigation before the business impact becomes significant.
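The trend analysis described above can be sketched as a window-over-window comparison: no single day's drop trips a point-in-time alert, but the mean of the most recent window falling measurably below the mean of the window before it does. The window size and the 5% drop threshold here are illustrative parameters.

```python
def gradual_degradation(metric_history, window=7, drop_threshold=0.05):
    """Flag a slow decline in a daily metric (e.g. engagement or accuracy)
    by comparing the recent window's mean to the preceding window's."""
    if len(metric_history) < 2 * window:
        return False  # not enough history to compare two windows
    prev = sum(metric_history[-2 * window:-window]) / window
    recent = sum(metric_history[-window:]) / window
    return prev > 0 and (prev - recent) / prev > drop_threshold
```

A positive result here would typically open an investigation or trigger a retraining job rather than page an on-call engineer, since gradual drift is rarely an outage.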
Related Terms
Cosine Similarity
A measure of similarity between two vectors based on the cosine of the angle between them, ranging from -1 (opposite) to 1 (identical), commonly used to compare embeddings.
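The definition above maps directly to a few lines of code: the dot product of the two vectors divided by the product of their magnitudes. A minimal sketch (assumes neither vector is all zeros, since that makes the denominator zero):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Because it depends only on direction, two embeddings of very different magnitudes still score 1.0 if they point the same way, which is why it is the default choice for comparing embeddings.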
Dimensionality Reduction
Techniques that reduce the number of dimensions in high-dimensional data while preserving meaningful structure, used for visualization, compression, and noise removal.
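To make the definition concrete, here is a toy instance of the most common technique, PCA, reducing 2-D points to 1-D: center the data, build the covariance matrix, find its dominant eigenvector by power iteration, and project onto it. This is a didactic sketch for two dimensions only, not a general implementation.

```python
import math

def pca_1d(points, iters=100):
    """Project 2-D points onto their first principal component."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # entries of the 2x2 covariance matrix
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    # power iteration for the dominant eigenvector
    # (assumes the start vector is not orthogonal to it)
    vx, vy = 1.0, 0.0
    for _ in range(iters):
        nx, ny = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = math.hypot(nx, ny)
        vx, vy = nx / norm, ny / norm
    return [x * vx + y * vy for x, y in centered]
```

The projection keeps the direction of greatest variance, which is what "preserving meaningful structure" means for PCA; real workloads would use a library implementation over many dimensions.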
Batch Inference
Processing multiple ML predictions as a group at scheduled intervals rather than one at a time on demand, optimizing for throughput and cost over latency.
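The core loop of batch inference is simple: chunk the accumulated rows and hand each chunk to the model at once, amortizing per-call overhead. A minimal sketch, where model_fn and the batch size of 256 are illustrative stand-ins:

```python
def batch_predict(model_fn, rows, batch_size=256):
    """Score all rows in fixed-size batches instead of one at a time.

    model_fn takes a list of rows and returns a list of predictions.
    """
    results = []
    for i in range(0, len(rows), batch_size):
        results.extend(model_fn(rows[i:i + batch_size]))
    return results
```

In a real deployment this loop would run on a schedule (e.g. nightly) and write its outputs to a store that downstream systems read from.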
Real-Time Inference
Generating ML predictions on demand as requests arrive, typically with latency requirements under 200ms for user-facing features.
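Because real-time inference is defined by its latency budget, serving code usually times each request and flags budget violations. A minimal sketch, where the 200ms default echoes the figure in the definition and the function names are illustrative:

```python
import time

def timed_predict(model_fn, features, budget_ms=200):
    """Serve one prediction and report whether it met the latency budget."""
    start = time.perf_counter()
    prediction = model_fn(features)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return prediction, elapsed_ms, elapsed_ms <= budget_ms
```

The elapsed times would typically feed a latency histogram so that tail percentiles (p95, p99), not just averages, can be watched.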
Data Pipeline
An automated sequence of data processing steps that moves data from source systems through transformations to destination systems, enabling reliable and repeatable data flows across an organization.
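The "automated sequence of processing steps" in the definition can be modeled as an ordered list of transformations applied in turn, each consuming the previous step's output. A minimal in-memory sketch (real pipelines add scheduling, retries, and persistence between steps):

```python
def run_pipeline(records, steps):
    """Run source records through an ordered sequence of transformations."""
    for step in steps:
        records = step(records)
    return records

# illustrative steps: double every value, then keep only large ones
doubled_then_filtered = run_pipeline(
    [1, 2, 3, 4],
    [lambda rs: [r * 2 for r in rs],
     lambda rs: [r for r in rs if r > 4]],
)
```

Keeping each step a pure function of its input is what makes the flow "reliable and repeatable": rerunning the pipeline on the same source data yields the same result.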
ETL (Extract, Transform, Load)
A data integration pattern that extracts data from source systems, transforms it into a structured format suitable for analysis, and loads it into a target data warehouse or database.
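The three phases in the definition map onto three distinct blocks of code. In this toy sketch, an in-memory list stands in for the source system and a dict stands in for the warehouse; the field names are illustrative:

```python
def etl(source_rows):
    # Extract: pull raw rows from the source system
    raw = list(source_rows)

    # Transform: normalize into a structured shape suitable for analysis,
    # dropping rows with missing amounts
    structured = [
        {"user": r["user"].strip().lower(), "amount": float(r["amount"])}
        for r in raw
        if r.get("amount") is not None
    ]

    # Load: write into the target store (here, a stand-in warehouse table)
    return {"transactions": structured}
```

The defining trait of ETL (as opposed to ELT) is that the transform happens before the load, so only cleaned, structured data ever reaches the warehouse.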