Continuous evaluation, the process of ensuring that a production machine learning model still performs well on new data, is an essential part of any ML workflow.
Performing continuous evaluation can help you catch model drift, a phenomenon that occurs when the data used to train your model no longer reflects the current environment.
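As a rough illustration of continuous evaluation, the sketch below tracks accuracy over a sliding window of labeled production predictions and flags the model once accuracy falls too far below its training-time baseline. The class name `ContinuousEvaluator`, the window size, and the tolerance are hypothetical choices made for this example; they do not come from any of the tools listed here.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ContinuousEvaluator:
    """Tracks accuracy over a sliding window of labeled production predictions."""

    baseline_accuracy: float   # accuracy measured at training/validation time
    tolerance: float = 0.05    # allowed drop before flagging (assumed value)
    window_size: int = 500     # number of recent predictions kept (assumed value)
    _hits: deque = field(default_factory=deque, repr=False)

    def record(self, prediction, actual):
        """Store whether the latest prediction matched the ground-truth label."""
        self._hits.append(prediction == actual)
        if len(self._hits) > self.window_size:
            self._hits.popleft()

    def accuracy(self):
        """Accuracy over the current window, or None if nothing was recorded."""
        return sum(self._hits) / len(self._hits) if self._hits else None

    def degraded(self):
        """True when windowed accuracy is below baseline minus tolerance."""
        acc = self.accuracy()
        return acc is not None and acc < self.baseline_accuracy - self.tolerance


# Usage: feed (prediction, label) pairs as ground truth arrives.
evaluator = ContinuousEvaluator(baseline_accuracy=0.90, window_size=4)
for pred, actual in [(1, 1), (0, 0), (1, 0), (0, 1), (1, 0)]:
    evaluator.record(pred, actual)
print(evaluator.accuracy(), evaluator.degraded())
```

A sliding window is only one option; in practice labels often arrive late, so the evaluation may run on delayed batches instead.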
We take the pain out of model and data monitoring so that you spend less time firefighting, and more time building models.
WhyLabs enables teams to operate with certainty by providing model monitoring, preventing costly model failures, and facilitating cross-functional collaboration.
Comet Model Production Monitoring (MPM) - focuses on models after they are deployed. The original product was more about how multiple offline experiments are modeled during training, while MPM focuses on these models once they reach production.
Evidently helps analyze machine learning models during development, validation, or production monitoring. The tool generates interactive reports from pandas DataFrames. Currently, six reports are available:
Data Drift - Detects changes in feature distribution.
Numerical Target Drift - Detects changes in a numerical target and in feature behavior (see example below).
Categorical Target Drift - Detects changes in a categorical target and in feature behavior (see example below).
Regression Model Performance - Analyzes the performance of a regression model and model errors (see example below).
Classification Model Performance - Analyzes the performance and errors of a classification model. Works for both binary and multi-class models.
Probabilistic Classification Model Performance - Analyzes the performance of a probabilistic classification model, quality of model calibration, and model errors.
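To make the idea behind a data drift report concrete, here is a minimal sketch that compares each numeric column of a reference DataFrame with a current one using a two-sample Kolmogorov-Smirnov statistic. This is not Evidently's API or necessarily the test it applies; the function names `ks_statistic` and `detect_feature_drift` and the synthetic data are assumptions made for illustration, and the threshold is the standard asymptotic KS critical value.

```python
import numpy as np
import pandas as pd


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    reference = np.sort(np.asarray(reference, dtype=float))
    current = np.sort(np.asarray(current, dtype=float))
    points = np.concatenate([reference, current])
    cdf_ref = np.searchsorted(reference, points, side="right") / reference.size
    cdf_cur = np.searchsorted(current, points, side="right") / current.size
    return float(np.max(np.abs(cdf_ref - cdf_cur)))


def detect_feature_drift(reference_df, current_df, alpha=0.05):
    """Flag numeric columns whose KS statistic exceeds the asymptotic critical value."""
    n, m = len(reference_df), len(current_df)
    # Asymptotic critical value: c(alpha) * sqrt((n + m) / (n * m)).
    critical = np.sqrt(-np.log(alpha / 2) / 2) * np.sqrt((n + m) / (n * m))
    rows = []
    for col in reference_df.select_dtypes(include="number").columns:
        stat = ks_statistic(reference_df[col], current_df[col])
        rows.append({"feature": col, "ks_statistic": stat, "drifted": stat > critical})
    return pd.DataFrame(rows)


# Synthetic example: "income" is shifted in the current data, "age" is not.
rng = np.random.default_rng(0)
reference = pd.DataFrame({"age": rng.normal(40, 10, 2000),
                          "income": rng.normal(50, 5, 2000)})
current = pd.DataFrame({"age": rng.normal(40, 10, 2000),
                        "income": rng.normal(55, 5, 2000)})
drift_report = detect_feature_drift(reference, current)
print(drift_report)
```

The sketch assumes the columns contain no missing values. With many features, some columns will be flagged by chance at a fixed `alpha`, so a real report would likely correct for multiple comparisons or use a stricter threshold.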
https://torchdrift.org - TorchDrift is a data and concept drift library for PyTorch. It lets you monitor your PyTorch models to see whether they operate within spec.
John Dickerson - Chief Scientist, co-founder - LinkedIn
Boxkite
Boxkite - captures the feature and inference distributions used in model training, then compares them against real-time production distributions via Prometheus and Grafana.
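The capture-then-compare pattern described above can be sketched in plain Python: store a histogram with fixed bin edges at training time, bin production values into the same edges, and measure how far apart the two histograms are. This is not Boxkite's API. The helper names, the choice of KL divergence as the distance, and the smoothing constant are assumptions for this example, and exporting the result to Prometheus and Grafana is left out.

```python
import math
from bisect import bisect_right


def count_into_bins(values, edges):
    """Count values using fixed interior edges; out-of-range values land in the end bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts


def capture_histogram(values, n_bins=10):
    """Training time: derive bin edges from the training data and count values per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    edges = [lo + i * width for i in range(1, n_bins)]
    return edges, count_into_bins(values, edges)


def kl_divergence(p_counts, q_counts, eps=1e-6):
    """KL(P || Q) between two histograms, smoothed so empty bins do not divide by zero."""
    p_total = sum(p_counts) + eps * len(p_counts)
    q_total = sum(q_counts) + eps * len(q_counts)
    kl = 0.0
    for p, q in zip(p_counts, q_counts):
        p_prob = (p + eps) / p_total
        q_prob = (q + eps) / q_total
        kl += p_prob * math.log(p_prob / q_prob)
    return kl


# Training time: snapshot the feature distribution.
training = [x / 10 for x in range(100)]            # values 0.0 .. 9.9
edges, train_counts = capture_histogram(training)

# Serving time: reuse the stored edges on production values.
production = [x / 10 + 3 for x in range(100)]      # same shape, shifted by 3
same = kl_divergence(train_counts, count_into_bins(training, edges))
shifted = kl_divergence(train_counts, count_into_bins(production, edges))
print(same, shifted)
```

Reusing the training-time edges is the important design choice: it keeps the two histograms directly comparable, and production values outside the training range pile up in the end bins, which itself shows up as divergence.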