Skip to content

CONCEPT Cited by 1 source

Model drift monitoring

Definition

Model drift monitoring is the practice of continuously computing statistical metrics about model performance on fresh data and alerting when those metrics cross thresholds indicative of model decay. Drift sources include:

  • Data drift — input distribution has shifted (seasonality, new product mix, regulatory change).
  • Concept drift — input-output mapping has changed (user behaviour evolved, market dynamics shifted).
  • Prediction drift — output distribution has drifted from historical norms (could be either of the above, or a pipeline bug).

Drift monitoring pairs with a retraining policy — detection alone is not useful without a remediation path.

Canonical shape on the wiki

Embedded as a pipeline post-processing stage, not as a bolted-on monitoring service:

  1. Pipeline runs (training / inference / optimisation).
  2. Post-processing stage — a SageMaker Processing Job that:
  3. Computes statistical model-performance metrics.
  4. Computes key business metrics.
  5. Publishes to a metrics backend.
  6. Alarm substrateAWS CloudWatch alarms + Lambda functions deliver alerts to relevant operator channels when thresholds are breached.

Why as a pipeline stage

  • No race between pipeline and monitoring. The monitoring step sees exactly the output the pipeline produced; no out-of-band sampling needed.
  • Failure coupling in the right direction. If the pipeline fails, monitoring doesn't run — but the pipeline failure is itself the alarm.
  • Auditable history. Post-processing outputs are persisted alongside predictions, so performance metrics can be re-analysed after the fact.

Canonical instance (Zalando ZEOS)

Both pipelines in Zalando ZEOS's inventory-optimisation system include drift-monitoring post-processing:

  • Demand forecaster:

    "This stage also includes a statistical analysis of model performance and the computation of key business metrics. These metrics are seamlessly integrated into our monitoring and alerting ecosystem, facilitating proactive detection of model drift."

  • Replenishment recommender:

    "The post-processing job here evaluates our optimisation performance, enabling proactive model performance and drift monitoring."

Implementation substrate:

"The post-processing is implemented using AWS SageMaker Processing Jobs, while the monitoring and alerting system utilises AWS CloudWatch alarms and AWS Lambda functions to deliver alerts to relevant channels."

Seen in

Last updated · 501 distilled / 1,218 read