Report #95725
[synthesis] AI product monitoring catches input distribution shifts but misses concept drift, where the correct answer for the same input changes
Implement two distinct monitoring streams: (1) feature drift detection using statistical tests on input distributions (PSI, KL divergence), and (2) concept drift detection using performance metrics on a continuously refreshed labeled data stream. Maintain a golden dataset that is periodically updated with fresh labels from domain experts, and track model performance on this dataset over time. If the feature drift metrics are stable but golden dataset performance degrades, concept drift is the likely cause.
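A minimal sketch of the two streams, assuming a single numeric input feature and accuracy as the golden dataset metric. The function names `psi` and `diagnose`, the 10-bin layout, the 0.2 PSI threshold (a common rule of thumb, not taken from this report), and the 0.05 allowed accuracy drop are all illustrative assumptions to tune per product.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` against the reference sample `expected`."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def diagnose(psi_value: float, baseline_accuracy: float, current_accuracy: float,
             psi_threshold: float = 0.2, max_accuracy_drop: float = 0.05) -> str:
    """Combine the label-free stream (PSI) with the labeled stream (golden accuracy)."""
    feature_drift = psi_value > psi_threshold
    degraded = (baseline_accuracy - current_accuracy) > max_accuracy_drop
    if degraded and not feature_drift:
        return "concept drift suspected"
    if degraded and feature_drift:
        return "feature drift with degraded performance"
    if feature_drift:
        return "feature drift, performance holding"
    return "healthy"
```

For example, `diagnose(0.01, 0.92, 0.80)` returns `"concept drift suspected"`: the inputs look stable, yet golden dataset accuracy has fallen by more than the allowed drop.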
Journey Context:
AI models degrade through two distinct mechanisms. Feature drift occurs when the input distribution changes (e.g., users start asking different types of questions). Concept drift occurs when the correct answer for the same input changes (e.g., a tax law changes, making previously correct advice wrong). Most AI product monitoring detects only feature drift because it is measurable without labels: you can compare input distributions statistically. Concept drift requires ground truth labels, which are expensive and slow to obtain. The synthesis reveals an asymmetric blind spot: teams confidently monitor feature drift, see no alerts, and conclude the model is healthy, while concept drift silently degrades output quality. This is uniquely an AI product problem because deterministic software does not have concepts that drift; the code either implements the spec or it does not. The fix requires investing in a continuous labeling pipeline specifically for drift detection, which most teams skip because it does not directly improve the model.
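The blind spot can be illustrated with a toy golden dataset. This is a sketch under stated assumptions: `GoldenExample`, `accuracy`, `frozen_model`, and the placeholder queries and answers ("q1", "rule v1", and so on) are hypothetical and stand in for real queries and expert-confirmed labels. The queries are identical before and after relabeling, so any input-distribution check sees nothing, yet accuracy against the refreshed labels drops.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GoldenExample:
    query: str
    expected: str  # answer most recently confirmed by a domain expert


def accuracy(model: Callable[[str], str], golden: List[GoldenExample]) -> float:
    """Fraction of golden examples the model currently answers correctly."""
    correct = sum(1 for ex in golden if model(ex.query) == ex.expected)
    return correct / len(golden)


# Hypothetical frozen model: its answers reflect the rules at training time.
LEARNED = {"q1": "rule v1", "q2": "rule v1", "q3": "rule v1"}


def frozen_model(query: str) -> str:
    return LEARNED[query]


golden_before = [GoldenExample(q, a) for q, a in LEARNED.items()]

# Experts relabel after a rule change: the query text is identical,
# but the correct answer for "q2" is now different.
golden_after = [
    GoldenExample(ex.query, "rule v2" if ex.query == "q2" else ex.expected)
    for ex in golden_before
]

# The label-free view: inputs are exactly the same, so no feature drift.
inputs_unchanged = [ex.query for ex in golden_before] == [ex.query for ex in golden_after]

# The labeled view: performance against fresh labels has degraded.
acc_before = accuracy(frozen_model, golden_before)
acc_after = accuracy(frozen_model, golden_after)
```

Here `inputs_unchanged` is `True` while `acc_after` is lower than `acc_before`, which is the signature of concept drift that only the refreshed labels expose.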
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:15:29.396368+00:00— report_created — created