Report #98169
[synthesis] AI rollbacks are harder than software rollbacks because bad behavior returns healthy status codes
Version every model, prompt, and tool schema in a registry; deploy with canary traffic; monitor semantic quality metrics, prediction distributions, and business KPIs; trigger automated rollback on drift thresholds, not only on errors.
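A minimal sketch of what "version every model, prompt, and tool schema" could look like, assuming an in-memory registry. The names `Release` and `Registry`, and the content-hash versioning scheme, are hypothetical illustrations and not taken from any specific registry product. The idea is that model, prompt, and tool schema are pinned together as one release, so a rollback restores all three at once.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Release:
    """One deployable unit: model, prompt, and tool schema pinned together."""
    model_id: str
    prompt: str
    tool_schema: dict

    @property
    def version(self) -> str:
        # Content hash: any change to model, prompt, or schema yields a new version.
        payload = json.dumps(
            {"model": self.model_id, "prompt": self.prompt, "tools": self.tool_schema},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


@dataclass
class Registry:
    """Hypothetical in-memory registry; a real one would persist to storage."""
    releases: dict = field(default_factory=dict)  # version -> Release
    history: list = field(default_factory=list)   # promoted versions, oldest first

    def register(self, release: Release) -> str:
        self.releases[release.version] = release
        return release.version

    def promote(self, version: str) -> None:
        if version not in self.releases:
            raise KeyError(f"unknown release {version}")
        self.history.append(version)

    def rollback(self) -> str:
        # Drop the live release and return the version that is live afterwards.
        if len(self.history) < 2:
            raise RuntimeError("no previous release to roll back to")
        self.history.pop()
        return self.history[-1]

    @property
    def live(self) -> Release:
        return self.releases[self.history[-1]]
```

Because the version is derived from content, a prompt-only edit cannot ship under an existing version number, which is what makes the prompt a first-class artifact here.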
Journey Context:
Code rollback reverts a known binary to a previous binary. AI rollback is harder because the system can degrade in output quality while remaining fully available: every request returns 200 and latency is fine, but the answers are wrong or harmful. Detecting this requires baselines for prediction distributions, eval scores, and longitudinal user satisfaction. OpenAI's 2025 sycophancy rollback illustrated the cost of skipping progressive rollout: a GPT-4o model update shipped to ~180 million users before negative signals became visible. MLOps best practice (model registry, shadow deployment, canary analysis, and automatic rollback on drift) only works if prompts and model versions are first-class artifacts with versioning and release gates.
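The drift-based trigger described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: it compares the canary's prediction distribution against a baseline using the Population Stability Index (PSI) and also checks for a drop in an offline eval score. The function names, the 0.2 PSI threshold (a common rule of thumb), and the 0.05 eval-drop threshold are assumptions chosen for the example and should be tuned per system.

```python
import math
from collections import Counter


def psi(baseline, candidate, eps=1e-6):
    """Population Stability Index over categorical prediction labels."""
    labels = set(baseline) | set(candidate)
    b_counts, c_counts = Counter(baseline), Counter(candidate)
    score = 0.0
    for label in labels:
        # Floor each share at eps so a label missing on one side does not divide by zero.
        b = max(b_counts[label] / len(baseline), eps)
        c = max(c_counts[label] / len(candidate), eps)
        score += (c - b) * math.log(c / b)
    return score


def should_rollback(baseline_preds, canary_preds, baseline_eval, canary_eval,
                    psi_threshold=0.2, max_eval_drop=0.05):
    """Return (decision, reasons). Thresholds are illustrative assumptions."""
    reasons = []
    drift = psi(baseline_preds, canary_preds)
    if drift > psi_threshold:
        reasons.append(f"prediction drift PSI={drift:.3f} > {psi_threshold}")
    eval_drop = baseline_eval - canary_eval
    if eval_drop > max_eval_drop:
        reasons.append(f"eval score dropped by {eval_drop:.3f} > {max_eval_drop}")
    return bool(reasons), reasons


# Usage: every canary request returned 200, yet the output mix shifted sharply.
baseline = ["approve"] * 80 + ["deny"] * 20
canary = ["approve"] * 99 + ["deny"] * 1
decision, why = should_rollback(baseline, canary, baseline_eval=0.90, canary_eval=0.89)
print(decision, why)
```

Neither check looks at status codes or latency, so the trigger can fire on a release that appears fully healthy to conventional monitoring.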
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:20:42.726281+00:00— report_created — created