Report #29328
[synthesis] AI feature rollbacks fail because model state, user adaptation, and downstream data cannot be reverted with code
Keep the previous model version hot-swappable in the serving infrastructure alongside the current one. Shift traffic gradually with canary releases rather than binary deploys. Never fine-tune production models in place; always train candidate models on frozen data snapshots. Log the exact model version, prompt template, and sampling config with every inference so that any output can be reconstructed at a given point in time.
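A minimal sketch of how canary routing and per-inference logging might fit together, assuming the previous and candidate models are both already loaded behind the router. The names `CanaryRouter`, `InferenceRecord`, and `log_inference`, and the record fields, are hypothetical and not part of any particular serving framework.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InferenceRecord:
    # Fields needed to reconstruct a call later (names are illustrative).
    request_id: str
    model_version: str
    prompt_template_id: str
    sampling_config: dict
    timestamp: float


class CanaryRouter:
    """Sends a fixed fraction of users to the candidate; the rest stay on the previous version."""

    def __init__(self, stable_version: str, candidate_version: str, canary_fraction: float):
        if not 0.0 <= canary_fraction <= 1.0:
            raise ValueError("canary_fraction must be in [0, 1]")
        self.stable_version = stable_version
        self.candidate_version = candidate_version
        self.canary_fraction = canary_fraction

    def pick_version(self, user_id: str) -> str:
        # Hash the user id so a given user consistently lands on the same version.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
        return self.candidate_version if bucket < self.canary_fraction else self.stable_version

    def rollback(self) -> None:
        # Rollback is only a routing change, because the stable model never left serving.
        self.canary_fraction = 0.0


def log_inference(router: CanaryRouter, user_id: str, request_id: str,
                  prompt_template_id: str, sampling_config: dict) -> str:
    """Build one JSON log line recording which model and config served the request."""
    record = InferenceRecord(
        request_id=request_id,
        model_version=router.pick_version(user_id),
        prompt_template_id=prompt_template_id,
        sampling_config=sampling_config,
        timestamp=time.time(),
    )
    return json.dumps(asdict(record), sort_keys=True)
```

Hashing the user id rather than sampling randomly per request is a design choice made here so that one user does not flip between model versions mid-session; a real deployment may prefer a different bucketing key.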
Journey Context:
Code rollback is a git revert: atomic, complete, and the system returns to a known state. AI rollback is entangled: (1) the model may have been fine-tuned on data generated during the faulty period, so reverting code doesn't revert learned weights; (2) users adapted their prompts and workflows to the new model's behavior, so the old model now performs worse against shifted inputs; (3) downstream consumers may have re-indexed or cached model outputs. The common mistake is treating model deployment like code deployment. The right call is to never mutate production model state in-place and to always maintain N-1 serving readiness.
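As a rough illustration of points (1) and (3), the sketch below selects the requests served by a faulty model version during a given window, so that their outputs can be excluded from training snapshots or flagged for downstream cache invalidation. It assumes each inference was logged with a model version and a timestamp; the field names `request_id`, `model_version`, and `served_at`, and the function `affected_request_ids`, are hypothetical.

```python
from datetime import datetime, timezone


def affected_request_ids(records, faulty_version, window_start, window_end):
    """Return ids of requests served by `faulty_version` in [window_start, window_end)."""
    return [
        r["request_id"]
        for r in records
        if r["model_version"] == faulty_version
        and window_start <= r["served_at"] < window_end
    ]


def at_hour(hour, minute=0):
    # Helper for building example timestamps on a single illustrative day.
    return datetime(2026, 6, 1, hour, minute, tzinfo=timezone.utc)


# Made-up log records, for illustration only.
example_records = [
    {"request_id": "a", "model_version": "v1", "served_at": at_hour(9)},
    {"request_id": "b", "model_version": "v2", "served_at": at_hour(10, 30)},
    {"request_id": "c", "model_version": "v2", "served_at": at_hour(12)},
    {"request_id": "d", "model_version": "v1", "served_at": at_hour(11)},
]

tainted = affected_request_ids(example_records, "v2", at_hour(10), at_hour(12))
```

This only identifies what the faulty version touched. It does not undo user adaptation (point 2), which no log query can revert.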
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:36:59.869177+00:00— report_created — created