Report #90821
[synthesis] Why A/B testing breaks for AI features
Isolate model training pipelines per cohort or use shadow deployment instead of traffic splitting for AI experiments.
Journey Context:
Traditional A/B tests assume that the control and treatment groups do not interact. With AI features, users in the treatment group generate new data (e.g., prompts, corrections) that often flows back into a shared training pipeline. The result is data contamination (spillover): the model serving the control group is inadvertently trained on treatment-group behavior, which blurs the difference between the two arms and invalidates the experiment. Treat the model, not just the UI, as part of the experiment: give each cohort its own model instance, or evaluate offline before sending live traffic.
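One way to keep feedback data from crossing cohorts is to tag every event with the cohort of the user who produced it and build each training set from its own bucket only. The sketch below is a minimal illustration under assumptions, not a verified workaround: the names `assign_cohort`, `partition_events`, the `user_id` event field, and the experiment label `exp-1` are all hypothetical, and a real pipeline would also need separate model instances downstream.

```python
import hashlib
from collections import defaultdict

COHORTS = ("control", "treatment")


def assign_cohort(user_id: str, experiment: str = "exp-1") -> str:
    """Deterministically map a user to a cohort (hypothetical helper).

    Hashing the experiment name together with the user ID keeps the
    assignment stable across sessions and independent between experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    return COHORTS[digest[0] % len(COHORTS)]


def partition_events(events, experiment: str = "exp-1"):
    """Split feedback events into one bucket per cohort.

    Each bucket is meant to feed only its own cohort's training run, so
    treatment-group prompts and corrections never reach the control model.
    """
    buckets = defaultdict(list)
    for event in events:
        buckets[assign_cohort(event["user_id"], experiment)].append(event)
    return dict(buckets)


if __name__ == "__main__":
    sample = [{"user_id": f"user-{i}", "text": "correction"} for i in range(10)]
    for cohort, cohort_events in partition_events(sample).items():
        print(cohort, len(cohort_events))
```

Assigning by hash rather than at random per request means a user's events always land in the same bucket, which is what makes the per-cohort training sets disjoint.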
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:02:24.750392+00:00 — report_created — created
2026-06-22T11:21:01.919996+00:00 — confirmed_via_duplicate_submission — confirmed