Report #41031
[synthesis] Why A/B tests show AI features winning in week 1 but losing by week 4
Model the treatment effect as a time-decaying function rather than a constant; use Bayesian time-varying coefficient models; set experiment duration from the trust half-life (typically 2–3 negative AI interactions) rather than from statistical power alone; and log early negative experiences so they can be analyzed as a mediator variable.
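As a minimal sketch of the first recommendation, the snippet below fits a decaying lift curve, lift(t) = lift0 * exp(-rate * t), to per-week lift estimates and reports the implied half-life. The exponential form, the function name fit_exponential_decay, the use of weeks as the time unit, and the sample numbers are all assumptions made for illustration. The report expresses trust half-life in negative interactions, and a Bayesian time-varying coefficient model would replace this point estimate in a real analysis.

```python
import math


def fit_exponential_decay(times, lifts):
    """Fit lift(t) = lift0 * exp(-rate * t) by least squares on log(lift).

    Hypothetical helper: a point-estimate stand-in for a full Bayesian
    time-varying coefficient model. Requires strictly positive lifts.
    """
    if len(times) != len(lifts) or len(times) < 2:
        raise ValueError("need at least two (time, lift) pairs")
    if any(lift <= 0 for lift in lifts):
        raise ValueError("log-linear fit requires strictly positive lifts")

    logs = [math.log(lift) for lift in lifts]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n

    # Ordinary least squares for log(lift) = intercept + slope * t.
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t

    rate = -slope
    lift0 = math.exp(intercept)
    # A non-positive rate means no decay was detected.
    half_life = math.log(2) / rate if rate > 0 else math.inf
    return lift0, rate, half_life


# Made-up weekly lifts in percentage points, for illustration only.
weeks = [1, 2, 3, 4]
weekly_lift = [4.0, 2.0, 1.0, 0.5]

lift0, rate, half_life = fit_exponential_decay(weeks, weekly_lift)
print(f"lift at t=0: {lift0:.2f}, half-life: {half_life:.2f} weeks")
```

With these illustrative inputs the lift halves every week, so the fitted half-life is one week. A log-linear fit only describes the early decay phase: it stops working once the measured lift reaches zero or turns negative, which is exactly the regime where a fuller time-varying model is needed.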
Journey Context:
Standard A/B analysis implicitly assumes a treatment effect that stays constant over the test window. (This is a stationarity assumption, not SUTVA, which strictly concerns interference between units and consistent treatment versions.) AI features violate it because they have a trust half-life: users who experience an early error develop avoidance behavior that compounds, causing the treatment effect to decay non-linearly. Worse, AI features often become opt-in after initial exposure, so the treated users who remain are self-selected for high trust. The resulting survivorship bias makes the feature look more effective over time among the users still engaging with it, even as it loses the broader user base. The common mistake is running a standard 2-week A/B test and reading the final number. The right call is to model the treatment effect trajectory and flag decay patterns, because a decaying treatment effect predicts future churn even when the current number looks positive.
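The survivorship effect described here can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any real product: every parameter (error_rate, tolerance, uses_per_week), the scoring of +1 per good interaction and -1 per bad one, and the convention that the control baseline is zero are hypothetical. Users stop using the feature after `tolerance` negative interactions, and the code compares the per-user value averaged over everyone assigned to treatment against the average over only those still active.

```python
import random


def simulate(n_users=5000, weeks=4, uses_per_week=5,
             error_rate=0.15, tolerance=2, seed=0):
    """Return one dict per week comparing all-assigned vs survivors-only value.

    All parameters are hypothetical. Control-group value is taken as zero,
    so each average can be read as a lift over control.
    """
    rng = random.Random(seed)
    negatives = [0] * n_users  # negative AI interactions seen per user
    rows = []
    for week in range(1, weeks + 1):
        total_value = 0.0      # summed over every assigned user
        survivor_value = 0.0   # summed over users still active at week end
        survivors = 0
        for u in range(n_users):
            if negatives[u] >= tolerance:
                continue  # already opted out: contributes zero value
            value = 0.0
            for _ in range(uses_per_week):
                if rng.random() < error_rate:
                    negatives[u] += 1
                    value -= 1.0
                    if negatives[u] >= tolerance:
                        break  # trust exhausted mid-week
                else:
                    value += 1.0
            total_value += value
            if negatives[u] < tolerance:
                survivors += 1
                survivor_value += value
        rows.append({
            "week": week,
            "active_share": survivors / n_users,
            "all_assigned": total_value / n_users,
            "survivors_only": survivor_value / max(survivors, 1),
        })
    return rows


rows = simulate()
for r in rows:
    print(f"week {r['week']}: active={r['active_share']:.2f} "
          f"all-assigned={r['all_assigned']:.2f} "
          f"survivors-only={r['survivors_only']:.2f}")
```

Under these assumptions the survivors-only average rises week over week while the all-assigned average falls, because each week's survivors are increasingly the users who happened to see few or no errors. Reading only the survivors-only number at the end of the test would therefore overstate the feature's effect on the population that was originally assigned to it.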
Lifecycle
2026-06-18T23:20:22.507678+00:00— report_created — created