Report #93670
[synthesis] Why do A/B tests show AI features winning at launch but losing long-term?
Extend A/B test duration to a minimum of 4-6 weeks for AI features (vs 1-2 weeks for deterministic features). Add trust-retention and re-engagement curves as primary metrics alongside conversion. Specifically, measure the delta between week-1 and week-4 engagement to detect novelty effects.
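The week-1 vs week-4 comparison can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function names (weekly_lift, novelty_delta), the choice of relative lift as the engagement measure, and the sample engagement rates are all assumptions made for the example.

```python
from typing import Sequence


def weekly_lift(treatment: Sequence[float], control: Sequence[float]) -> list[float]:
    """Relative lift of treatment over control, one value per week."""
    if len(treatment) != len(control):
        raise ValueError("treatment and control must cover the same weeks")
    return [(t - c) / c for t, c in zip(treatment, control)]


def novelty_delta(
    treatment: Sequence[float],
    control: Sequence[float],
    early_week: int = 1,
    late_week: int = 4,
) -> float:
    """Lift in the early week minus lift in the late week (weeks are 1-indexed).

    A large positive value suggests the early lift was partly novelty.
    """
    lifts = weekly_lift(treatment, control)
    if not 1 <= early_week < late_week <= len(lifts):
        raise ValueError("need 1 <= early_week < late_week <= number of weeks")
    return lifts[early_week - 1] - lifts[late_week - 1]


# Hypothetical weekly engagement rates, invented purely for illustration.
treatment = [0.30, 0.26, 0.22, 0.21]
control = [0.20, 0.20, 0.20, 0.20]

lifts = weekly_lift(treatment, control)
delta = novelty_delta(treatment, control)
print([round(x, 2) for x in lifts], round(delta, 2))
```

In practice the delta would need a confidence interval before acting on it; the sketch only shows the point estimate.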
Journey Context:
Standard A/B testing assumes stable treatment effects over time. AI features exhibit three phases: a novelty phase with artificially high engagement, a trust-formation phase where users calibrate expectations, and a steady-state phase revealing true value. If you measure only during phase 1, you ship features whose measured lift has faded by phase 3. Combining controlled-experiment methodology with human-AI interaction research suggests that the novelty effect for AI features is both stronger and longer-lasting than for traditional features, because users are exploring a capability space, not just a UI. Worse, the degradation is invisible: users don't complain, they quietly reduce usage. An experiment duration that works for deterministic features therefore systematically overestimates AI feature value.
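To make the overestimation concrete, here is a toy model of a lift that starts high and settles toward a steady state. The exponential decay shape and every parameter value (steady_state, novelty_boost, decay_days) are assumptions chosen for illustration; the report does not specify a functional form, and real curves may look different.

```python
import math


def modeled_lift(
    day: int,
    steady_state: float = 0.02,
    novelty_boost: float = 0.10,
    decay_days: float = 10.0,
) -> float:
    """Assumed lift on a given day: steady-state value plus a decaying novelty term."""
    return steady_state + novelty_boost * math.exp(-day / decay_days)


def mean_lift(start_day: int, end_day: int) -> float:
    """Average modeled lift over days in [start_day, end_day)."""
    days = range(start_day, end_day)
    return sum(modeled_lift(d) for d in days) / len(days)


# Readout from a 2-week test vs. a readout taken over weeks 5-6.
two_week_readout = mean_lift(0, 14)
late_readout = mean_lift(28, 42)
overestimate = two_week_readout - late_readout

print(f"2-week readout: {two_week_readout:.3f}")
print(f"weeks 5-6 readout: {late_readout:.3f}")
print(f"overestimate: {overestimate:.3f}")
```

Under these assumed parameters the 2-week readout is several times the late readout, which is the pattern the paragraph above describes: the short test measures mostly the novelty term.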
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:48:41.791544+00:00— report_created — created