Report #94106
[synthesis] Why do my A/B tests for AI features show positive results that decay over weeks?
Measure the treatment effect over an extended window (at least 4-8 weeks, not a 2-week sprint). Segment by early-failure exposure: users who saw a hallucination in session 1 vs. those who did not. Track trust-specific metrics (re-engagement rate after an AI interaction, avoidance behavior, feature abandonment) alongside conversion metrics.
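The recommendation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `UserWeek` record, its field names, and the `saw_early_failure` flag are hypothetical and would need to be mapped onto whatever event schema your experiment platform actually logs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UserWeek:
    user_id: str
    arm: str                 # "control" or "treatment"
    week: int                # weeks since assignment, starting at 1
    converted: bool
    saw_early_failure: bool  # hypothetical flag: hallucination seen in session 1


def _rates(rows, key):
    """Conversion rate per group, where key(row) names the group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [conversions, observations]
    for row in rows:
        bucket = counts[key(row)]
        bucket[0] += int(row.converted)
        bucket[1] += 1
    return {group: conv / n for group, (conv, n) in counts.items()}


def weekly_lift(rows):
    """Treatment minus control conversion rate for each week seen in both arms."""
    rates = _rates(rows, lambda r: (r.arm, r.week))
    weeks = sorted({week for _, week in rates})
    return {
        w: rates[("treatment", w)] - rates[("control", w)]
        for w in weeks
        if ("treatment", w) in rates and ("control", w) in rates
    }


def treatment_rate_by_exposure(rows):
    """Weekly conversion rate inside the treatment arm, split by early failure."""
    treated = [r for r in rows if r.arm == "treatment"]
    return _rates(treated, lambda r: (r.saw_early_failure, r.week))
```

Reading `weekly_lift` week by week, rather than as one pooled number, is what exposes a decaying effect. The split in `treatment_rate_by_exposure` is descriptive only: control users never see the AI, so early-failure exposure is not randomized and differences between the two groups should not be read as causal.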
Journey Context:
Standard A/B analysis assumes a stable, well-defined treatment: SUTVA (the stable unit treatment value assumption) requires a single version of the treatment, and a short test window further assumes that the effect measured early will persist. AI features break these assumptions in three ways at once: (1) Early hallucinations create avoidance behavior that compounds. Users stop using the feature, so the 'treatment' is effectively no longer received, and later measurements reflect assignment rather than actual exposure. (2) Users who have bad early experiences may never return, which creates survivorship bias in later measurements: the users still being measured are the ones who tolerated the AI. (3) The AI feature itself may change during the test (model updates, prompt tweaks, retrieval index changes), which violates the stable treatment assumption. Combining experimental design methodology with behavioral trust dynamics suggests that AI A/B tests have a 'trust half-life': the effective treatment decays as users lose trust, so short-window tests systematically overestimate long-term impact. This is why AI features often show positive 2-week test results but negative 8-week outcomes.
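One way to put a number on the 'trust half-life' is to fit an exponential decay to the weekly lift series. The sketch below assumes the lift follows lift(t) = lift_0 * exp(-k * t), which is a modeling choice made here for illustration and not something the report establishes. The function name `fit_half_life` is hypothetical. The fit is an ordinary least-squares line through log(lift), so it can only use weeks with positive lift; dropping the others biases the estimate and should be treated as a limitation.

```python
import math


def fit_half_life(weekly_lift):
    """Fit lift(t) = lift_0 * exp(-k * t) to {week: lift}.

    Returns (lift_0, half_life_in_weeks). Weeks with non-positive lift are
    skipped because their logarithm is undefined.
    """
    points = [(w, math.log(v)) for w, v in sorted(weekly_lift.items()) if v > 0]
    if len(points) < 2:
        raise ValueError("need at least two weeks with positive lift")

    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    slope = sxy / sxx                    # equals -k under the assumed model
    intercept = mean_y - slope * mean_x  # equals log(lift_0)

    if slope >= 0:
        # Lift is flat or growing, so there is no decay to summarize.
        return math.exp(intercept), math.inf
    return math.exp(intercept), math.log(2) / -slope
```

For example, a lift series that halves every week, such as `{1: 0.08, 2: 0.04, 3: 0.02, 4: 0.01}`, yields a half-life of one week. A half-life shorter than the test window is a signal that the pooled test result overstates the long-run effect.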
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:32:43.370285+00:00— report_created — created