Report #63619
[synthesis] Why does my A/B test show the AI feature has no effect when individual users report loving it?
Segment A/B results by early-experience quality. Add a 'trust establishment window' (the first 3–5 AI interactions) as a covariate. Users whose first AI interaction succeeds show strong positive treatment effects; users whose first interaction fails show negative effects. Averaging the two populations lets these effects cancel and yields a false null. Report the full outcome distribution, which may be bimodal, not just the mean.
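A minimal sketch of the segmentation described above, assuming a hypothetical per-user record with an `arm`, a `metric`, and a `first_ok` flag that records whether the user's first AI interaction succeeded (these field names are illustrative, not from any particular experimentation platform). Because control users never see the AI, each treatment segment is compared against the whole control mean; that makes the breakdown descriptive rather than a clean causal estimate, since `first_ok` is observed only after treatment.

```python
from statistics import mean


def segmented_effects(users):
    """Compare the pooled treatment effect with per-segment effects.

    `users` is a list of dicts with hypothetical keys:
      'arm'      -> 'control' or 'treatment'
      'first_ok' -> True/False for treatment users, None for control
      'metric'   -> the outcome being tested (float)
    """
    control = [u["metric"] for u in users if u["arm"] == "control"]
    treated = [u for u in users if u["arm"] == "treatment"]
    baseline = mean(control)

    # Pooled effect: what a standard A/B readout would report.
    result = {"overall": mean(u["metric"] for u in treated) - baseline}

    # Per-segment effect, split on the first AI interaction's outcome.
    for label, flag in (("first_success", True), ("first_failure", False)):
        segment = [u["metric"] for u in treated if u["first_ok"] is flag]
        if segment:  # skip empty segments rather than fail
            result[label] = mean(segment) - baseline
    return result
```

Calling `segmented_effects(users)` on exported experiment data returns a dict with the pooled effect next to the two segment effects, which makes a cancellation pattern easy to spot. Extending the flag to cover the first 3–5 interactions would follow the same shape.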
Journey Context:
Standard A/B analysis treats observations as roughly i.i.d. and summarizes the treatment with a single average effect. AI features violate this because they have a 'trust ramp': the same AI output is perceived differently depending on whether the user has already seen the AI succeed. If a user's first AI interaction fails, they discount all subsequent successes. The result is a bimodal distribution of per-user effects whose mean can sit near 'no effect.' The synthesis: (1) Kohavi's trustworthy experiments framework, as typically applied, assumes stable treatment effects across user segments, (2) Lee & See's trust-in-automation research demonstrates trust asymmetry (trust is slow to build and fast to destroy), but this is studied in HCI, not in experimentation contexts, (3) AI interactions are sequentially dependent in a way traditional feature interactions are not. No single source connects the experimentation methodology, the trust dynamics, and the sequential dependency. The practical implication: model the trust-building process as part of your experiment analysis, or you will kill features that actually work for users who get past the trust threshold.
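The trust ramp can be illustrated with a toy simulation. This is a sketch under invented assumptions, not a model taken from Kohavi or Lee & See: trust starts at 0.5, rises by a small `build` step on each success, falls by a larger `destroy` step on each failure, and each success is worth only as much as the user's current trust. All parameter values and function names are hypothetical, and the control arm is assumed to contribute a perceived value of 0.

```python
import random
from statistics import mean


def simulate_user(rng, n_interactions=5, p_success=0.5,
                  build=0.1, destroy=0.4):
    """Return (first_ok, perceived_value) for one treatment user.

    Trust is slow to build (`build`) and fast to destroy (`destroy`).
    All numbers here are illustrative assumptions.
    """
    trust, value, first_ok = 0.5, 0.0, None
    for i in range(n_interactions):
        ok = rng.random() < p_success
        if i == 0:
            first_ok = ok
        if ok:
            value += trust                      # success discounted by trust
            trust = min(1.0, trust + build)     # slow to build
        else:
            value -= 0.5                        # fixed cost of a failure
            trust = max(0.0, trust - destroy)   # fast to destroy
    return first_ok, value


def simulate(n_users=5000, seed=0):
    """Pooled mean versus means split by first-interaction outcome.

    Assumes both segments are non-empty, which holds for large n_users.
    """
    rng = random.Random(seed)
    users = [simulate_user(rng) for _ in range(n_users)]
    won = [v for ok, v in users if ok]
    lost = [v for ok, v in users if not ok]
    return {
        "pooled": mean(v for _, v in users),
        "first_success": mean(won),
        "first_failure": mean(lost),
    }
```

Running `print(simulate())` shows the pooled mean sitting between the two segment means. How close it lands to zero depends entirely on the assumed parameters, which is the point: the pooled number alone cannot distinguish "no effect" from two opposing effects.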
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:16:28.538471+00:00— report_created — created