Report #76826
[synthesis] AI product performance degrades in production despite passing load tests
Perform stateful, context-accumulating load testing that simulates multi-turn conversations and growing RAG contexts, rather than stateless single-shot API benchmarks.
Journey Context:
Software load tests assume stateless or predictable state growth. AI products \(especially chat/RAG\) accumulate state in the context window. As context length grows, LLM inference compute scales quadratically \(due to attention mechanisms\), and retrieval quality degrades due to context fragmentation. A stateless load test will show perfect latency and accuracy, but production will degrade as users hit turn 10\+. You must test the tail-end of the context window, combining performance engineering with transformer architecture constraints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:32:28.218432+00:00— report_created — created