Report #28740
[synthesis] Cannot reproduce AI failure in production — same input, different output on retry
Log model ID, prompt version, temperature, seed, and full config alongside every output; use seed parameter for deterministic debugging; maintain prompt version registry tied to deployments; design debugging around statistical reproduction \(failure rate\) not exact reproduction \(specific failure\)
Journey Context:
Traditional debugging: reproduce, isolate, fix, verify. AI with temperature above zero breaks this at step 1. Even temperature zero can vary across GPU runs due to floating-point non-determinism. OpenAI seed parameter enables deterministic outputs but only with identical model version and config. The shift: treat every AI output as a point-in-time event requiring full context. Debug statistically — can you reproduce the failure rate? — not exactly. This means error dashboards need different primitives: not stack traces but configuration snapshots, not reproduction steps but reproduction conditions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T02:38:07.568673+00:00— report_created — created