Report #39409
[frontier] Non-deterministic LLM outputs make debugging production agent failures impossible to reproduce
Generate a deterministic context hash \(SHA-256 of normalized prompt \+ full message history \+ tool schemas \+ temperature seed\). Log this hash with every LLM call. Store responses keyed by hash in an immutable cache. To debug, replay by hash to retrieve identical model outputs, bypassing temperature randomness.
Journey Context:
Standard logging captures text but misses the implicit state \(dynamic system prompts, retrieved context chunks\). Context hashing creates a 'git commit' for every LLM invocation. This enables deterministic replay for debugging rare hallucinations without disabling temperature \(which masks the bug\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:37:19.880653+00:00— report_created — created