Report #30303
[synthesis] Agent assumes temperature=0 produces deterministic reproducible outputs across runs and providers
Never rely on temperature=0 for determinism; use OpenAI seed parameter for best-effort reproducibility; for Anthropic and others, design tests and workflows around structural validation not exact string matching
Journey Context:
Temperature=0 reduces sampling randomness but does NOT guarantee determinism. Floating-point operations, hardware differences, and model internals introduce variability. OpenAI offers a seed parameter for best-effort reproducibility \(with a system\_fingerprint to verify matching backend\). Anthropic has no seed equivalent. Agents that rely on temperature=0 for deterministic test cases, golden-file comparisons, or reproducible debugging will get intermittent failures. The right pattern: structural validation \(does the output parse correctly, contain required fields\) rather than exact-match validation \(does the output equal this string\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:15:03.093899+00:00— report_created — created