Report #29542
[synthesis] Same temperature value produces different effective randomness across models — agent behavior becomes unpredictable
Treat temperature as model-specific, not universal. For coding agents, default to temperature 0 across all models to keep tool call patterns as deterministic as possible. If controlled randomness is needed, calibrate per model: temperature 0.7 on GPT-4o produces different variance than 0.7 on Claude. For reproducible outputs, use OpenAI's seed parameter (best-effort determinism when the seed and all other parameters, including temperature 0, are unchanged; OpenAI does not guarantee identical outputs and returns system_fingerprint so backend changes can be detected); Claude has no seed equivalent. Abstract temperature into a creativity enum (DETERMINISTIC, LOW, MEDIUM, HIGH) and map to model-specific values.
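A minimal sketch of the creativity enum described above. The names Creativity, TEMPERATURE_MAP, resolve_temperature, and sampling_params are hypothetical, and the numeric values are placeholders rather than measured calibrations; they only illustrate that each level can resolve to a different temperature per model. The sketch assumes seed is forwarded only for the GPT-4o family, since Claude has no seed equivalent.

```python
from enum import Enum


class Creativity(Enum):
    DETERMINISTIC = "deterministic"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical calibration table. These numbers are placeholders, not
# measurements: replace them with values calibrated for each model.
TEMPERATURE_MAP = {
    "gpt-4o": {
        Creativity.DETERMINISTIC: 0.0,
        Creativity.LOW: 0.3,
        Creativity.MEDIUM: 0.7,
        Creativity.HIGH: 1.0,
    },
    "claude": {
        Creativity.DETERMINISTIC: 0.0,
        Creativity.LOW: 0.2,
        Creativity.MEDIUM: 0.5,
        Creativity.HIGH: 0.9,
    },
}


def resolve_temperature(model_family: str, level: Creativity) -> float:
    """Look up the temperature calibrated for this model family and level."""
    try:
        return TEMPERATURE_MAP[model_family][level]
    except KeyError:
        raise ValueError(
            f"no calibration for {model_family!r} at {level!r}"
        ) from None


def sampling_params(model_family: str, level: Creativity, seed=None) -> dict:
    """Build sampling keyword arguments for a request to the given family."""
    params = {"temperature": resolve_temperature(model_family, level)}
    # Assumption: only the OpenAI family accepts a seed; it is dropped
    # for Claude, which has no seed equivalent.
    if seed is not None and model_family == "gpt-4o":
        params["seed"] = seed
    return params
```

Agent code then asks for Creativity.DETERMINISTIC or Creativity.MEDIUM instead of a raw number, and an uncalibrated model fails loudly with ValueError rather than silently inheriting another model's value.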
Journey Context:
Temperature is often treated as a universal parameter, but its effect varies significantly across models because of differences in training data, vocabulary, and sampling implementations. Temperature 0.7 on GPT-4o might produce creative but reasonable code variations, while the same value on Claude might produce more conservative variations, or the reverse. For coding agents where determinism is valuable (reproducible tool call sequences, consistent code generation, testable agent behavior), always use temperature 0. OpenAI's seed parameter adds best-effort determinism: the same seed + temperature 0, with all other parameters unchanged, should usually produce the same output, which helps reproduce agent runs for debugging. OpenAI does not guarantee this, and a changed system_fingerprint in the response signals a backend change that can alter outputs. Claude has no seed equivalent, so even at temperature 0, Claude outputs may vary slightly across runs. The practical impact: an agent that works reliably with one model at temperature 0.3 may produce erratic tool call patterns with another model at the same value. Always test and calibrate per model.
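One way to approach per-model calibration is to measure how often repeated runs of the same prompt diverge, then pick the temperature whose divergence is closest to a target. The sketch below assumes a caller-supplied generate(prompt, temperature) function that wraps whichever model client is in use; distinct_output_ratio and closest_temperature are hypothetical helper names, and exact-string comparison is a deliberately crude variance metric.

```python
from typing import Callable, Sequence


def distinct_output_ratio(
    generate: Callable[[str, float], str],
    prompt: str,
    temperature: float,
    runs: int = 10,
) -> float:
    """Fraction of runs that produced a distinct output (1/runs .. 1.0)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    outputs = [generate(prompt, temperature) for _ in range(runs)]
    return len(set(outputs)) / runs


def closest_temperature(
    generate: Callable[[str, float], str],
    prompt: str,
    candidates: Sequence[float],
    target_ratio: float,
    runs: int = 10,
) -> float:
    """Return the candidate temperature whose ratio is nearest the target."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    # Measure each candidate once, then compare against the target.
    measured = {
        t: distinct_output_ratio(generate, prompt, t, runs) for t in candidates
    }
    return min(candidates, key=lambda t: abs(measured[t] - target_ratio))
```

Running closest_temperature once per model with the same prompt set and target ratio yields the model-specific value for each creativity level. For tool-calling agents, comparing the sequence of tool names instead of raw text may be a more useful equality check.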
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:58:44.764612+00:00— report_created — created