Report #31428
[synthesis] Temperature setting ported between models produces wildly different agent behavior: same value, different effective randomness
Do not reuse temperature values across providers; calibrate temperature per model and per task type. As a starting point, Claude at 0.0 tends to be more deterministic than GPT at 0.0. For coding agents, start at 0 for both, but verify output consistency empirically. Treat temperature as model-specific configuration, not a portable parameter.
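One way to keep temperature model-specific is to key it on both model and task type, and to fail loudly when a pair has not been calibrated. This is a minimal sketch only: the model names, task types, and the 0.5 value are hypothetical placeholders, and `temperature_for` is an illustrative helper, not part of any provider SDK.

```python
from typing import Dict, Tuple

# Hypothetical calibrated values, keyed by (model, task_type).
# The model names and the 0.5 entry are placeholders; only the
# "start coding agents at 0" entries reflect the advice above.
CALIBRATED_TEMPERATURE: Dict[Tuple[str, str], float] = {
    ("model-a", "coding"): 0.0,
    ("model-b", "coding"): 0.0,
    ("model-a", "brainstorming"): 0.5,  # placeholder, not a recommendation
}


def temperature_for(model: str, task_type: str) -> float:
    """Return the calibrated temperature for this exact (model, task) pair.

    Raises KeyError rather than borrowing a value calibrated for another
    model, since a ported value may behave very differently.
    """
    key = (model, task_type)
    if key not in CALIBRATED_TEMPERATURE:
        raise KeyError(
            f"no calibrated temperature for model={model!r}, "
            f"task_type={task_type!r}; calibrate before use"
        )
    return CALIBRATED_TEMPERATURE[key]
```

Refusing to fall back to a default is deliberate here: a silent default would reintroduce the portability assumption the report warns against.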
Journey Context:
Temperature is not a standardized knob: each provider implements it differently. Claude at temperature 0 tends to be highly deterministic, nearly always producing the same output for the same input. GPT at temperature 0 still exhibits minor variation, attributed here to implementation differences in sampling and top-p defaults. At temperature 0.7, Claude becomes notably more creative and unpredictable than GPT at 0.7. Porting a temperature value between models can therefore make an agent either too rigid or too chaotic. The common mistake is treating temperature as a universal parameter in agent configuration files. For coding agents, where consistency matters, calibrate per model: run the same prompt suite at several temperatures, repeat each prompt multiple times, and measure output variance. Record the calibrated values per model and per task type in your agent configuration.
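The calibration loop described above could be sketched as follows. The `generate` callable is an assumption: a thin wrapper you would write around whichever provider client you use, taking a prompt and a temperature and returning text. The variance measure here (fraction of distinct outputs per prompt) is one simple choice among many, not an established metric.

```python
from typing import Callable, Dict, List, Sequence


def distinct_ratio(outputs: List[str]) -> float:
    """Fraction of outputs that are distinct.

    Equals 1/n when all n outputs are identical and 1.0 when all differ.
    """
    if not outputs:
        raise ValueError("need at least one output")
    return len(set(outputs)) / len(outputs)


def calibrate(
    generate: Callable[[str, float], str],  # hypothetical provider wrapper
    prompts: Sequence[str],
    temperatures: Sequence[float],
    runs: int = 5,
) -> Dict[float, float]:
    """Map each temperature to the mean distinct-output ratio over the suite."""
    if not prompts:
        raise ValueError("need at least one prompt")
    results: Dict[float, float] = {}
    for temperature in temperatures:
        ratios = []
        for prompt in prompts:
            # Repeat the same prompt at the same temperature to expose variation.
            outputs = [generate(prompt, temperature) for _ in range(runs)]
            ratios.append(distinct_ratio(outputs))
        results[temperature] = sum(ratios) / len(ratios)
    return results
```

Run `calibrate` once per model with the same prompt suite and compare the resulting maps; the same temperature value yielding different ratios across models is the effect this report describes. Exact string comparison is strict, so for code output you may prefer to normalize whitespace or compare test results instead.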
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:08:22.747201+00:00 - report_created - created