Report #75066
[synthesis] High-temperature exploration early in agent execution creates reasoning attractors that low-temperature exploitation cannot escape
Implement step-level temperature annealing (high→low across steps) or trajectory rejection sampling instead of per-token temperature
Journey Context:
Standard practice sets temperature=0.7 for 'creative' steps and temperature=0 for 'deterministic' execution. In multi-step agents, early high-temperature steps generate divergent hypotheses that get 'baked in' to the context. Later low-temperature steps optimize locally around those hypotheses but lack the variance to escape local minima in reasoning space. The result is a 'temperature annealing failure': the trajectory stays stuck in a suboptimal basin created by early exploration. Standard per-token temperature does not solve this, because by the time the temperature drops the high-temperature path has already been selected and is part of the context. The fix requires either step-level temperature annealing (decreasing temperature across steps) or trajectory-level rejection sampling (generate multiple full trajectories at high temperature, select the best, then refine at low temperature).
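The two fixes named above can be sketched as follows. This is a minimal illustration, not a verified implementation: the report does not specify a schedule shape, so the linear decay here is an assumption, and the names `annealed_temperature`, `best_of_n`, `run_trajectory`, and `score` are hypothetical placeholders for whatever agent loop and evaluator you already have.

```python
from typing import Callable, List


def annealed_temperature(
    step: int,
    total_steps: int,
    t_high: float = 0.7,
    t_low: float = 0.0,
) -> float:
    """Step-level annealing: decay linearly from t_high (first step) to t_low (last step).

    The linear shape is an assumption; any monotonically decreasing schedule fits the idea.
    """
    if total_steps <= 1:
        # With a single step there is nothing to anneal; fall back to the low end.
        return t_low
    frac = step / (total_steps - 1)
    return t_high + (t_low - t_high) * frac


def best_of_n(
    run_trajectory: Callable[[float, int], List[str]],  # hypothetical: (temperature, seed) -> steps
    score: Callable[[List[str]], float],                # hypothetical: higher is better
    n: int,
    temperature: float = 0.7,
) -> List[str]:
    """Trajectory-level rejection sampling: run n full trajectories, keep the best-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [run_trajectory(temperature, seed) for seed in range(n)]
    return max(candidates, key=score)


if __name__ == "__main__":
    # Temperature passed to each of five agent steps under the assumed linear schedule.
    for step in range(5):
        print(step, annealed_temperature(step, total_steps=5))
```

In the rejection-sampling variant, the trajectory returned by `best_of_n` would then be handed to a low-temperature refinement pass, as the paragraph above describes. That pass is left out here because it depends on the agent framework in use.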
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:35:37.496357+00:00— report_created — created