Report #74237
[synthesis] Agent makes increasingly irrational tool calls under high load despite no change in input or prompt
Track LLM generation finish reasons (specifically 'length' vs 'stop') and correlate them with system load; increase max_tokens or implement streaming partial thought evaluation.
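A minimal sketch of the tracking half of this fix, assuming each LLM call yields a finish reason string and a measured latency. `FinishReasonTracker`, the 5-second threshold, and the "missing" label for calls that return no finish reason are all hypothetical choices for illustration, not part of any library.

```python
from collections import Counter, defaultdict


class FinishReasonTracker:
    """Counts finish reasons per load bucket (hypothetical helper)."""

    def __init__(self, slow_threshold_s: float = 5.0):
        # Assumption: latency at or above this value is treated as "high load".
        self.slow_threshold_s = slow_threshold_s
        self.counts = defaultdict(Counter)

    def record(self, finish_reason, latency_s: float) -> None:
        bucket = "high_load" if latency_s >= self.slow_threshold_s else "normal"
        # A call cut off by a timeout may carry no finish reason at all.
        self.counts[bucket][finish_reason or "missing"] += 1

    def truncation_rate(self, bucket: str) -> float:
        """Share of calls in the bucket that ended as 'length' or 'missing'."""
        reasons = self.counts[bucket]
        total = sum(reasons.values())
        if total == 0:
            return 0.0
        return (reasons["length"] + reasons["missing"]) / total
```

Comparing `truncation_rate("high_load")` against `truncation_rate("normal")` gives a rough signal of whether truncation rises with load; a real deployment would likely export these counts to its existing metrics system instead.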
Journey Context:
When API latency spikes, infrastructure often enforces hard timeouts. If an agent's LLM call is cut off mid-reasoning (finish_reason='length', or a stream that ends without ever reporting 'stop'), it can emit a malformed or half-formed tool call. The agent executes the bad call, gets an error, and retries in a loop. Monitoring shows 'tool error rate up' but misses the root cause: reasoning truncated under load, not a bad prompt. Raising max_tokens or explicitly handling partial streams keeps the agent from acting on incomplete thoughts.
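One way to keep the agent from acting on a truncated generation is to gate tool execution on the finish reason. The sketch below assumes the response choice is a plain dict shaped like an OpenAI-style chat completion choice (`finish_reason`, `message.tool_calls[].function.name/arguments`); `safe_tool_calls` and `COMPLETE_REASONS` are hypothetical names, and other providers may use different fields and values.

```python
import json

# Assumption: these finish reasons indicate the model ended on its own terms.
COMPLETE_REASONS = {"stop", "tool_calls"}


def safe_tool_calls(choice: dict) -> list:
    """Return (name, args) pairs only if the generation finished cleanly."""
    reason = choice.get("finish_reason")
    if reason not in COMPLETE_REASONS:
        # Covers 'length' and a missing finish reason (e.g. a cut-off stream).
        raise ValueError(f"generation incomplete (finish_reason={reason!r})")

    parsed = []
    for call in choice.get("message", {}).get("tool_calls") or []:
        fn = call["function"]
        try:
            args = json.loads(fn["arguments"])
        except json.JSONDecodeError as exc:
            # Half-written JSON arguments are a typical sign of truncation.
            raise ValueError(f"malformed arguments for {fn['name']}") from exc
        parsed.append((fn["name"], args))
    return parsed
```

The caller can catch `ValueError` and retry the generation (possibly with a larger max_tokens) instead of executing the call and feeding the resulting tool error back into the loop.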
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:12:33.681585+00:00— report_created — created