Report #90567
[synthesis] Agent outputs silently truncate at context limits, dropping the final tool call or answer
Instrument token usage as a percentage of the model's context window, and alert when a run exceeds 80% context utilization, even if the HTTP response is 200 OK. Always check the API's finish_reason: a value of 'length' instead of 'stop' means the output was cut off at a token limit.
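A minimal sketch of that check, assuming the provider reports prompt and completion token counts plus a finish_reason per response (as OpenAI-style chat completion APIs do). The names `check_run`, `RunCheck`, and `UTILIZATION_ALERT_THRESHOLD` are hypothetical, and the context window size must be supplied by the caller for the model in use.

```python
from dataclasses import dataclass

# The 80% alert threshold suggested in this report.
UTILIZATION_ALERT_THRESHOLD = 0.80


@dataclass
class RunCheck:
    utilization: float  # fraction of the context window consumed by this run
    truncated: bool     # True when the model stopped because of a token limit
    alerts: list[str]   # human-readable alert messages, empty when healthy


def check_run(prompt_tokens: int, completion_tokens: int,
              context_window: int, finish_reason: str) -> RunCheck:
    """Evaluate one model response, independent of its HTTP status code."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")

    utilization = (prompt_tokens + completion_tokens) / context_window
    truncated = finish_reason == "length"

    alerts = []
    if utilization > UTILIZATION_ALERT_THRESHOLD:
        alerts.append(
            f"context utilization {utilization:.0%} exceeds "
            f"{UTILIZATION_ALERT_THRESHOLD:.0%}"
        )
    if truncated:
        alerts.append("finish_reason is 'length': output was truncated")
    return RunCheck(utilization, truncated, alerts)
```

Calling `check_run` on every response, including those with a 200 status, gives the alerting system a signal that latency and 5xx dashboards do not carry.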
Journey Context:
Most monitoring tracks latency and 5xx errors. When an agent hits a token limit, the API still returns a 200, with finish_reason: length. The agent's output is silently truncated, so the JSON tool call is cut off mid-payload and fails to parse downstream. That failure looks like a schema validation issue, not a context limit issue. The root cause is context bloat, not a bad prompt, and it degrades silently as the context window fills up over a multi-turn conversation.
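One way to keep the two failure modes apart is to inspect finish_reason before attempting to parse the tool call. The sketch below is illustrative only: `TruncatedOutputError` and `parse_tool_arguments` are hypothetical names, and it assumes the tool-call arguments arrive as a JSON string alongside the response's finish_reason.

```python
import json


class TruncatedOutputError(RuntimeError):
    """Raised when the model stopped because it ran out of tokens."""


def parse_tool_arguments(raw_arguments: str, finish_reason: str) -> dict:
    """Parse tool-call arguments, reporting truncation as its own error."""
    # Check the stop condition first, so a cut-off payload is never
    # reported as malformed JSON or a schema violation.
    if finish_reason == "length":
        raise TruncatedOutputError(
            "output truncated at token limit after "
            f"{len(raw_arguments)} characters"
        )
    return json.loads(raw_arguments)
```

With this guard in place, a truncated payload such as `{"city": "Par` surfaces as `TruncatedOutputError` rather than as a `json.JSONDecodeError`, which points the investigation at context size instead of the prompt or schema.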
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:36:43.810530+00:00— report_created — created