Report #66102
[synthesis] Agent outputs incomplete but syntactically valid responses because it spent its token budget on reasoning
Decouple the token limit for reasoning (e.g., chain-of-thought) from the token limit for the final tool call or output. Monitor the ratio of reasoning tokens to output tokens.
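The ratio check can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `TokenUsage`, `reasoning_ratio`, `looks_starved`, and both threshold defaults are hypothetical names and values chosen for the example, and the token counts are assumed to come from whatever usage data your model provider reports.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Token counts for one agent step (hypothetical container)."""
    reasoning_tokens: int
    output_tokens: int


def reasoning_ratio(usage: TokenUsage) -> float:
    """Reasoning tokens spent per output token; inf if nothing was output."""
    if usage.output_tokens == 0:
        return float("inf")
    return usage.reasoning_tokens / usage.output_tokens


def looks_starved(
    usage: TokenUsage,
    max_ratio: float = 20.0,      # assumed threshold, tune per workload
    min_output_tokens: int = 16,  # assumed floor for a usable output
) -> bool:
    """Flag steps where reasoning likely crowded out the final output."""
    too_short = usage.output_tokens < min_output_tokens
    too_skewed = reasoning_ratio(usage) > max_ratio
    return too_short or too_skewed
```

A flagged step is only a signal to inspect or retry; a high ratio can also be legitimate when a hard task ends in a short answer.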
Journey Context:
As tasks get harder, agents 'think' more. Under a single unified max_tokens limit, the agent can generate a massive chain-of-thought, hit the limit, and truncate the actual tool call or final answer. The truncated output is often syntactically valid up to the cutoff, so it doesn't throw a parsing error, but it is functionally useless. Token budget allocation must be segmented to prevent reasoning from starving execution.
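One way to segment the budget is to reserve the output allowance up front and run reasoning and execution as two separately capped calls. The sketch below assumes a hypothetical `generate(prompt, max_tokens)` callable standing in for your model client; `split_budget`, `run_two_phase`, and the prompt wording are illustrative, not part of any real API.

```python
from typing import Callable, Tuple

# Hypothetical model interface: (prompt, max_tokens) -> generated text.
Generate = Callable[[str, int], str]


def split_budget(total_tokens: int, output_reserve: int) -> Tuple[int, int]:
    """Return (reasoning_cap, output_cap) with the output share reserved first."""
    if output_reserve <= 0 or output_reserve >= total_tokens:
        raise ValueError("output_reserve must be between 1 and total_tokens - 1")
    return total_tokens - output_reserve, output_reserve


def run_two_phase(
    generate: Generate, task: str, total_tokens: int, output_reserve: int
) -> Tuple[str, str]:
    """Run reasoning and the final output as two calls with separate caps."""
    reasoning_cap, output_cap = split_budget(total_tokens, output_reserve)
    # Phase 1: reasoning may be cut off at its cap without harming phase 2.
    reasoning = generate(f"Think step by step about: {task}", reasoning_cap)
    # Phase 2: the final answer always gets its full reserved budget.
    answer = generate(
        f"Task: {task}\nNotes: {reasoning}\nGive only the final answer.",
        output_cap,
    )
    return reasoning, answer
```

With this split, a runaway chain-of-thought can only truncate itself. Note that the reasoning notes passed into the second call still consume input context, which this sketch does not account for.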
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:25:46.277126+00:00— report_created — created