Report #89958
[synthesis] Model outputs malformed JSON or incomplete code when hitting the max_tokens limit
For Claude, set max_tokens generously and add a stop sequence. For GPT-4o, implement a retry mechanism that passes the truncated output back and asks the model to 'continue'. For Gemini, explicitly instruct: 'Output the complete response; do not summarize if space is limited.'
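The 'continue' retry described for GPT-4o could look roughly like the sketch below. It is not tied to any SDK: `call_model` is a hypothetical callable you supply, assumed to return the generated text plus a flag saying whether the token limit cut it off. The continuation prompt wording and the `max_rounds` cap are also illustrative assumptions.

```python
from typing import Callable, Tuple

# Hypothetical signature (an assumption, not a real SDK call):
# takes a prompt, returns (generated_text, was_truncated).
ModelCall = Callable[[str], Tuple[str, bool]]


def generate_with_continuation(
    call_model: ModelCall, prompt: str, max_rounds: int = 3
) -> str:
    """Stitch together output that was cut off by the token limit."""
    text, truncated = call_model(prompt)
    rounds = 0
    while truncated and rounds < max_rounds:
        # Pass the partial output back and ask the model to pick up from there.
        follow_up = (
            f"{prompt}\n\nPartial output so far:\n{text}\n\n"
            "Continue exactly where the output stopped. Do not repeat anything."
        )
        more, truncated = call_model(follow_up)
        text += more
        rounds += 1
    if truncated:
        # Give up rather than hand a broken payload to the agent loop.
        raise RuntimeError("output still truncated after max_rounds continuations")
    return text
```

Capping the number of rounds keeps a model that never finishes from looping forever. The stitched text should still be validated before use, since the model may repeat or skip characters at the seam.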
Journey Context:
Agentic loops break when a model truncates a tool call payload. The behavioral difference is crucial: GPT-4o's hard stop means the output is simply cut off mid-stream (easy to detect via finish_reason). Claude's 'helpful' attempt to close out means it might return syntactically valid but logically incomplete JSON (harder to detect). Gemini's summarization breaks schemas entirely. Agents must not only check finish_reason but also apply model-specific post-processing: auto-retry for GPT-4o, strict length buffers for Claude, and anti-summarization prompts for Gemini.
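A minimal sketch of the two checks described above: first test the provider's stop reason, then confirm the payload is not just parseable but also complete. The stop-reason strings are the values each provider documents at the time of writing and should be verified against current API references. The helper names and the required-keys check are illustrative assumptions, standing in for whatever schema validation the agent already uses.

```python
import json
from typing import Iterable, Optional

# Stop-reason values that signal a token-limit cutoff. Verify against the
# current API reference for each provider before relying on them.
LENGTH_STOP_REASONS = {
    "openai": "length",         # choices[].finish_reason
    "anthropic": "max_tokens",  # stop_reason
    "gemini": "MAX_TOKENS",     # candidates[].finishReason
}


def hit_token_limit(provider: str, stop_reason: Optional[str]) -> bool:
    """Return True when the stop reason says the output hit the token limit."""
    return stop_reason == LENGTH_STOP_REASONS[provider]


def parse_complete_payload(raw: str, required_keys: Iterable[str]) -> dict:
    """Parse a tool-call payload and reject valid-but-incomplete JSON."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("payload is not a JSON object")
    # Syntactically valid JSON can still be missing fields the tool needs.
    missing = [key for key in required_keys if key not in payload]
    if missing:
        raise ValueError(f"payload missing required keys: {missing}")
    return payload
```

Running both checks matters because they catch different failures: the stop-reason check flags a hard cutoff, while the key check catches a payload that parses cleanly but was closed out early.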
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:35:16.991620+00:00— report_created — created