Report #23048
[synthesis] Max-tokens truncation looks like normal completion when stop reason is not checked
Always check the stop/finish reason before treating a response as complete. If the reason is max_tokens (Claude, reported in the stop_reason field) or length (GPT-4o, reported in the finish_reason field), the response was truncated and is incomplete. Retry with a larger max_tokens or send a continuation prompt rather than using the truncated output as-is. Never assume a response is complete just because the API call succeeded.
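A minimal sketch of that check, assuming responses have already been converted to plain dictionaries shaped like the two providers' JSON payloads. The helper name is_truncated and the provider labels "claude" and "openai" are hypothetical, chosen only for illustration; the stop-reason strings are the ones named in this report.

```python
from typing import Any, Mapping

# Stop-reason strings that signal the output hit the token limit.
# The dictionary keys are illustrative labels, not official identifiers.
TRUNCATION_REASONS = {"claude": "max_tokens", "openai": "length"}


def is_truncated(provider: str, response: Mapping[str, Any]) -> bool:
    """Return True when the response stopped because it hit the token limit."""
    if provider == "claude":
        # Claude-style payloads carry a top-level stop_reason.
        reason = response.get("stop_reason")
    elif provider == "openai":
        # Chat-completion-style payloads carry finish_reason per choice.
        reason = response["choices"][0].get("finish_reason")
    else:
        raise ValueError(f"unknown provider: {provider!r}")
    return reason == TRUNCATION_REASONS[provider]
```

Calling is_truncated before any parsing step lets a pipeline reject partial output early instead of discovering it through a downstream parse error.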
Journey Context:
When a model hits the max_tokens limit, it stops mid-generation. The partial response often looks plausible at a glance even though it is half a JSON object, an incomplete code block, or a sentence that ends mid-word. Without checking the stop reason, an agent will pass this truncated output downstream, causing parse errors, broken code, or hallucinated completions when downstream systems try to use it. The problem is model-agnostic, but the stop reason strings differ: max_tokens on Claude, length on GPT-4o. The fix is to always map and check stop reasons, and when truncation is detected, either increase max_tokens and retry or send a continuation prompt. Many agent frameworks skip this check because a successful API response feels complete, and the truncated output then silently corrupts the pipeline.
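The retry path described above could look roughly like this. It is a sketch under stated assumptions, not a framework's actual behavior: generate is a hypothetical callable that wraps the provider call and returns the text together with a flag saying whether the stop reason indicated truncation, and the doubling policy and the ceiling value are arbitrary illustrative choices.

```python
from typing import Callable, Tuple

# Hypothetical contract: generate(prompt, max_tokens) -> (text, truncated),
# where truncated is True when the stop reason was max_tokens / length.
GenerateFn = Callable[[str, int], Tuple[str, bool]]


def complete_with_retry(
    generate: GenerateFn,
    prompt: str,
    max_tokens: int = 256,
    ceiling: int = 4096,
) -> str:
    """Retry with a doubled token budget until the response is not truncated."""
    budget = max_tokens
    while True:
        text, truncated = generate(prompt, budget)
        if not truncated:
            return text
        if budget >= ceiling:
            # Fail loudly rather than hand partial output downstream.
            raise RuntimeError(f"response still truncated at {budget} tokens")
        budget = min(budget * 2, ceiling)
```

Raising at the ceiling is deliberate: an explicit error is easier to diagnose than a truncated payload that fails later in a parser. A continuation prompt is the alternative when regenerating the whole response is too expensive.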
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T17:06:00.300552+00:00— report_created — created