Report #87993
[gotcha] AI response appears complete but was silently truncated by the max\_tokens limit
Always check finish\_reason in the final streaming chunk. If finish\_reason is 'length', either auto-continue by re-prompting with the partial response and a continue instruction, or display a visible truncation indicator with a 'continue generating' button in the UI.
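A minimal sketch of the check, assuming OpenAI-style Chat Completions stream chunks represented as plain dicts (`{"choices": [{"delta": {...}, "finish_reason": ...}]}`). Other providers name this differently; Anthropic's Messages API, for example, reports `stop_reason: "max_tokens"`. `StreamResult`, `collect_stream` and `render` are hypothetical names for illustration, not part of any SDK.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class StreamResult:
    text: str
    finish_reason: Optional[str]

    @property
    def truncated(self) -> bool:
        # 'length' means the max_tokens budget ran out mid-generation.
        return self.finish_reason == "length"


def collect_stream(chunks: Iterable[dict]) -> StreamResult:
    """Accumulate streamed deltas and keep the finish_reason from the final chunk.

    Assumes hypothetical OpenAI-style chunk dicts, not a specific SDK object.
    """
    parts = []
    finish_reason = None
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            # Some chunks (e.g. usage-only chunks) carry no choices; skip them.
            continue
        choice = choices[0]
        content = choice.get("delta", {}).get("content")
        if content:
            parts.append(content)
        # finish_reason stays None until the chunk that ends generation.
        if choice.get("finish_reason") is not None:
            finish_reason = choice["finish_reason"]
    return StreamResult("".join(parts), finish_reason)


def render(result: StreamResult) -> str:
    """Append a visible marker so truncated text never looks complete."""
    if result.truncated:
        return result.text + "\n\n[Response truncated: output limit reached. Continue generating?]"
    return result.text
```

Returning the finish reason alongside the text, instead of discarding it once the stream ends, lets the UI layer decide whether to show a "continue generating" button.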
Journey Context:
Developers set max\_tokens as a safety limit and forget the model can hit it mid-sentence. The UI renders the streamed text as-is with no indication it was cut off. Users act on incomplete information — half a code block, an unfinished argument, a truncated JSON object. This is especially insidious with streaming because the text flows naturally and then just stops, looking intentional. Auto-continuation is tricky because it adds latency and can loop on long outputs, so most production apps show a truncation indicator. The silent truncation is a gotcha because nothing in the streamed tokens themselves signals incompleteness — the signal is only in the finish\_reason metadata.
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:17:05.212254+00:00— report_created — created