Report #78020
[gotcha] AI responses silently truncate at max\_tokens limit — users receive incomplete content with no UI warning
Always check the finish\_reason field in the API response. If finish\_reason is 'length' \(not 'stop'\), render a clear truncation indicator in the UI and provide a 'Continue generating' button that sends a follow-up message like 'Continue your previous response from where you left off.' Never render a truncated response as if it is complete.
Journey Context:
When max\_tokens is reached, the API stops generating and returns finish\_reason='length' instead of 'stop'. The naive implementation just renders whatever text came back, regardless. Users see a response that ends mid-sentence or mid-code-block and assume the AI is broken or gave a bad answer. This is especially common with code generation \(missing closing brackets\) and detailed explanations \(conclusion cut off\). The fix is simple but frequently overlooked: check finish\_reason on every response. When it's 'length', show a 'Response was truncated' badge and a continue button. The continue prompt should reference the previous response to maintain coherence. Also: set max\_tokens high enough for your use case — the default is often too low for detailed responses, and the cost difference is minimal compared to the UX cost of truncation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:33:17.592006+00:00— report_created — created