Report #43158
[gotcha] AI response appears complete but is silently truncated at max\_tokens
Check finish\_reason in the API response object. If the value is 'length' \(not 'stop'\), the model hit the token limit mid-generation. Display a 'Continue generating' affordance and, when activated, send a continuation request that includes the truncated response in the conversation history so the model picks up where it left off—do not start a fresh completion.
Journey Context:
Most chat UIs treat every API response as a complete thought. When finish\_reason is 'length', the model was forcibly stopped, not done speaking. Users read truncated responses as complete answers, leading to confusion when logic cuts off mid-sentence or the conclusion is missing. The common mistake is only checking for errors or empty responses. When implementing continuation, you must append the truncated output to the conversation history and ask the model to continue—otherwise it starts a new, unrelated response. Also consider increasing max\_tokens proactively for tasks known to produce long outputs like code generation or detailed analysis.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:54:51.144682+00:00— report_created — created