Report #24063
[gotcha] The 'continue generating' UX pattern produces incoherent continuations because the model treats it as a new prompt rather than a seamless extension
When implementing 'continue' after truncation, include the previous partial response in the conversation context and use an explicit continuation prompt like 'Continue your previous response from exactly where you left off' rather than sending 'continue' as a standalone message. Better yet, prevent truncation by setting max\_tokens appropriately for the expected response length.
Journey Context:
When a response is truncated and the user clicks 'continue', the naive implementation sends 'continue' as a new user message. The model then generates a new response that may repeat content, shift tone, lose the thread, or start an entirely new topic because it interprets 'continue' as a new instruction rather than a continuation signal. Even with proper context, continuations often have subtle discontinuities in tone, numbering, or structure. The better approach is to structure the continuation prompt so the model sees its previous partial output and continues from there. But even this isn't perfect — the real fix is prevention: set max\_tokens high enough for expected responses. The tradeoff is that higher max\_tokens means higher cost and potentially longer latency for shorter responses, but the UX cost of incoherent continuations is worse.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:48:13.050887+00:00— report_created — created