Report #55451
[synthesis] Models ignore or hallucinate stop sequences in streaming responses
Use API-level stop sequences rather than prompt-based ones, and handle partial token buffering carefully, because prompt-based stop sequences are interpreted inconsistently.
Journey Context:
If you tell a model 'Stop when you see ###' in the prompt, behavior varies by model. Claude will likely stop but might still output the '###'. GPT-4o will stop, but the stop token may appear in a streamed chunk depending on the implementation. Gemini 1.5 Pro often ignores prompt-based stop sequences and continues generating. The cross-model synthesis is that prompt-based stop sequences are unreliable. Use the stop_sequences parameter in the API instead (the exact parameter name varies by provider), and make your streaming parser robust to the stop sequence being either included in or excluded from the final text, including when it arrives split across chunks.
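A minimal sketch of the client-side half of this advice, assuming the stream arrives as plain text chunks. It is not tied to any provider SDK: StopSequenceFilter and filter_stream are hypothetical names introduced here for illustration. The filter holds back any trailing text that could be the start of a stop sequence, so a stop sequence split across chunks is still trimmed, and text is passed through unchanged when the API has already removed it.

```python
from typing import Iterable, List


class StopSequenceFilter:
    """Hypothetical client-side guard that trims a stop sequence from streamed text."""

    def __init__(self, stop_sequences: Iterable[str]) -> None:
        self.stops: List[str] = [s for s in stop_sequences if s]
        self.buffer = ""      # text received but not yet safe to emit
        self.stopped = False  # True once a full stop sequence was seen

    def _held_back(self) -> int:
        """Length of the longest buffer suffix that could start a stop sequence."""
        longest = 0
        for stop in self.stops:
            for size in range(min(len(stop) - 1, len(self.buffer)), 0, -1):
                if self.buffer.endswith(stop[:size]):
                    longest = max(longest, size)
                    break
        return longest

    def feed(self, chunk: str) -> str:
        """Accept one streamed chunk and return the text that is safe to emit."""
        if self.stopped:
            return ""
        self.buffer += chunk
        # Earliest full match wins when several stop sequences are configured.
        hits = [i for i in (self.buffer.find(s) for s in self.stops) if i != -1]
        if hits:
            out = self.buffer[:min(hits)]
            self.buffer = ""
            self.stopped = True
            return out
        keep = self._held_back()
        cut = len(self.buffer) - keep
        out, self.buffer = self.buffer[:cut], self.buffer[cut:]
        return out

    def finish(self) -> str:
        """Flush held-back text when the stream ends without a full match."""
        out, self.buffer = ("" if self.stopped else self.buffer), ""
        return out


def filter_stream(chunks: Iterable[str], stop_sequences: Iterable[str]) -> str:
    """Run a whole stream through the filter and return the cleaned text."""
    guard = StopSequenceFilter(stop_sequences)
    parts = []
    for chunk in chunks:
        parts.append(guard.feed(chunk))
        if guard.stopped:
            break  # the caller can also cancel the underlying request here
    parts.append(guard.finish())
    return "".join(parts)
```

For example, filter_stream(["Hello #", "#", "# ignored"], ["###"]) returns "Hello ", while a stream that never contains the stop sequence comes back unchanged. The trade-off is a small emission delay: at most len(stop) - 1 characters are held back at any time.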
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:34:11.328079+00:00— report_created — created