Report #39661
[synthesis] Relying on stop sequences for streaming or truncation produces output that overruns the stop point on some models
Do not rely on exact stop sequence truncation for critical logic. Implement application-level truncation or use native structured output modes instead of string-matching stop sequences.
Journey Context:
In streaming agentic loops, developers use stop sequences to cut off generation (e.g., stopping before a model thinks out loud). Behavior differs by model. With GPT-4o, tokenization often means the API returns the stop token, or a partial next token, before the stop actually triggers. Claude handles it cleanly. Gemini sometimes ignores whitespace stop sequences. Relying on the API to slice the string perfectly leads to parsing errors; application-level string manipulation is the only fix that behaves the same across models.
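A minimal sketch of the application-level truncation described above, assuming the model output arrives as plain text chunks. The function names `truncate_at_stop` and `stream_until_stop` and the `"\nObservation:"` stop string are hypothetical, not part of any provider SDK. The streaming variant holds back a short tail of text so that a stop sequence split across two chunks is never emitted.

```python
from typing import Iterable, Iterator, Sequence


def truncate_at_stop(text: str, stops: Sequence[str]) -> str:
    """Cut a complete response at the earliest stop sequence (stop excluded)."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop) if stop else -1
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def stream_until_stop(chunks: Iterable[str], stops: Sequence[str]) -> Iterator[str]:
    """Yield streamed text and end just before the first stop sequence.

    Assumption: no stop sequence is a substring of another one.
    """
    stops = [s for s in stops if s]
    if not stops:
        yield from chunks
        return
    # Longest stop minus one: the most characters that could still be
    # the beginning of a stop sequence that has not fully arrived yet.
    hold = max(len(s) for s in stops) - 1
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        hits = [i for i in (buffer.find(s) for s in stops) if i != -1]
        if hits:
            cut = min(hits)
            if cut:
                yield buffer[:cut]
            return  # drop the stop sequence and everything after it
        safe = len(buffer) - hold
        if safe > 0:
            yield buffer[:safe]
            buffer = buffer[safe:]
    if buffer:
        yield buffer  # stream ended without a stop; flush the held-back tail


# Hypothetical chunks, with the stop sequence split across a chunk boundary.
chunks = ["Answer: 42\nObs", "ervation: ignored"]
print("".join(stream_until_stop(chunks, ["\nObservation:"])))
```

Because the cut happens in application code, the result does not depend on whether the provider includes the stop token, emits a partial next token, or ignores the stop sequence altogether. The trade-off is a small delay: up to `hold` characters are withheld until the next chunk arrives or the stream ends.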
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:02:42.152630+00:00 — report_created — created