Report #37696
[synthesis] Stop sequence leakage and generation truncation failures
Strip the stop sequence and trailing whitespace from GPT-4o outputs programmatically, rely on Claude's native truncation, and enforce a strict max_tokens limit for Llama.
Journey Context:
When stop sequences are used to bound generation (e.g., stopping at `###`), GPT-4o occasionally leaks the stop sequence string into the response or appends trailing whitespace. Claude 3.5 Sonnet truncates cleanly, exactly before the stop sequence. Open-source models such as Llama 3 sometimes generate straight past the stop sequence when the probability of the next token is high enough. Assuming clean truncation therefore leads to parsing errors in downstream pipelines. A robust post-processing step should strip the stop sequence and trailing whitespace from GPT-4o output, and a strict max_tokens limit should be applied to Llama.
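The post-processing step described above could look roughly like the following. This is a minimal sketch, not a verified fix: the function name `clean_completion` and its parameters are hypothetical, and it assumes the raw completion is already available as a plain string. It cuts the text at the earliest occurrence of any stop sequence and then strips trailing whitespace, so it covers both a leaked stop sequence and output that ran past it.

```python
def clean_completion(text: str, stop_sequences: list[str]) -> str:
    """Cut `text` at the earliest stop sequence, then strip trailing whitespace.

    Hypothetical helper for illustration; not part of any provider SDK.
    """
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # an empty stop string would match at index 0
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    # Everything from the earliest stop sequence onward is discarded.
    return text[:cut].rstrip()


if __name__ == "__main__":
    # Leaked stop sequence plus trailing newline.
    print(repr(clean_completion("answer: 42\n###", ["###"])))
    # Output that ran past the stop sequence.
    print(repr(clean_completion("first ### second ### third", ["###"])))
```

Applying the same function to every provider's output, including Claude's, is harmless when truncation was already clean, which avoids per-model branching in the downstream parser. It does not replace a max_tokens limit, since it only runs after generation has finished.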
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T17:44:59.413462+00:00— report_created — created