Report #42252
[synthesis] Infinite generation loops or truncated outputs from inconsistent stop sequence behavior
Always define explicit stop sequences (e.g., '\n\nHuman:', '') and implement a server-side truncation check. For Llama 3, add a post-processing step that slices the output string at the first occurrence of the stop sequence.
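The post-processing step could look roughly like the sketch below. It is a minimal illustration, not a tested fix: the function name `truncate_at_stop` is hypothetical, and it assumes the full model output is already available as a plain Python string.

```python
from typing import Iterable


def truncate_at_stop(text: str, stop_sequences: Iterable[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence.

    Hypothetical helper for illustration; not part of any model SDK.
    """
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            # An empty string would match at index 0 and erase the output.
            continue
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


if __name__ == "__main__":
    raw = "The answer is 42.\n\nHuman: and what about 43?"
    print(repr(truncate_at_stop(raw, ["\n\nHuman:"])))
```

Running the same stop list through both the API's 'stop' parameter and this check keeps the two in sync; the check is a no-op when the API already stopped correctly.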
Journey Context:
When building agentic loops, a missed stop sequence lets the model hallucinate the next user turn, which breaks the loop. Behavior differs by model: GPT-4o reportedly stops reliably, Claude may overshoot if the stop sequence isn't prominent, and open-weight models like Llama 3 often 'bleed' past the stop sequence, possibly because the tokenizer splits the sequence across token boundaries. Relying purely on the API's 'stop' parameter is therefore insufficient; the orchestrator must defensively truncate the output string itself.
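If the orchestrator streams output, the same defensive truncation has to cope with a stop sequence arriving split across chunks. The sketch below is one possible approach under stated assumptions: `stream_until_stop` is a hypothetical name, `chunks` stands in for whatever iterator of text pieces your client yields, and overlapping stop sequences are not handled specially.

```python
from typing import Iterable, Iterator, Sequence


def stream_until_stop(
    chunks: Iterable[str], stop_sequences: Sequence[str]
) -> Iterator[str]:
    """Yield streamed text, ending just before the first stop sequence.

    Hypothetical helper for illustration; not part of any model SDK.
    """
    stops = [s for s in stop_sequences if s]  # ignore empty strings
    # Hold back enough characters that a stop sequence split across
    # chunks cannot be emitted before it is recognized.
    holdback = max((len(s) for s in stops), default=1) - 1
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        hits = [i for i in (buffer.find(s) for s in stops) if i != -1]
        if hits:
            head = buffer[: min(hits)]
            if head:
                yield head
            return  # drop the stop sequence and everything after it
        safe = len(buffer) - holdback
        if safe > 0:
            yield buffer[:safe]
            buffer = buffer[safe:]
    if buffer:
        yield buffer  # stream ended without hitting a stop sequence


if __name__ == "__main__":
    pieces = ["Answer: 42\n", "\nHu", "man: next question"]
    print(repr("".join(stream_until_stop(pieces, ["\n\nHuman:"]))))
```

The trade-off is a small delay: the last few characters are withheld until enough text has arrived to rule out a partial stop sequence.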
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:23:29.071565+00:00— report_created — created