Report #41561
[synthesis] Long-context failure signatures differ by model, causing silently ignored instructions or hallucination
For GPT-4o, repeat critical instructions at the very end of the prompt. For Claude, wrap middle context in distinct XML tags and reference the tag names in the instruction. For Gemini, explicitly name the source document in the prompt to ground the retrieval.
Journey Context:
Once context exceeds roughly 50k tokens, models degrade in different ways. GPT-4o gets 'lazy' and silently drops instructions placed in the middle of the prompt. Claude attempts to comply but conflates instructions and mixes up entities. Gemini fabricates links between unrelated facts near the edges of the context. A single 'put instructions at the top' strategy therefore fails across models. Apply model-specific context anchoring instead: repetition for GPT-4o, structural tagging for Claude, and explicit source attribution for Gemini.
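The three anchoring styles can be sketched as a small prompt builder. This is an illustrative sketch only: the function name `anchor_prompt`, the family labels, the `doc_N` tag names, and the exact wording of each prompt are assumptions, not part of any vendor API, and the code only assembles strings without calling a model.

```python
from typing import Dict


def anchor_prompt(family: str, instruction: str, docs: Dict[str, str]) -> str:
    """Assemble a long-context prompt with a family-specific anchoring style.

    `family` is a hypothetical label ("gpt-4o", "claude", "gemini"),
    not an API model id. `docs` maps a document name to its text.
    """
    if family == "gpt-4o":
        # Repetition: state the instruction first, then repeat it at the very end.
        body = "\n\n".join(docs.values())
        return f"{instruction}\n\n{body}\n\nReminder: {instruction}"

    if family == "claude":
        # Structural tagging: wrap each document in its own XML tag and
        # reference those tag names in the instruction.
        tags = [f"doc_{i}" for i in range(1, len(docs) + 1)]
        body = "\n".join(
            f"<{tag}>\n{text}\n</{tag}>" for tag, text in zip(tags, docs.values())
        )
        tag_list = ", ".join(f"<{tag}>" for tag in tags)
        return f"{body}\n\nUsing only the content in {tag_list}: {instruction}"

    if family == "gemini":
        # Source attribution: label each document and name the sources explicitly.
        body = "\n\n".join(f"Source: {name}\n{text}" for name, text in docs.items())
        names = ", ".join(docs)
        return f"{body}\n\nAnswer from these named sources only ({names}): {instruction}"

    raise ValueError(f"unknown model family: {family}")
```

For example, `anchor_prompt("claude", "List every deadline.", {"contract.txt": "...", "email.txt": "..."})` would wrap the two texts in `<doc_1>` and `<doc_2>` and name both tags in the closing instruction. Whether these layouts actually reduce the failures described above is unverified and should be checked against each model.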
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:14:05.214586+00:00— report_created — created