Report #68485
[synthesis] Model fails to retrieve information from the middle of a long context window
For GPT-4o, place critical instructions at the beginning and end (sandwiching). For Claude, explicitly ask 'Based on the documents provided...' to force retrieval rather than hallucination. For Gemini, include source identifiers (e.g., [Doc 1]) in the context and demand citations in the output to prevent misattribution.
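A minimal sketch of the three mitigations as a prompt builder. The function name `build_prompt`, the `CRITICAL` instruction wording, and the model-family labels are hypothetical and chosen for illustration; they do not come from any vendor SDK, and the prompts themselves remain unverified.

```python
from typing import List

# Hypothetical instruction text; the wording is illustrative only.
CRITICAL = "Answer only from the documents below. If the answer is missing, say so."


def build_prompt(model_family: str, question: str, docs: List[str]) -> str:
    """Assemble a RAG prompt using the per-model mitigation from this report."""
    family = model_family.lower()

    if family == "gemini":
        # Tag every chunk with a source identifier and demand citations.
        context = "\n\n".join(
            f"[Doc {i}] {doc}" for i, doc in enumerate(docs, start=1)
        )
        return (
            f"{context}\n\nQuestion: {question}\n"
            "Cite the supporting source as [Doc N] after every claim."
        )

    context = "\n\n".join(docs)

    if family == "gpt-4o":
        # Sandwiching: the critical instruction appears before and after the context.
        return f"{CRITICAL}\n\n{context}\n\nQuestion: {question}\n\n{CRITICAL}"

    if family == "claude":
        # Explicit retrieval phrasing quoted in the report.
        return f"{context}\n\nBased on the documents provided, answer: {question}"

    # Unknown family: fall back to a plain prompt with no mitigation.
    return f"{context}\n\nQuestion: {question}"
```

Calling `build_prompt("gpt-4o", question, docs)` yields a prompt that both starts and ends with the critical instruction, while the `"gemini"` branch numbers chunks from 1 so the identifiers match the `[Doc 1]` convention above.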
Journey Context:
When injecting large RAG contexts, developers often assume the model retrieves equally well from every position in the window. The synthesis of multi-model evaluations reveals distinct behavioral fingerprints: GPT-4o's failure mode is confabulation (making up a plausible answer when it misses the context), Claude's is evasion (saying 'The text doesn't say'), and Gemini's is attribution error (finding the right fact but citing the wrong chunk). A generic RAG prompt therefore fails differently on each model, and the mitigation must be tailored: anti-hallucination instructions for GPT-4o, strict retrieval commands for Claude, and citation enforcement for Gemini.
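Citation enforcement can be partly checked after the fact. The sketch below assumes the `[Doc N]` identifier format mentioned in this report and uses hypothetical helper names (`has_citation`, `invalid_citations`). It only detects missing citations and citations to documents that were never supplied; it cannot tell whether a fact was attributed to the wrong but existing chunk, which still needs a content-level check.

```python
import re
from typing import Set

# Matches identifiers of the form [Doc 1], [Doc 12], ...
_CITATION = re.compile(r"\[Doc (\d+)\]")


def has_citation(answer: str) -> bool:
    """True if the answer contains at least one [Doc N] citation."""
    return _CITATION.search(answer) is not None


def invalid_citations(answer: str, num_docs: int) -> Set[int]:
    """Return cited document numbers that fall outside 1..num_docs."""
    cited = {int(n) for n in _CITATION.findall(answer)}
    return {n for n in cited if not 1 <= n <= num_docs}
```

An answer with no citation, or one that cites a document number outside the supplied range, could then be rejected or retried before it reaches the user.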
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:26:09.832116+00:00— report_created — created