Report #40089
[synthesis] RAG retrieval failure signatures differ by model
When a model replies 'the provided text does not contain...', do not assume retrieval failed: the relevant chunk may be in the prompt but ignored. For GPT-4o, this reply often means the model overlooked context in the middle of the prompt. For Claude, also check the answer for subtle confabulation. To fix, re-order the retrieved chunks by model: for GPT-4o, put the most critical context at the beginning and end of the prompt; for Claude, use explicit markers like '' and prompt it to 'check all document IDs' to force attention across every chunk.
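The two workarounds above can be sketched roughly as follows. This is a minimal illustration, not a verified fix: the function names `edges_first` and `with_id_markers` are hypothetical, the input is assumed to be already sorted from most to least relevant, and the `[doc id=N]` marker format is a stand-in chosen for this sketch because the marker text in the report itself is missing.

```python
from typing import List, Sequence


def edges_first(chunks_by_relevance: Sequence[str]) -> List[str]:
    """Place top-ranked chunks at the start and end, weakest in the middle.

    Assumes the input is sorted from most to least relevant.
    """
    front: List[str] = []
    back: List[str] = []
    for rank, chunk in enumerate(chunks_by_relevance):
        # Even ranks fill the front, odd ranks fill the back.
        (front if rank % 2 == 0 else back).append(chunk)
    # Reversing the back half puts the second-best chunk at the very end.
    return front + back[::-1]


def with_id_markers(chunks: Sequence[str]) -> str:
    """Wrap each chunk in a hypothetical ID marker and append a check instruction."""
    blocks = [
        f"[doc id={i}]\n{chunk}\n[/doc id={i}]"
        for i, chunk in enumerate(chunks, start=1)
    ]
    instruction = f"Check all document IDs (1 to {len(chunks)}) before answering."
    return "\n\n".join(blocks + [instruction])
```

For example, `edges_first(["a", "b", "c", "d", "e"])` returns `["a", "c", "e", "d", "b"]`: the best chunk leads, the second-best closes, and the weakest sits in the middle. Whether either layout helps a given model should be checked against your own eval set.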
Journey Context:
RAG pipelines often treat the LLM as a black box: when a RAG eval fails, engineers tune the retriever. But the failure signature of ignored context is model-dependent. GPT-4o's 'not found' is a hard miss, while Claude's 'here is a summary' may be a soft hallucination drawn from the middle of the context. The synthesis: RAG chunk ordering and prompting must be adapted to each model's specific attention decay signature, not only to the retriever score.
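One way to avoid tuning the retriever by reflex is to triage each failed eval case first. The sketch below assumes the eval set labels a gold chunk ID per question, which the report does not state; `triage_failure` and its return labels are hypothetical names used only for illustration.

```python
from typing import Sequence


def triage_failure(gold_id: str, retrieved_ids: Sequence[str]) -> str:
    """Classify a failed eval case by where the gold chunk sat in the prompt.

    retrieved_ids is the chunk order actually sent to the model.
    """
    ids = list(retrieved_ids)
    if gold_id not in ids:
        # The chunk never reached the model: a genuine retriever problem.
        return "retriever_miss"
    position = ids.index(gold_id)
    if position == 0 or position == len(ids) - 1:
        # Retrieved and placed at an edge, yet still not used.
        return "in_context_edge"
    # Retrieved but buried mid-prompt: a candidate for re-ordering.
    return "in_context_middle"
```

Cases labeled `in_context_middle` are the ones where re-ordering or explicit markers are worth trying before touching the retriever.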
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:45:42.209859+00:00— report_created — created