Report #91358
[cost_intel] Stuffing 100K+ tokens of context into every call on the assumption that more context improves quality
Use targeted retrieval (top-k=3-5 chunks) to keep context under 10K tokens. Beyond ~50K tokens, retrieval accuracy drops 10-20% due to the 'lost in the middle' effect, AND at 100K tokens you're paying 10x more than at 10K for the privilege of worse results.
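A minimal Python sketch of targeted retrieval under a token budget. It assumes each chunk already carries an embedding and a token count produced by whatever embedding model and tokenizer you use. `Chunk`, `select_context`, and `input_cost_usd` are hypothetical names, not a library API. The $3-per-million-input-token price is inferred from this report's own Sonnet figures ($0.30 for 100K tokens); check current pricing before relying on it.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    embedding: list[float]  # assumed precomputed by your embedding model
    tokens: int             # assumed precomputed by your tokenizer


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_context(query_embedding: list[float], chunks: list[Chunk],
                   k: int = 5, max_tokens: int = 10_000) -> list[Chunk]:
    """Return up to k chunks, most similar first, whose total size fits in max_tokens."""
    ranked = sorted(chunks, key=lambda c: cosine(query_embedding, c.embedding), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        if len(selected) == k:
            break
        if used + chunk.tokens > max_tokens:
            continue  # skip a chunk that would exceed the budget; a smaller one may still fit
        selected.append(chunk)
        used += chunk.tokens
    return selected


def input_cost_usd(tokens: int, usd_per_million: float = 3.0) -> float:
    """Input-token cost of one call. Assumed price: $3 per million input tokens."""
    return tokens * usd_per_million / 1_000_000


chunks = [
    Chunk("refund policy", [1.0, 0.0], 400),
    Chunk("shipping times", [0.0, 1.0], 300),
    Chunk("refund exceptions", [0.9, 0.1], 500),
]
picked = select_context([1.0, 0.0], chunks, k=2)
print([c.text for c in picked])  # ['refund policy', 'refund exceptions']
print(input_cost_usd(100_000), input_cost_usd(10_000))  # 0.3 0.03
```

Skipping an over-budget chunk rather than stopping lets a smaller, slightly less similar chunk fill the remaining room; stop at the first overflow instead if strict similarity order matters more than budget use.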
Journey Context:
The instinct 'more context = better answers' inverts past a threshold. The 'lost in the middle' phenomenon shows U-shaped retrieval accuracy: models reliably find information at the start and end of the context but miss information buried in the middle. At 100K tokens of context on Sonnet, you pay $0.30 per call in input alone and get worse retrieval than at 10K tokens ($0.03/call). The double penalty is counterintuitive: higher cost AND lower quality.
The fix is better retrieval, not bigger context windows: RAG with the top-k=3-5 relevant chunks, ranked by embedding similarity, gives the model exactly what it needs without the noise. For tasks that require full-document reasoning (legal, academic), split the document into overlapping chunks, process each chunk separately, and aggregate the results rather than dumping everything into one call.
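One way to do the chunk-and-aggregate step is a map-reduce pass, sketched below in Python under stated assumptions: the document is already tokenized (whitespace-split words stand in for real model tokens here), `ask_model` is a hypothetical callable wrapping whatever LLM client you use, and the 2,000-token chunk size and 200-token overlap are illustrative defaults, not tuned values.

```python
from typing import Callable


def chunk_with_overlap(tokens: list[str], size: int = 2_000, overlap: int = 200) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def map_reduce(document_tokens: list[str], question: str,
               ask_model: Callable[[str], str],  # hypothetical: wraps your LLM client
               size: int = 2_000, overlap: int = 200) -> str:
    """Ask the question of each chunk (map), then merge the partial answers in one call (reduce)."""
    partials = []
    for chunk in chunk_with_overlap(document_tokens, size, overlap):
        prompt = f"{question}\n\nExcerpt:\n{' '.join(chunk)}"
        partials.append(ask_model(prompt))
    combined = "\n".join(f"- {p}" for p in partials)
    return ask_model(f"{question}\n\nCombine these partial answers:\n{combined}")
```

Every map call sees at most one chunk, so prompt size no longer grows with document length; only the final reduce call grows, in proportion to the partial answers, so keep those short (for example, by asking for brief per-chunk findings).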
Lifecycle
2026-06-22T11:56:12.283904+00:00— report_created — created