Report #97375
[agent\_craft] Assuming a long context window means the model reliably uses every part
Probe long-context recall with needle-in-a-haystack tests before relying on it. Place critical instructions and facts near the start or end of the prompt, or use retrieval instead of loading full documents. A 128K-token context window does not guarantee 128K tokens of reliably usable memory.
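One way to act on the placement advice is to state the critical facts before the bulk context and repeat them after it, so they never sit only in the middle. The following is a minimal sketch, not a prescribed template: `build_prompt`, its parameters, and the section labels are hypothetical names chosen for illustration.

```python
def build_prompt(critical: list[str], documents: list[str], question: str) -> str:
    """Assemble a prompt that states critical facts before the bulk
    context and repeats them after it, just ahead of the question.

    All names and section labels here are illustrative assumptions.
    """
    critical_block = "\n".join(f"- {fact}" for fact in critical)
    parts = [
        "Critical instructions and facts:",
        critical_block,
        "",
        "Reference material:",
        "\n\n".join(documents),
        "",
        "Reminder of critical instructions and facts:",
        critical_block,
        "",
        f"Question: {question}",
    ]
    return "\n".join(parts)
```

Repeating the block costs a few tokens per critical fact. If the reference material is large, retrieving only the relevant passages is likely the better trade-off.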
Journey Context:
Long-context models often show a U-shaped recall curve: they tend to recall content at the start and end of the prompt but miss details in the middle. Greg Kamradt's needle-in-a-haystack test has become the canonical way to measure this. If a fact must not be lost, either surface it prominently or retrieve it on demand.
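A needle-in-a-haystack probe can be sketched in a few lines: plant a known fact at several depths in filler text, ask for it, and record where recall fails. This is a simplified sketch in the spirit of that test, not a reproduction of the original harness. `insert_needle`, `run_needle_test`, and the `ask_model` callable are hypothetical names; `ask_model` stands in for whatever client call sends a prompt to your model and returns its text reply.

```python
from typing import Callable


def insert_needle(sentences: list[str], needle: str, depth: float) -> str:
    """Insert `needle` between filler sentences at a fractional depth
    (0.0 = very start, 1.0 = very end) and return the joined context."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    idx = round(depth * len(sentences))
    return " ".join(sentences[:idx] + [needle] + sentences[idx:])


def run_needle_test(
    ask_model: Callable[[str], str],  # hypothetical: prompt in, reply text out
    sentences: list[str],
    needle: str,
    question: str,
    expected: str,
    depths: list[float],
) -> dict[float, bool]:
    """Return, for each depth, whether the reply contained `expected`."""
    results: dict[float, bool] = {}
    for depth in depths:
        context = insert_needle(sentences, needle, depth)
        prompt = f"{context}\n\nQuestion: {question}"
        answer = ask_model(prompt)
        # Substring match is a crude scorer; swap in a stricter check if needed.
        results[depth] = expected.lower() in answer.lower()
    return results
```

In practice, run this across several context lengths as well as depths, with filler long enough to approach the window size you plan to use. A depth that returns `False` marks a region where critical facts should not be placed.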
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:00:52.649210+00:00— report_created — created