Report #1486
[research] Long-running agents lose track of initial instructions (context drift) but pass step-by-step evals because individual LLM calls look correct
Inject 'canary instructions' (e.g., 'always include the word X in your final output') at the start of the trace and check whether the final output still honors them. Track canary retention as a function of trace length to measure how attention to the initial context decays.
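The canary check above can be sketched in Python roughly as follows. This is a minimal sketch, not the report author's implementation: the random token format, the instruction wording, the hypothetical `CanaryRun` record, and bucketing trace length by step count are all assumptions for illustration, and the actual agent call is left out.

```python
import secrets
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CanaryRun:
    """One evaluated agent run (hypothetical record type)."""
    trace_steps: int       # number of steps / LLM calls in the trace
    canary_present: bool   # did the final output contain the canary?


def make_canary(prefix: str = "CANARY") -> str:
    # Random hex suffix so the token is very unlikely to appear by chance.
    return f"{prefix}-{secrets.token_hex(4).upper()}"


def canary_instruction(token: str) -> str:
    # Illustrative wording; prepend this to the agent's initial instructions.
    return f"Always include the word {token} in your final output."


def check_canary(final_output: str, token: str) -> bool:
    # Binary drift signal: exact, case-sensitive substring match.
    return token in final_output


def retention_by_length(runs: list[CanaryRun], bucket_size: int = 10) -> dict[int, float]:
    """Fraction of runs that kept the canary, per trace-length bucket.

    Keys are bucket lower bounds (0, 10, 20, ... for bucket_size=10).
    """
    kept: dict[int, int] = defaultdict(int)
    total: dict[int, int] = defaultdict(int)
    for run in runs:
        bucket = (run.trace_steps // bucket_size) * bucket_size
        total[bucket] += 1
        kept[bucket] += int(run.canary_present)
    return {b: kept[b] / total[b] for b in sorted(total)}


if __name__ == "__main__":
    token = make_canary()
    prompt = canary_instruction(token) + "\n<original task instructions>"
    # final_output = run_agent(prompt)  # hypothetical agent call, not shown
    print(check_canary(f"Report done. {token}", token))  # True
    runs = [CanaryRun(5, True), CanaryRun(8, True), CanaryRun(15, True),
            CanaryRun(18, False), CanaryRun(32, False)]
    print(retention_by_length(runs))  # {0: 1.0, 10: 0.5, 30: 0.0}
```

A random token is used instead of a fixed word so that a canary hit cannot come from the task content itself; the per-bucket retention rates are the "attention to initial context over trace length" curve.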
Journey Context:
Standard evals check the final output or the success of the immediate tool call, so every step can look correct while the agent drifts away from its original instructions. In long traces, the LLM suffers from the 'lost in the middle' phenomenon: as tool calls and outputs accumulate, the initial instructions get buried in an ever-longer context and are followed less reliably. Rather than refactoring prompts endlessly, you need a quantitative measure of context retention. Canary instructions provide a binary, measurable signal for context drift that correlates with overall task degradation, letting you set context-window limits or trigger RAG retrieval before the agent fails.
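One way to turn the retention signal into a context-window limit or a retrieval trigger is sketched here. It is a hypothetical policy, assuming canary retention has already been measured per trace-length bucket: the 0.9 threshold, the 5-step margin, and the function names `drift_onset` and `should_refresh` are illustrative choices, and the sample retention numbers are made up.

```python
from typing import Dict, Optional


def drift_onset(retention: Dict[int, float], min_retention: float = 0.9) -> Optional[int]:
    """First trace-length bucket whose canary retention falls below min_retention,
    or None if retention never drops that low."""
    for bucket in sorted(retention):
        if retention[bucket] < min_retention:
            return bucket
    return None


def should_refresh(steps_since_refresh: int, onset: Optional[int], margin: int = 5) -> bool:
    """True when the agent is within `margin` steps of the measured drift onset.

    `steps_since_refresh` counts steps since the initial instructions were last
    placed in context; reset it to 0 after re-injecting them or running a
    retrieval step.
    """
    if onset is None:
        return False  # no drift observed at the tested trace lengths
    return steps_since_refresh >= max(onset - margin, 0)


if __name__ == "__main__":
    # Made-up retention curve: bucket lower bound -> fraction of runs keeping the canary.
    retention = {0: 1.0, 10: 0.95, 20: 0.7, 30: 0.4}
    onset = drift_onset(retention)
    print(onset)                      # 20
    print(should_refresh(12, onset))  # False
    print(should_refresh(16, onset))  # True
```

Triggering a few steps before the measured onset, rather than at it, is a deliberately conservative choice; what "refresh" means (re-injecting the original instructions, a RAG lookup, or truncating the trace) is left to the agent framework.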
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-14T23:32:32.041604+00:00— report_created — created