Report #41125
[research] Long-running agents degrade in instruction following as the context window fills, leading to ignored system prompts
Add a telemetry span that records context utilization, and evaluate it against a maximum context threshold. If the threshold is breached, trigger a context-compression sub-agent or halt the agent.
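A minimal sketch of such a threshold check, assuming plain Python with no particular tracing library. The names `check_context`, `estimate_tokens`, and the `context.*` attribute keys are hypothetical, the 0.7 and 0.95 thresholds are illustrative rather than recommended values, and the 4-characters-per-token estimate is a rough stand-in for the model's real tokenizer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Action(Enum):
    CONTINUE = "continue"
    COMPRESS = "compress"
    HALT = "halt"


@dataclass
class ContextReading:
    used_tokens: int
    window_tokens: int

    @property
    def utilization(self) -> float:
        # Fraction of the context window currently occupied.
        return self.used_tokens / self.window_tokens


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token.
    # Replace with the model's actual tokenizer for real measurements.
    return max(1, len(text) // 4)


def check_context(
    messages: List[str],
    window_tokens: int,
    compress_at: float = 0.7,   # illustrative threshold, not a recommendation
    halt_at: float = 0.95,      # illustrative threshold, not a recommendation
) -> Tuple[Action, Dict[str, object]]:
    """Measure context utilization and decide what the agent loop should do."""
    used = sum(estimate_tokens(m) for m in messages)
    reading = ContextReading(used_tokens=used, window_tokens=window_tokens)

    if reading.utilization >= halt_at:
        action = Action.HALT
    elif reading.utilization >= compress_at:
        action = Action.COMPRESS
    else:
        action = Action.CONTINUE

    # Attributes intended to be attached to whatever telemetry span
    # the agent framework emits for this step (key names are hypothetical).
    attributes = {
        "context.used_tokens": used,
        "context.window_tokens": window_tokens,
        "context.utilization": round(reading.utilization, 4),
        "context.action": action.value,
    }
    return action, attributes
```

The agent loop would call `check_context` before each model call, record the returned attributes on its span, and branch on the returned `Action`. Using two thresholds leaves room to attempt compression before halting.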
Journey Context:
Developers often assume agents handle long contexts gracefully, but lost-in-the-middle effects can cause agents to lose track of initial instructions such as safety constraints or output formats. Observability should therefore track context size as a first-class metric, and evals should test the agent at varying levels of context fill to demonstrate that instruction following holds up.
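One way to run an eval at varying context fill levels is sketched below. It assumes the agent can be called as a function taking a list of message strings and returning a string; `agent`, `check`, `build_padded_history`, and `eval_across_fill_levels` are hypothetical names, and the default token counter is the same rough 4-characters-per-token estimate, not a real tokenizer.

```python
from typing import Callable, Dict, List, Sequence


def approx_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token; swap in a real tokenizer.
    return max(1, len(text) // 4)


def build_padded_history(
    system_prompt: str,
    filler_turn: str,
    target_tokens: int,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> List[str]:
    """Return a history that starts with the system prompt and is padded
    with filler turns until adding one more would exceed target_tokens."""
    history = [system_prompt]
    used = count_tokens(system_prompt)
    step = count_tokens(filler_turn)
    while used + step <= target_tokens:
        history.append(filler_turn)
        used += step
    return history


def eval_across_fill_levels(
    agent: Callable[[List[str]], str],   # hypothetical agent interface
    check: Callable[[str], bool],        # True if output obeys the instruction
    system_prompt: str,
    filler_turn: str,
    task: str,
    window_tokens: int,
    fill_levels: Sequence[float] = (0.1, 0.5, 0.9),
) -> Dict[float, bool]:
    """Run the same task at several context fill levels and record whether
    the initial instruction was still followed at each level."""
    results: Dict[float, bool] = {}
    for level in fill_levels:
        target = int(window_tokens * level)
        history = build_padded_history(system_prompt, filler_turn, target)
        output = agent(history + [task])
        results[level] = check(output)
    return results
```

A result map in which low fill levels pass and high fill levels fail would point to the degradation described here. In practice each level should be run several times, since a single pass or fail per level is a weak signal.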
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:30:00.171726+00:00— report_created — created