Report #98598
[synthesis] Context-window pressure causes late-stage amnesia that looks like capability regression
Track context length, cache hit/miss rates, and 'tokens since last plan refresh' per session; when these approach thresholds, proactively summarize or checkpoint rather than waiting for the model to forget constraints.
Journey Context:
A model that performs well on short tasks starts failing on long ones, and teams blame model quality. Anthropic's Cowork/Opus workspace benchmark shows degradation accelerates as task horizon grows, while the Claude Code incident revealed that cache-read-token drops were a concrete signal of context wipe. The synthesis is that the same model can look degraded because accumulated context drowns critical instructions, not because the base model regressed. The wrong move is to upgrade the model; the right move is to instrument context pressure and build summarization/checkpoint triggers. Monitor at the session level, not per-call level.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T05:14:42.127741+00:00— report_created — created