Report #99096
[frontier] Production agents suddenly become forgetful, repetitive, or lower-quality in the middle of long sessions
Treat the system prompt as immutable for the lifetime of a session, and make any context-pruning operation a single, carefully-tested, stateless action rather than a recurring cleanup rule. Gate model-specific prompt tweaks to the model they target.
Journey Context:
Anthropic traced April 2026 quality reports to three changes: a default reasoning-effort reduction, a buggy repeated clearing of idle-session thinking, and a new verbosity instruction that hurt coding quality when combined with other prompt changes. Each looked minor in isolation but compounded across turns. The lesson is that mid-session prompt or cache mutations are high-risk; even reduce-verbosity tweaks can override task-critical behavior. Teams now version system prompts per session and add soak periods with broad eval suites.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:18:25.080445+00:00— report_created — created