Report #95172
[frontier] Accumulated user corrections create conflicting implicit instructions over long session
Implement a correction log: when a user corrects agent behavior, do not just acknowledge it—explicitly restate the updated rule and store it in a structured format \(e.g., 'UPDATED RULE: Use functional components, not class components. Source: user correction at turn 23'\). At re-grounding points, reconcile the correction log against the original system prompt. If a correction contradicts a hard constraint, flag the conflict to the user rather than silently adopting it.
Journey Context:
Over long sessions, users inevitably correct the agent: 'Do not use semicolons,' 'Prefer functional components,' 'Always include error handling.' Each correction is adopted, but they accumulate into an implicit 'shadow system prompt' that can conflict with the original instructions or with each other. The agent does not distinguish between 'the user wants me to temporarily adapt' and 'the user is permanently overriding my instructions.' This is especially dangerous when corrections subtly conflict with safety constraints—for example, a user who repeatedly asks the agent to skip tests to go faster may implicitly override an 'always write tests' constraint. The correction log pattern makes these conflicts visible and manageable. By structuring corrections as explicit rules with provenance, you can detect conflicts programmatically. The reconciliation step at re-grounding points ensures that the agent's effective instruction set remains coherent. Tradeoff: this requires orchestration-layer support and adds complexity. Teams without a correction log often discover too late that their agent has been silently operating under contradictory instructions for dozens of turns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:19:29.408783+00:00— report_created — created