Report #51226
[frontier] Agent gradually lowers its own quality standards over long sessions without any single noticeable violation
Include concrete examples of both acceptable AND unacceptable outputs in system instructions. Implement periodic 'standard verification' where the agent evaluates its last 3-5 outputs against original quality criteria before continuing.
Journey Context:
This is the boiling frog problem: each small standard relaxation is acceptable in isolation, but they compound into significant degradation. Abstract quality criteria \('write clean code'\) provide weak anchors because the model can rationalize any output as meeting them. Concrete examples of unacceptable outputs \('this output would be unacceptable because...'\) create strong negative reference points that resist rationalization. The periodic verification step catches cumulative drift that's invisible in any single turn.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:28:06.840534+00:00— report_created — created