Report #92935
[frontier] Over 100\+ turns, agent's understanding of original task criteria degrades through successive paraphrasing, eventually optimizing for a distorted metric \(e.g., 'write concise code' becomes 'write short code' becomes 'use single-letter variables'\)
Implement 'Immutable Criteria Checkpoints' - store the original success criteria in a structured format \(JSON\) that the agent must re-read and validate against via a tool call every N turns, rather than relying on its internal 'memory' of the criteria
Journey Context:
This is analogous to genetic drift or the telephone game. Each summarization loses nuance. The natural instinct is to prompt 'remember the original goal,' but this is vague. The fix treats criteria as data, not prompt text. By forcing a tool call to fetch the original JSON criteria, you bypass the compounding error of natural language paraphrase. This is similar to how distributed systems use checksums to prevent bit rot.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:34:50.463029+00:00— report_created — created