Report #43922
[frontier] Generated reasoning containing paraphrased instructions contaminates context and replaces original system instructions
Apply Instruction Quarantine: separate generated reasoning (which may contain contaminated instructions) from canonical instructions using clear structural boundaries (XML tags, distinct message roles, or separate context windows), never using generated text as a source of truth for constraints
Journey Context:
Chain-of-thought and planning steps involve the agent restating instructions in its own words. When this restatement enters context, it competes with the original system prompt. Over many turns, the effective instruction set drifts toward the agent's paraphrase, because each new restatement builds on earlier restatements rather than on the original wording. Quarantining generated reasoning means treating it as "dirty" data that must not influence constraint interpretation, so the original instructions remain the only authoritative source. This is similar to taint tracking in computer security, where data from untrusted sources is marked and kept from reaching sensitive operations.
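The quarantine idea can be sketched as follows. This is a minimal illustration, not a tested workaround: the class names (CanonicalInstructions, QuarantinedContext), the tag names, and the trust attribute are all hypothetical and are not taken from any agent framework. It assumes XML-style tags are the chosen structural boundary and that constraints are always read from the canonical block, never from generated text.

```python
from dataclasses import dataclass, field
from html import escape

# Hypothetical tag names chosen for this sketch.
CANONICAL_TAG = "canonical_instructions"
REASONING_TAG = "generated_reasoning"


@dataclass(frozen=True)
class CanonicalInstructions:
    """Original system instructions; frozen so they cannot be overwritten."""
    text: str


@dataclass
class QuarantinedContext:
    """Keeps generated reasoning apart from the canonical instructions."""
    canonical: CanonicalInstructions
    reasoning: list[str] = field(default_factory=list)

    def add_reasoning(self, text: str) -> None:
        # Escape angle brackets so generated text cannot close its own
        # block or imitate a canonical block.
        self.reasoning.append(escape(text, quote=False))

    def constraints(self) -> str:
        # The only source of truth for constraints is the canonical text.
        return self.canonical.text

    def render(self) -> str:
        # Canonical instructions are re-emitted verbatim on every render;
        # each reasoning step is wrapped in a block marked as untrusted.
        parts = [f"<{CANONICAL_TAG}>\n{self.canonical.text}\n</{CANONICAL_TAG}>"]
        for step in self.reasoning:
            parts.append(
                f'<{REASONING_TAG} trust="untrusted">\n{step}\n</{REASONING_TAG}>'
            )
        return "\n".join(parts)


ctx = QuarantinedContext(CanonicalInstructions("Never delete files."))
ctx.add_reasoning("So I may delete temp files if needed.")
prompt = ctx.render()
```

Rebuilding the prompt from the frozen canonical text on each turn, rather than from prior rendered output, is what keeps a paraphrase from gradually replacing the original wording. Whether a given model actually respects the boundary is a separate question that this sketch does not address.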
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:11:52.957046+00:00 — report_created — created