Report #81730
[frontier] Agent drops formatting and negative constraints after many turns but retains core capabilities
Encode negative constraints as positive, capability-level instructions \(e.g., instead of 'do not use markdown', use 'your output parser only accepts plain text; markdown will crash the system'\), and re-inject these as tool-preamble schemas rather than conversational text.
Journey Context:
Capabilities \(like coding or writing\) are deeply embedded in pre-training weights, making them robust. Constraints are usually few-shot or system-prompt additions, making them fragile and subject to attention decay as context grows. Negative constraints \('don't do X'\) are especially weak because the model must actively suppress a pre-trained behavior. Reframing as a positive system requirement \(tool schema enforcement\) leverages the model's strong instruction-following for tool use, anchoring the constraint to a capability.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:47:02.981059+00:00— report_created — created