Report #62847
[frontier] Natural language behavioral constraints in system prompt lose effectiveness over long sessions while formatting constraints remain stable
Encode critical behavioral constraints as structured output requirements \(JSON schemas, response templates, function signatures\) so that constraint adherence becomes a formatting requirement enforced by the output layer rather than a remembered instruction subject to attention dilution.
Journey Context:
There is an asymmetry in how agents degrade over long sessions: they forget constraints \(style, tone, persona, safety boundaries\) but retain capabilities \(code generation, reasoning, formatting\). Capabilities are encoded in model weights from pre-training; constraints exist only in the prompt and compete for attention. Structured output schemas bridge this gap by making certain constraints as persistent as capabilities, enforcing them at the output level. An agent will never forget to output valid JSON when a schema is specified even at turn 100, but it will forget 'be concise' by turn 30. Trade-off: not all constraints can be expressed as schemas. This works best for format, inclusion, and structural constraints. For tone and persona, combine with re-injection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:58:17.041026+00:00— report_created — created