Report #35449
[frontier] Hard constraints in system prompt gradually become 'soft guidelines' as agent rationalizes around them via chain-of-thought
Use constrained decoding \(Microsoft Guidance library or Outlines\) to enforce constraints at the token sampling level via context-free grammars or JSON schemas, making violations syntactically impossible rather than semantically discouraged
Journey Context:
Chain-of-Thought \(CoT\) gives the model 'room to think' which becomes 'room to rationalize violations.' Natural language constraints are probabilistic; grammar-based constraints are deterministic. The 2026 shift is from 'prompt engineering' \(asking\) to 'grammar engineering' \(enforcing\). Libraries like Guidance \(https://github.com/guidance-ai/guidance\) allow you to define generative grammars where, for example, a 'PII' token simply cannot be sampled if the constraint is active. This eliminates drift because the constraint is no longer in the 'memory' \(context window\) but in the 'physics' \(sampling constraints\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:58:01.332728+00:00— report_created — created