Report #48202
[frontier] Agent violates formatting and behavioral constraints that are only stated in natural language instructions
Express critical constraints as structured output schemas (JSON Schema, Pydantic models) rather than natural language. Where the inference stack supports constrained decoding, the schema acts as a hard constraint layer enforced at the decoding level. Use a two-layer system: natural language for behavioral guidance, and schemas for hard constraints that the model cannot violate in its emitted output.
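A minimal sketch of the schema layer, assuming Pydantic v2 is installed. The `AgentReply` model, its field names, and the `parse_reply` helper are hypothetical and only illustrate the idea. The sketch validates output after generation; whether the same schema is also enforced during decoding depends on the provider.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class AgentReply(BaseModel):
    """Hypothetical reply format; field names are illustrative only."""

    # Reject any field the schema does not declare.
    model_config = ConfigDict(extra="forbid")

    # Closed set of allowed actions: anything else fails validation.
    action: Literal["answer", "ask_clarification", "refuse"]
    # max_length counts characters, not tokens.
    summary: str = Field(max_length=500)


def parse_reply(raw_json: str) -> Optional[AgentReply]:
    """Return a validated reply, or None if the output breaks the schema."""
    try:
        return AgentReply.model_validate_json(raw_json)
    except ValidationError:
        return None


# JSON Schema derived from the model; this is the artifact that would be
# passed to a structured-output API, if the provider accepts one.
REPLY_SCHEMA = AgentReply.model_json_schema()
```

For example, `parse_reply('{"action": "answer", "summary": "done"}')` returns a model instance, while an undeclared action or an over-long summary returns `None`, so the caller can retry or reject instead of passing malformed output downstream.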
Journey Context:
Natural language constraints are soft: the model can gradually reinterpret, deprioritize, or ignore them as context grows. Structured output schemas are hard: under constrained decoding they restrict the token space directly, so the emitted output cannot violate them. This is the most reliable drift prevention mechanism available as of 2025. The limitation is that not all constraints can be expressed schematically: 'be concise' has no schema, but 'the summary must be at most 500 characters' can be enforced with a max_length constraint. Note that max_length counts characters, not tokens; a token budget is typically set through the generation request's output-token limit rather than through the schema. The practical approach is to audit your constraint list and move everything schema-expressible into a structured output layer, keeping only truly behavioral constraints in natural language. Production teams report that this single change eliminates 70-80% of format-related drift. The remaining behavioral drift is then managed with the bookend and checkpoint patterns.
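The audit step described above can be sketched with the standard library alone. The `Constraint` record, the `split_constraints` helper, and the sample entries are all hypothetical. The sketch only shows one way to separate schema-expressible constraints from behavioral ones and assemble the former into a JSON Schema object.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Constraint:
    """One entry in a hypothetical constraint audit list."""

    description: str
    # Output field the constraint applies to, if it targets one.
    field: Optional[str] = None
    # JSON Schema fragment, if the constraint is schema-expressible.
    schema: Optional[Dict[str, Any]] = None


def split_constraints(
    constraints: List[Constraint],
) -> Tuple[Dict[str, Any], List[str]]:
    """Return (json_schema, natural_language_rules)."""
    properties: Dict[str, Any] = {}
    soft_rules: List[str] = []
    for c in constraints:
        if c.field is not None and c.schema is not None:
            # Hard layer: merge fragments that target the same field.
            properties.setdefault(c.field, {}).update(c.schema)
        else:
            # Soft layer: stays in the natural language instructions.
            soft_rules.append(c.description)
    schema = {
        "type": "object",
        "properties": properties,
        "required": sorted(properties),
        "additionalProperties": False,
    }
    return schema, soft_rules


AUDIT = [
    Constraint("Be concise."),
    Constraint(
        "Summary is at most 500 characters.",
        "summary",
        {"type": "string", "maxLength": 500},  # characters, not tokens
    ),
    Constraint(
        "Status is one of a fixed set.",
        "status",
        {"type": "string", "enum": ["ok", "error"]},
    ),
    Constraint("Do not speculate beyond the provided sources."),
]

HARD_SCHEMA, SOFT_RULES = split_constraints(AUDIT)
```

Here `HARD_SCHEMA` would go to the structured output layer and `SOFT_RULES` would remain in the prompt, which keeps the two layers from silently overlapping.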
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:23:03.528094+00:00— report_created — created