Report #70470
[frontier] Agent verbally agrees to constraints but structurally violates them in generated code after long conversations
Encode hard constraints directly into JSON Schema structured outputs (e.g., enum values, regex patterns) rather than relying on natural language instructions that drift over time.
Journey Context:
Teams attempt to prevent drift by repeatedly reminding the agent, 'Remember, only use Python!' But LLMs reinterpret natural-language instructions as context accumulates, so the reminder loses force over a long conversation. The more robust 'constraint as code' approach removes interpretation: define allowed languages as an enum ['python'] in the output schema, or require specific regex patterns in function arguments. A violating output cannot reach downstream code because schema validation rejects it (or, where the provider enforces the schema during decoding, it is never generated), regardless of how the model's personality has drifted or what it 'thinks' the user wants after 50 turns. This is a 'structural guarantee' rather than a 'behavioral request.' Tradeoff: less flexible for edge cases, and it requires schema engineering.
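A minimal sketch of the 'constraint as code' idea, using only the Python standard library. The schema, its field names (language, function_name, code), and the validate helper are hypothetical and chosen for illustration; the helper checks only the few JSON Schema keywords used here and is not a full validator.

```python
import json
import re
from typing import List

# Hypothetical output schema for a code-generation step (field names are assumptions).
OUTPUT_SCHEMA = {
    "type": "object",
    "required": ["language", "function_name", "code"],
    "additionalProperties": False,
    "properties": {
        # Hard constraint: the only allowed language is Python.
        "language": {"type": "string", "enum": ["python"]},
        # Hard constraint: function names must be snake_case.
        "function_name": {"type": "string", "pattern": r"^[a-z_][a-z0-9_]*$"},
        "code": {"type": "string"},
    },
}


def validate(raw: str, schema: dict = OUTPUT_SCHEMA) -> List[str]:
    """Return a list of violations; an empty list means the output is accepted.

    Handles only required, additionalProperties, enum, and pattern
    for a flat object of string fields.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    errors = []
    props = schema["properties"]
    for key in schema.get("required", []):
        if key not in data:
            errors.append(f"missing required field: {key}")
    for key, value in data.items():
        if key not in props:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected field: {key}")
            continue
        rules = props[key]
        if not isinstance(value, str):
            errors.append(f"{key}: expected a string")
            continue
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{key}: {value!r} is not one of {rules['enum']}")
        # JSON Schema patterns are unanchored, so re.search is the matching call.
        if "pattern" in rules and not re.search(rules["pattern"], value):
            errors.append(f"{key}: {value!r} does not match {rules['pattern']}")
    return errors


# A drifted reply that switched language and naming style is rejected.
drifted = json.dumps(
    {"language": "javascript", "function_name": "parseRows", "code": "..."}
)
for problem in validate(drifted):
    print(problem)
```

The check runs after generation, so the caller can reject or retry a violating reply however the conversation has gone. Where a provider supports schema-enforced structured outputs, passing the same schema at request time would move the check into decoding; whether a given provider honours enum and pattern should be confirmed against its documentation.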
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:52:11.096911+00:00— report_created — created