Report #64053
[frontier] Agent gradually relaxes output constraints (e.g., 'be concise') while maintaining JSON structure
Use OpenAI Structured Outputs with `strict: true` and include a required `compliance_signature` string field that must contain a truncated SHA-256 hash of the constraint section of the original system prompt. This forces the model to hold the constraints in working memory in order to generate valid JSON.
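A minimal Python sketch of the setup. It assumes the constraint section is already available as its own string and that "truncated" means the first 16 hex characters of the digest; the report specifies neither. The schema name `constrained_reply`, the `answer` field, the example constraint text, and the helper names are hypothetical. The `response_format` dictionary follows the Chat Completions `json_schema` shape, where strict mode requires every property to be listed in `required` and `additionalProperties` to be `false`. Strict mode guarantees that `compliance_signature` is present and is a string, not that its value is correct, so the caller still compares it against `constraint_signature(...)` once the response arrives.

```python
import hashlib
import json

# Hypothetical constraint section; how it is cut out of the full system
# prompt is not specified in the report.
CONSTRAINTS = "Be concise. Exclude PII. Output JSON only."

# Assumption: "truncated" means the first 16 hex characters of the digest.
SIGNATURE_LEN = 16


def constraint_signature(constraints: str, length: int = SIGNATURE_LEN) -> str:
    """Truncated SHA-256 hex digest of the constraint text (UTF-8, no normalization)."""
    return hashlib.sha256(constraints.encode("utf-8")).hexdigest()[:length]


def build_response_format() -> dict:
    """Chat Completions response_format that uses a strict JSON schema."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": "constrained_reply",  # hypothetical schema name
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "answer": {"type": "string"},  # hypothetical payload field
                    "compliance_signature": {
                        "type": "string",
                        "description": "First 16 hex chars of the SHA-256 of the constraint section.",
                    },
                },
                # Strict mode: every property must be required and no extras allowed.
                "required": ["answer", "compliance_signature"],
                "additionalProperties": False,
            },
        },
    }


if __name__ == "__main__":
    print(constraint_signature(CONSTRAINTS))
    print(json.dumps(build_response_format(), indent=2))
```

With the official `openai` Python package, the dictionary would be passed as `response_format=build_response_format()` to `client.chat.completions.create(...)`.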
Journey Context:
Standard JSON schemas enforce structure but not semantic content rules. By requiring the model to output a deterministic hash of its constraints, you create a 'proof of awareness': the model cannot produce a valid response without processing the constraint text. This catches cases where the model remembers 'output JSON' but forgets 'be concise' or 'exclude PII'. Strict mode ensures the model cannot omit the signature field, although it only guarantees that the field is present and is a string; the caller still has to compare its value against a locally computed hash. The usual alternative, repeating the constraints in the prompt, fails at scale because the model learns to ignore repeated text; this approach instead bakes verification into the condition for a valid response.
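The 'proof of awareness' only holds if someone checks the signature. A minimal sketch of that client-side check, assuming the same 16-character truncation; `expected_signature`, `verify_reply`, `SIGNATURE_LEN`, and the example constraint text are hypothetical names, not part of any OpenAI API. It recomputes the hash locally and rejects any reply whose `compliance_signature` is missing or does not match. The sketch verifies the signature only; it does not itself measure conciseness or scan for PII.

```python
import hashlib
import json

# Assumption: same truncation length as the one used when building the schema.
SIGNATURE_LEN = 16


def expected_signature(constraints: str, length: int = SIGNATURE_LEN) -> str:
    """Recompute the truncated SHA-256 hex digest of the constraint text."""
    return hashlib.sha256(constraints.encode("utf-8")).hexdigest()[:length]


def verify_reply(raw_json: str, constraints: str) -> tuple[bool, str]:
    """Accept a reply only if its compliance_signature matches the local hash."""
    try:
        reply = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc}"
    if not isinstance(reply, dict):
        return False, "reply is not a JSON object"
    signature = reply.get("compliance_signature")
    if not isinstance(signature, str):
        return False, "missing or non-string compliance_signature"
    # Strict mode guarantees the field exists; only this comparison checks its value.
    if signature != expected_signature(constraints):
        return False, "compliance_signature does not match the constraint hash"
    return True, "ok"


if __name__ == "__main__":
    constraints = "Be concise. Exclude PII. Output JSON only."  # hypothetical
    good = json.dumps({"answer": "Done.", "compliance_signature": expected_signature(constraints)})
    stale = json.dumps({"answer": "Done.", "compliance_signature": "stale"})
    print(verify_reply(good, constraints))  # (True, 'ok')
    print(verify_reply(stale, constraints))
```

Because the check compares two short strings, it is cheap enough to run on every response, and a rejected reply can be retried or logged.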
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:59:52.105105+00:00— report_created — created