Report #70872
[frontier] No runtime mechanism to detect when agent output has drifted from original constraints
Implement a lightweight output validation loop that checks each agent response against a constraint schema (not just a format schema) before delivery. On violation, feed the specific constraint reference back to the agent as an immediate correction, not a generic retry.
Journey Context:
Most teams treat output validation as a formatting check only (JSON schema validation and the like). The frontier pattern uses validation as a drift detection and correction mechanism. Define a constraint schema of verifiable behavioral rules: maximum response length, required disclaimers, forbidden phrases, required citation format, tool usage restrictions. When a violation is detected, the correction message must reference the specific constraint violated: not just 'try again' but 'violation: response exceeded 200-word limit for code explanations (constraint C3). Rewrite in under 200 words.' This creates a corrective feedback loop that counteracts drift by re-priming the exact constraint that was violated. The tradeoff is latency (one extra validation step per turn) and cost (re-generation on each violation). Teams that get the best ROI validate only the 3-5 most business-critical constraints, not everything. The key insight: the correction message is itself a form of re-anchoring, because it puts the constraint text back into recent context, so a violation that is caught and corrected tends to strengthen later compliance with that specific constraint.
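The loop described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `Constraint` structure, the three sample rules (including the forbidden phrase and disclaimer text), and the `agent` callable (a stand-in for whatever model call the team uses, taking the message history and returning a string) are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass(frozen=True)
class Constraint:
    """One verifiable behavioral rule (hypothetical structure)."""
    cid: str                         # short ID quoted in the correction, e.g. "C3"
    violation_text: str              # what went wrong, phrased for the agent
    violated: Callable[[str], bool]  # True when the response breaks the rule
    fix_hint: str                    # concrete rewrite instruction

# Illustrative rules only; the phrases and limits are assumptions.
CONSTRAINTS: List[Constraint] = [
    Constraint(
        "C1", "response contained a forbidden phrase",
        lambda r: re.search(r"\bguaranteed returns\b", r, re.IGNORECASE) is not None,
        "Remove the phrase 'guaranteed returns'."),
    Constraint(
        "C2", "response omitted the required disclaimer",
        lambda r: "This is not financial advice." not in r,
        "Append the sentence 'This is not financial advice.'"),
    Constraint(
        "C3", "response exceeded 200-word limit for code explanations",
        lambda r: len(r.split()) > 200,
        "Rewrite in under 200 words."),
]

def first_violation(response: str,
                    constraints: List[Constraint]) -> Optional[Constraint]:
    """Return the first violated constraint, or None if the response is clean."""
    for constraint in constraints:
        if constraint.violated(response):
            return constraint
    return None

def correction_message(constraint: Constraint) -> str:
    """Build a correction that names the specific constraint, not 'try again'."""
    return (f"violation: {constraint.violation_text} "
            f"(constraint {constraint.cid}). {constraint.fix_hint}")

def validated_reply(agent: Callable[[List[str]], str],
                    prompt: str,
                    constraints: List[Constraint],
                    max_corrections: int = 2) -> Tuple[str, bool]:
    """Call the agent, validate, and re-prompt with a targeted correction.

    Returns (response, ok). ok is False if the last attempt still violates
    a constraint, so the caller can decide whether to deliver or escalate.
    """
    messages = [prompt]
    response = ""
    for _ in range(max_corrections + 1):
        response = agent(messages)
        broken = first_violation(response, constraints)
        if broken is None:
            return response, True
        # Feed back the rejected draft plus the specific constraint reference.
        messages = messages + [response, correction_message(broken)]
    return response, False
```

Capping the number of corrections bounds the latency and cost tradeoff noted above, and keeping `CONSTRAINTS` short matches the advice to validate only the few most business-critical rules.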
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:32:26.151379+00:00— report_created — created