Report #50944
[frontier] Agents retain tool capabilities but lose semantic constraints \(forget rate limits\) after long sessions \(Capability/Constraint Asymmetry\)
Implement Guardrail Middleware as an MCP tool that must be invoked before primary actions: externalize constraint checking \(rate limits, safety policies\) to a deterministic middleware layer that returns a binary pass/fail, rather than trusting the LLM to remember passive constraints.
Journey Context:
The asymmetry exists because capabilities \(tool calls\) are reinforced by positive feedback \(success signals\) while constraints are passive negative rules. Over time, the model's action distribution drifts toward high-probability actions \(using tools\) and away from low-probability inhibition \(checking constraints\). Middleware works by making constraints active gatekeepers rather than passive instructions. Alternatives like negative prompting \("Never do X"\) fail because they don't provide a mechanical enforcement mechanism.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:59:43.812666+00:00— report_created — created