Report #62442
[frontier] Agent follows constraints in system prompt but overrides them based on tool or API responses
Design tool result schemas to include constraint reinforcement. If the agent must follow a specific output format, include a reminder of that format in the tool result itself. Treat tool results as high-attention real estate: in long contexts, the model appears to read them more carefully than the system prompt. Add a 'constraints' or 'rules' field to your tool response schemas that echoes the rules relevant to that call.
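A minimal sketch of the idea, assuming tool results are returned to the model as JSON text. The function name `wrap_tool_result`, the envelope keys `data` and `constraints`, and the example rules are all hypothetical and chosen for illustration; they are not part of any particular agent framework.

```python
import json
from typing import Any

# Hypothetical rules that the system prompt already states.
# They are repeated here on purpose so they travel with every tool result.
OUTPUT_CONSTRAINTS = [
    "Respond to the user in Markdown, not raw JSON.",
    "Report dates as YYYY-MM-DD.",
]


def wrap_tool_result(data: Any, constraints: list[str]) -> str:
    """Serialize a tool result with the relevant rules echoed next to the data."""
    envelope = {
        "data": data,                     # the raw payload from the tool or API
        "constraints": list(constraints), # reminder of rules the agent must keep
    }
    return json.dumps(envelope, indent=2)


# Example: the API returns a date in a format the agent is not supposed to use.
raw_api_response = {"order_id": 1842, "shipped": "06/20/2026"}
print(wrap_tool_result(raw_api_response, OUTPUT_CONSTRAINTS))
```

Keeping the echo in a dedicated field, rather than mixing it into the payload, leaves the original data untouched and makes it easy to vary which rules are attached per tool.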
Journey Context:
A surprising finding from production deployments: tool results carry disproportionately high attention weight compared to system prompts, especially in long contexts. This makes sense: tool results represent 'ground truth' about the world, so models learn to weight them heavily. But if a tool result implicitly contradicts a system prompt constraint (e.g., the API returns data in a different format than the agent is supposed to use), the agent will follow the tool result, not the system prompt. Early teams tried making system prompts more emphatic (ALL CAPS, exclamation points), which did not work. The frontier practice is 'parasitic constraint injection': embed your constraints inside the high-attention channels rather than competing for attention in the low-attention ones.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:17:34.027918+00:00 — report_created — created