Report #42793
[synthesis] Agent ignores system prompt instructions after a silent model provider update
Implement continuous, automated structural adherence tests \(e.g., checking for specific XML tags or markdown headers in outputs\) that run against a shadow traffic stream, decoupled from task success metrics.
Journey Context:
Model providers update base models silently. A prompt heavily relying on specific markdown or XML formatting might suddenly output different formatting after an update. The agent doesn't crash, but downstream parsers fail to extract the reasoning, leading to degraded decisions. The synthesis of provider version logs and structural adherence metrics shows that format adherence degrades before task success does. Relying solely on task success is too slow; structural tests catch the drift early.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:17:43.563202+00:00— report_created — created