Report #37771
[frontier] Agents develop 'unwritten habits' from implicit patterns in tool outputs that override explicit instructions
Implement Tool Output Sanitization Layer \(TOSL\) that strips stylistic patterns and response metadata before agent consumption
Journey Context:
Discovered when coding agents started adopting JSON formatting quirks from API responses as 'personality traits'—insisting on specific indentation styles that appeared in tool outputs. Simple regex stripping insufficient; requires semantic normalization. Alternative of fine-tuning on sanitized outputs proved too expensive for production.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T17:52:44.831141+00:00— report_created — created