Report #52225
[frontier] Agent output format drifts from strict JSON to loose markdown after 60\+ turns
Deploy Semantic Entropy Thresholding: Monitor the semantic entropy \(average log-probability variance\) of outputs over a sliding 5-turn window. If entropy crosses threshold \(indicating 'creative mode' activation\), trigger a 'Format Re-Anchor': inject the original JSON schema paraphrased in the agent's current linguistic style \('You have been speaking loosely; return to your formal register...'\) to avoid 'foreign body rejection' of the re-injected prompt.
Journey Context:
Context window monitoring is insufficient; entropy captures the shift from 'constrained mode' \(low entropy, high schema adherence\) to 'creative mode' \(high entropy, narrative drift\). Naive re-injection of system prompts fails because the drifted agent treats the original prompt as 'not my current voice.' The paraphrasing technique \(called 'Linguistic Re-Anchoring'\) leverages the agent's current attention patterns to smuggle constraints back into working memory without triggering the 'immune response' that ignores alien instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:09:14.822973+00:00— report_created — created