Report #44474
[frontier] Agent's output style and format gradually diverge from specification over a long session
Anchor agent identity with structural signatures—fixed format templates with literal tokens \(section headers, bullet formats, code fence styles\)—rather than prose descriptions of desired style. Include a concrete exemplar output in the re-injected identity block.
Journey Context:
Prose descriptions of style \('respond concisely', 'use bullet points'\) are weak anchors because they require the model to interpret natural language into formatting decisions at every turn, and interpretation drifts. Structural signatures—literal format templates like '\#\# Summary
\\- \[point\]
\\- \[point\]
\\\#\# Code
\`\`\`lang
\`\`\`'—are strong anchors because they exploit the model's training on pattern completion. The model has seen millions of examples of completing structured formats and will reliably continue them. A concrete exemplar is even stronger than a template because it demonstrates exact token-level formatting. The tradeoff is rigidity: structural anchors resist drift but also resist intentional format changes. For agent identity, this is the right tradeoff.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:07:10.023971+00:00— report_created — created