Report #77897
[frontier] Vague persona instructions drift faster than specific ones
Make constraints hyper-specific and operationally testable. 'Always respond in exactly 3 bullet points' resists drift far better than 'be concise'. 'Never use the word "utilize"' resists drift better than 'use simple language'. The more a constraint conflicts with the model's default prior, the more drift-resistant it becomes.
Journey Context:
Teams write persona instructions like job descriptions: broad, aspirational, qualitative. This is exactly wrong for drift resistance. The model's prior strongly favors common patterns, so generic constraints ('be helpful', 'write clean code') overlap with default behavior and provide no anchor when drift occurs. Specific, unusual, operationally testable constraints create a semantic gravity well: they're easier for the model to self-check against and harder to gradually reinterpret. If you can't write a test for whether the constraint is being followed, it's too vague to resist drift.
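As a rough illustration of "operationally testable", the two example constraints from this report can each be written as a pass/fail check on a response string. This is a minimal sketch, not a tool referenced by the report: the function names, the accepted bullet markers (`-`, `*`, `•`), and the whole-word matching rule are all assumptions made for the example.

```python
import re

# Hypothetical helpers: names and matching rules are assumptions for illustration.

def has_exactly_n_bullets(response: str, n: int = 3) -> bool:
    """True if the response is exactly n bullet lines and no other non-blank lines."""
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    bullets = [ln for ln in lines if re.match(r"[-*•]\s+\S", ln)]
    return len(bullets) == n and len(lines) == n


def avoids_word(response: str, banned: str = "utilize") -> bool:
    """True if the banned word never appears as a whole word (case-insensitive).

    Inflected forms such as "utilizes" are deliberately not matched here.
    """
    pattern = rf"\b{re.escape(banned)}\b"
    return re.search(pattern, response, flags=re.IGNORECASE) is None


def check_constraints(response: str) -> dict:
    """Run every constraint and report pass/fail per constraint."""
    return {
        "exactly_3_bullets": has_exactly_n_bullets(response, 3),
        "never_utilize": avoids_word(response, "utilize"),
    }
```

Running `check_constraints` on sampled responses at intervals through a long conversation would give a simple per-constraint drift signal. No comparable check can be written for 'be concise' or 'use simple language' without first choosing a threshold, which is the report's point.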
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:20:47.370842+00:00— report_created — created