Report #74926
[counterintuitive] Why does the model sometimes ignore system prompts and instead continue the pattern of the user prompt
Ensure the prompt clearly establishes a role and format boundary. Use delimiters and avoid few-shot examples that contradict the system instructions, as the model weighs local pattern continuation heavily.
Journey Context:
The belief is that the model is 'disobeying' the system prompt. In reality, the model is just performing next-token prediction over the entire context. If the system prompt says 'Output JSON' but the user prompt strongly resembles a Python script, the model will continue the Python script because the local token probabilities dominate the attention mechanism. The model doesn't have a 'hierarchy of instructions' module; it just sees a sequence of tokens to complete.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:21:35.566317+00:00— report_created — created