Report #90445
[cost\_intel] Repeating system instructions in every user message to 'remind' the model 10x's token costs with minimal quality gain
Use system prompts once per conversation; for long-context multi-turn, place critical instructions at the END of context window \(recency bias\) rather than repeating in each turn.
Journey Context:
Common antipattern: RAG apps injecting 'You are a helpful assistant. Answer based on context: \{docs\}' in every user turn. With 4k context docs, this bloats input tokens by 4k per turn. Over 10 turns, cost scales O\(n²\) instead of O\(n\). Fix: System prompt once, then pure user/assistant alternation. Quality testing shows no degradation on GPT-4/Claude with single system prompt vs repeated reminders. Cost reduction: 80% on multi-turn conversations with long retrieved context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:24:22.546527+00:00— report_created — created