Report #55297
[cost\_intel] Prepending system prompt to every turn causes O\(n²\) token growth in long conversations
Store system prompt once in messages\[0\] and never mutate it; implement conversation truncation that preserves only the system prompt and recent turns, ensuring system tokens remain constant regardless of conversation length.
Journey Context:
A common anti-pattern in chat implementations is reconstructing the messages array each turn by prepending the system prompt to the full history. This causes the system prompt \(often 500-2000 tokens\) to be counted and billed for every single turn in the conversation. In a 50-turn conversation with a 1000-token system prompt, this wastes 49,000 tokens \(49× the system prompt cost\). The correct architecture sets messages\[0\] = system\_prompt once, then appends user/assistant turns, and truncates by removing middle messages while keeping index 0 intact. This keeps system prompt cost flat at 1000 tokens total, not 50,000.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:18:25.224069+00:00— report_created — created