Report #43781
[cost_intel] Growing system prompts organically until they silently 10x costs
Audit and compress system prompts quarterly. At Sonnet input pricing (the figures here imply $3 per 1M input tokens), every 1K tokens of system prompt sent with every request costs $3K per 1M requests. A 10K-token prompt used 1M times/month therefore costs $30K/month in system prompt tokens alone. Compress, cache, or conditionalize.
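The arithmetic above can be checked with a short sketch. It assumes an input price of $3 per 1M tokens, which is the rate implied by the figures in this report rather than a quoted price list; the function name and constant are illustrative, so substitute your current rate.

```python
# Assumed input price in USD per 1M tokens (implied by the report's figures;
# check current pricing before relying on it).
PRICE_PER_MTOK_USD = 3.00


def system_prompt_cost(prompt_tokens: int, requests: int,
                       price_per_mtok: float = PRICE_PER_MTOK_USD) -> float:
    """Return the USD cost of sending a system prompt with every request."""
    total_tokens = prompt_tokens * requests
    return total_tokens / 1_000_000 * price_per_mtok


if __name__ == "__main__":
    # 1K-token prompt over 1M requests, then 10K-token prompt over 1M requests.
    print(system_prompt_cost(1_000, 1_000_000))
    print(system_prompt_cost(10_000, 1_000_000))
```

Running the same function against your real prompt length and monthly request count gives the number to track at each quarterly audit.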
Journey Context:
System prompts accrete: style guides, safety rules, output format instructions, domain context, example outputs. A prompt that started at 500 tokens can grow to 8K+ in 6 months as every edge case gets a new instruction. The growth is barely visible in per-request cost but devastating at scale. Concrete fixes: (1) compress aggressively: remove redundancy and use terser phrasings the model still understands ('never hallucinate' → 'factual only'); (2) move static context to prompt caching so that cached reads get the 90% discount; (3) split the prompt into conditional sections that are only sent when relevant. If 3K of your prompt is about handling edge case X that occurs 5% of the time, only send it 5% of the time. One team reduced a 12K system prompt to 2K with no quality loss by removing 'helpful context' the model never relied on, which they verified by ablation.
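Fix (3) can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Section` type, the keyword-based `applies` rule, the prompt texts, and the 5K/3K token split are all hypothetical. The $3 per 1M tokens price and the 90% cache discount are the figures used in this report, and the estimate ignores any surcharge for writing to the cache.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Section:
    """A hypothetical prompt section with a rule for when to include it."""
    text: str
    applies: Callable[[str], bool]


# Hypothetical content: a short core that is always sent, plus an
# edge-case block that is only sent when the request looks relevant.
CORE = "You are a support assistant. Factual only. Reply in plain text."
SECTIONS = [
    Section(
        text="Refunds: follow the refund checklist before answering.",
        applies=lambda msg: "refund" in msg.lower(),
    ),
]


def build_system_prompt(user_message: str) -> str:
    """Assemble the core prompt plus only the sections that apply."""
    parts = [CORE]
    parts += [s.text for s in SECTIONS if s.applies(user_message)]
    return "\n\n".join(parts)


def expected_tokens(core_tokens: int, section_tokens: int, hit_rate: float) -> float:
    """Average system prompt tokens per request when one section is conditional."""
    return core_tokens + section_tokens * hit_rate


def monthly_cost(tokens_per_request: float, requests: int,
                 price_per_mtok: float = 3.00,    # assumed, as in the report
                 cache_discount: float = 0.0) -> float:
    """USD cost of system prompt tokens, optionally at a cached-read discount."""
    return tokens_per_request * requests / 1_000_000 * price_per_mtok * (1 - cache_discount)


if __name__ == "__main__":
    # Assumed split: an 8K prompt where 3K covers an edge case seen 5% of the time.
    avg = expected_tokens(core_tokens=5_000, section_tokens=3_000, hit_rate=0.05)
    print(avg, monthly_cost(avg, 1_000_000))
    print(monthly_cost(avg, 1_000_000, cache_discount=0.90))
```

A keyword match is the crudest possible routing rule; a classifier or an upstream intent label would serve the same role. Whatever the rule, re-run the quality evaluation after each section is made conditional, in the same spirit as the ablation test described above.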
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:57:25.168592+00:00— report_created — created