Report #65403
[cost_intel] System prompt token bloat from accumulated instructions silently costing thousands per month
Audit system prompts quarterly. Every token in your system prompt is paid for on every request: a 4000-token system prompt sent on 1M requests per month at $3/M input tokens costs 4000 × 1M × $3/M = $12,000/month. Typical finding: 60-70% of system prompt tokens have no measurable impact on output quality.
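A minimal sketch of the cost arithmetic above, assuming a flat per-token input price and ignoring output tokens, per-request user content, and any prompt-caching discounts a provider may offer. The function name `monthly_input_cost` is hypothetical.

```python
def monthly_input_cost(prompt_tokens: int, requests_per_month: int,
                       usd_per_million_tokens: float) -> float:
    """Monthly input-token cost of sending the same system prompt on every request.

    Assumes a flat per-token price; ignores output tokens, user content,
    and any caching discounts.
    """
    total_tokens = prompt_tokens * requests_per_month
    return total_tokens * usd_per_million_tokens / 1_000_000


# 4000-token system prompt, 1M requests/month, $3 per 1M input tokens.
print(monthly_input_cost(4000, 1_000_000, 3.0))  # → 12000.0
```

Passing the difference between the old and new prompt lengths as `prompt_tokens` gives the monthly savings from trimming.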
Journey Context:
System prompts tend to accrete instructions over time as developers add guardrails, formatting rules, persona definitions, and edge-case handling. A system prompt that started at 500 tokens can easily grow past 4000 tokens over a few months. Unlike few-shot examples (which developers add deliberately), system prompt bloat is insidious: each individual instruction seems necessary, and no one does the math on the aggregate cost.
The fix process:
(1) Benchmark output quality with your current system prompt on 100 representative inputs.
(2) Remove instructions one at a time, starting from the bottom, and re-benchmark after each removal.
(3) Keep only the instructions whose removal measurably degrades output quality.
(4) Move lengthy static content (style guides, long examples, knowledge bases) out of the system prompt. Fine-tuning bakes it in so it is paid for once during training rather than on every request; RAG sends only the relevant excerpts with each request instead of the full text.
Typical finding: persona instructions, lengthy format descriptions, and redundant safety instructions can be cut by 60-70% with no measurable quality impact. For a system processing 1M requests/month, cutting a 4000-token system prompt to 1500 tokens saves 2500 × 1M × $3/M = $7,500/month at Sonnet input pricing.
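Steps (1) through (3) of the fix process amount to a greedy backward ablation. The sketch below assumes the system prompt is a list of separate instruction strings joined by newlines, and that you supply a hypothetical `score_fn(prompt, inputs)` returning a mean quality score over the benchmark inputs (higher is better), for example by running the model on each input and grading the outputs. `prune_system_prompt`, `tolerance`, and the toy scorer are illustrative, not a standard API.

```python
from typing import Callable, Sequence

# Hypothetical scorer: (system_prompt, benchmark_inputs) -> mean quality score.
ScoreFn = Callable[[str, Sequence[str]], float]


def prune_system_prompt(
    instructions: list[str],
    inputs: Sequence[str],
    score_fn: ScoreFn,
    tolerance: float = 0.01,
) -> list[str]:
    """Drop, bottom-up, each instruction whose removal keeps the benchmark
    score within `tolerance` of the full-prompt baseline."""
    baseline = score_fn("\n".join(instructions), inputs)
    kept = list(instructions)
    # Walk from the last instruction to the first. Removing kept[i] only shifts
    # positions after i, which have already been visited, so i stays valid.
    for i in range(len(kept) - 1, -1, -1):
        candidate = kept[:i] + kept[i + 1:]
        score = score_fn("\n".join(candidate), inputs)
        # Compare against the original baseline, not the previous step,
        # so many small losses cannot silently add up.
        if score >= baseline - tolerance:
            kept = candidate  # no measurable loss: drop the instruction
    return kept


def toy_score(prompt: str, inputs: Sequence[str]) -> float:
    # Toy stand-in for a real benchmark: only the JSON rule affects quality.
    return 1.0 if "Respond in JSON." in prompt else 0.5


instructions = [
    "You are a helpful assistant.",
    "Respond in JSON.",
    "Use a friendly tone.",
    "Keep answers short.",
]
print(prune_system_prompt(instructions, ["example input"], toy_score))
# → ['Respond in JSON.']
```

With a real LLM-graded benchmark the scores are noisy, so `tolerance` is best set relative to the run-to-run variance you observe on the same inputs; the result can also depend on the order in which instructions are tried.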
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:15:33.921024+00:00 - report_created - created