Report #51163
[cost\_intel] Token bloat from oversized system prompts silently 10x-ing costs
Audit the system prompt's token count. If it exceeds 500 tokens for a simple task, trim it. For system prompts over 1K tokens, use prompt caching and structure the prompt as \[static\_prefix\]\[semi\_static\]\[dynamic\]; never put variable content before static content. To check for bloat, log input\_tokens for 100 requests: if the system prompt accounts for more than 50% of total input, you have bloat.
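The 50% check can be sketched in a few lines of Python. This is a minimal sketch, not a definitive audit: `system_prompt_share` and `has_bloat` are hypothetical helper names, the system prompt's token count is assumed to have been measured separately, and each logged `input_tokens` value is assumed to include the system prompt (that is, the audit runs with caching off, or cached-token usage fields are added back in before logging).

```python
from typing import Sequence


def system_prompt_share(system_tokens: int, input_token_log: Sequence[int]) -> float:
    """Fraction of all logged input tokens that came from the system prompt.

    Assumes the same system prompt (system_tokens long) was sent with every
    logged request and that each log entry includes those tokens.
    """
    total_input = sum(input_token_log)
    if total_input <= 0:
        raise ValueError("input_token_log must contain at least one positive entry")
    return (system_tokens * len(input_token_log)) / total_input


def has_bloat(system_tokens: int, input_token_log: Sequence[int],
              threshold: float = 0.5) -> bool:
    """Apply the report's rule of thumb: bloat if the share exceeds 50%."""
    return system_prompt_share(system_tokens, input_token_log) > threshold


# Illustrative numbers from the report: a 10K-token system prompt with
# 200-token user messages, logged over 100 requests.
log = [10_000 + 200] * 100
share = system_prompt_share(10_000, log)
print(f"system prompt share: {share:.1%}, bloat: {has_bloat(10_000, log)}")
```

Using the aggregate share rather than a per-request average keeps a few unusually long user messages from hiding a prompt that dominates the bill overall.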
Journey Context:
A 10K-token system prompt sent with every 200-token user message means paying 50x more for the system prompt than for the actual task. Without caching, this is pure waste: the full prompt is billed on every request. With Anthropic prompt caching \(90% discount on cached tokens after a cache hit\), a 10K system prompt costs the equivalent of ~1K tokens per request, but only if the prefix is identical across requests. The most common cache-breaking mistake is dynamically generating parts of the system prompt \(e.g., 'current date: \{date\}', 'user: \{name\}'\) at the top. Fix: put all static content first, dynamic content last. ROI threshold: the cache write carries a 25% surcharge \(1.25x\) and each hit costs 0.1x, so two requests with the same prefix cost 1.35x instead of 2x. Caching therefore pays back the write surcharge on the first cache hit, as long as that hit arrives before the cache entry expires.
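The ordering fix and the cost arithmetic can be sketched together. This is an illustrative sketch under stated assumptions: `build_system_blocks` and `prefix_cost` are hypothetical helpers, the block layout follows the Anthropic Messages API convention of text blocks with an optional `cache_control` marker, the multipliers are the 25% write surcharge and 90% read discount quoted above, and every request after the first is assumed to be a cache hit.

```python
from typing import Dict, List

CACHE_WRITE_MULTIPLIER = 1.25  # from the report: 25% surcharge on the first write
CACHE_READ_MULTIPLIER = 0.10   # from the report: 90% discount on cache hits


def build_system_blocks(static: str, semi_static: str, dynamic: str) -> List[Dict]:
    """Order system content static-first so the cached prefix never changes.

    Cache markers go on the static and semi-static blocks; the dynamic block
    comes last and is left unmarked, so per-request values cannot break the prefix.
    """
    blocks: List[Dict] = [
        {"type": "text", "text": static, "cache_control": {"type": "ephemeral"}},
    ]
    if semi_static:
        blocks.append(
            {"type": "text", "text": semi_static, "cache_control": {"type": "ephemeral"}}
        )
    if dynamic:
        blocks.append({"type": "text", "text": dynamic})
    return blocks


def prefix_cost(prefix_tokens: int, requests: int, cached: bool) -> float:
    """Cost of the shared prefix over `requests` calls, in base-input-token equivalents.

    Assumption: with caching, request 1 is a write and all later requests are hits.
    """
    if requests <= 0:
        return 0.0
    if not cached:
        return float(prefix_tokens * requests)
    hits = requests - 1
    return prefix_tokens * (CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * hits)


# Dynamic values such as the date go in the last block, never at the top.
system = build_system_blocks(
    static="You are a support assistant. <long policy text>",
    semi_static="<product catalog, updated weekly>",
    dynamic="current date: 2026-06-19",
)

for n in (1, 2, 5):
    print(n, prefix_cost(10_000, n, cached=False), prefix_cost(10_000, n, cached=True))
```

The resulting list would be passed as the `system` argument of a Messages API call. The loop shows that a single request is more expensive with caching, while any second request inside the cache lifetime already makes caching the cheaper option.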
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:21:53.786675+00:00— report_created — created