Report #83566
[cost_intel] Verbose system prompts creating a hidden per-request fixed tax that compounds at scale
Treat system prompt tokens as a per-request fixed cost and compress ruthlessly. Replace prose with abbreviated instructions, remove redundancy, and move few-shot examples to a cached prefix. Every token saved in a system prompt is multiplied by your total request volume. A 2000-token system prompt on 10M requests per month at $3/MTok input equals $60,000 per month just for the system prompt.
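The arithmetic behind that figure can be sketched in a few lines. This is a minimal illustration, not a billing tool: the function name `monthly_prompt_cost` is hypothetical, and it assumes a flat input price with no caching discount.

```python
def monthly_prompt_cost(prompt_tokens: int, requests_per_month: int, usd_per_mtok: float) -> float:
    """Monthly USD spent on the system prompt alone (flat input pricing assumed)."""
    total_tokens = prompt_tokens * requests_per_month
    return total_tokens / 1_000_000 * usd_per_mtok


# Figures from this report: 10M requests per month at $3/MTok input.
baseline = monthly_prompt_cost(2000, 10_000_000, 3.0)
compressed = monthly_prompt_cost(500, 10_000_000, 3.0)

print(f"2000-token prompt: ${baseline:,.0f}/month")
print(f"500-token prompt:  ${compressed:,.0f}/month")
print(f"savings:           ${baseline - compressed:,.0f}/month")
```

Because the cost is linear in both prompt length and request volume, the same function can be used to estimate what each removed token is worth at your own traffic level.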
Journey Context:
Cutting a 2000-token system prompt to 500 tokens saves $45,000 per month at the same volume and Sonnet-level input pricing ($3/MTok), and should cost no quality as long as the compression preserves the semantic content. The common mistake is treating system prompt length as free because it is "just instructions". Techniques that work: replace natural language with structured shorthand (write 'Respond JSON. Keys: name, date, amount.' instead of 'Please respond in JSON format with the following keys...'), move few-shot examples into a prompt-caching prefix so they are billed at the discounted cache-read rate instead of the full input rate on every request, and A/B test compressed versions. You will often find that 70% of system prompt tokens contribute less than 5% to output quality. The degradation signature for over-compression is format deviations and missing constraints, both of which are caught immediately by structured output validation, so this is a safe optimization to iterate on aggressively.
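A validation check of the kind described above might look like the following sketch. It assumes the model is asked for a JSON object with exactly the keys `name`, `date`, and `amount` from the shorthand example; the helper names `validate_response` and `failure_rate` are hypothetical, and the sample outputs are made up for illustration rather than taken from a real run. Python 3.9+ is assumed for the `list[str]` annotations.

```python
import json

# Keys taken from the shorthand example in this report.
REQUIRED_KEYS = {"name", "date", "amount"}


def validate_response(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the output passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    missing = REQUIRED_KEYS - set(data)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    extra = set(data) - REQUIRED_KEYS
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    return problems


def failure_rate(outputs: list[str]) -> float:
    """Fraction of model outputs that fail validation."""
    if not outputs:
        return 0.0
    return sum(1 for o in outputs if validate_response(o)) / len(outputs)


# Illustrative (invented) outputs from a compressed-prompt variant.
samples = [
    '{"name": "Acme", "date": "2024-01-01", "amount": 12.5}',
    'Sure! Here is the JSON: {"name": "Acme"}',
]
print(failure_rate(samples))
```

Running the same check over outputs from the original and the compressed prompt gives a direct comparison: if the compressed variant's failure rate rises, restore the constraint that was cut and try again.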
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:50:49.497031+00:00— report_created — created