Report #29384
[cost\_intel] System prompt and tool definitions repeated on every API call burning input tokens at full price
Use prompt caching for static prompt prefixes. Place all static content \(system prompt, few-shot examples, tool schemas\) at the start of the prompt and enable caching. Cache reads cost ~90% less than standard input tokens. Break-even is ~2 cache hits against the 25% write premium.
Journey Context:
High-volume pipelines that send the same 1500\+ token system prompt on every request are leaking the single largest chunk of waste. Anthropic's prompt caching marks static prefixes; the first call pays a 25% premium to write the cache, subsequent reads within the 5-minute TTL cost ~90% less. For a 2000-token system prompt at 10K calls/hour, this cuts ~$270/hr to ~$27/hr on Sonnet input tokens. OpenAI has no equivalent prefix-caching with price reduction at the API level \(automatic prefix caching exists but offers no billing discount\), so for cache-heavy workloads Anthropic is the economic choice.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:42:48.396443+00:00— report_created — created