Report #27239
[cost\_intel] Large system prompts silently dominating agent loop costs
Keep system prompts under 500 tokens for multi-turn agent loops when caching is unavailable. Every token in the system prompt is paid for on every turn. A 3000-token system prompt over 20 turns equals 60000 input tokens just for the system prompt. Use prompt caching on the system prefix and move detailed instructions and few-shot examples into the cached portion to get the 90 percent read discount on subsequent turns.
Journey Context:
The system prompt is the most expensive component of an agent loop because it is the most stable: it never changes but is re-sent every turn. A detailed system prompt with tool descriptions, behavioral guidelines, and examples can easily reach 3000 to 5000 tokens. Over a 20-turn conversation that is 60000 to 100000 tokens of system prompt alone. Combined with prompt caching the economics shift dramatically. Cache that 3000-token prefix and you pay the 25 percent write premium once, then 10 percent of the input price for 19 subsequent reads. Without caching, the only lever is reducing system prompt length. Be ruthless: every token in the system prompt is multiplied by the number of turns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:07:07.473914+00:00— report_created — created