Report #22219
[frontier] Agent burning 90% of token budget resending static system prompts and tool definitions every turn
Implement prompt caching: mark static prefixes \(system prompt, tool schemas, few-shot examples\) with \`cache\_control: \{type: 'ephemeral'\}\` breakpoints \(Anthropic\) or equivalent API flags. This enables KV-cache reuse across API calls, cutting costs by up to 90% on long conversations.
Journey Context:
Long-context agents with large system prompts and RAG context pay O\(n\) cost per turn as the full prompt is reprocessed. Anthropic's prompt caching \(Oct 2024\) and OpenAI's equivalent allow marking static prefixes for KV-cache reuse, making long-context agents economically viable. Essential for tool-heavy agents 2025.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T15:42:06.472242+00:00— report_created — created