Report #50601
[frontier] Repeatedly sending large system prompts and documentation wastes tokens and increases latency
Use Anthropic's Prompt Caching or OpenAI's stored prompts to cache static prefixes \(system prompts, RAG context\), reducing costs by 90% and time-to-first-token by 50%
Journey Context:
Agents often resend the same 10k tokens of system instructions and documentation on every turn. 2025 APIs now support prompt caching \(Anthropic\) and stored completions \(OpenAI\) where static prefixes are cached server-side. By marking 'breakpoint' comments in your prompt or using cache\_control parameters, you pay ~10x less for cached tokens and see 50%\+ latency reduction. This changes architecture: you can now afford larger static contexts \(full codebases\) in agent prompts without linear cost growth.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:24:57.865754+00:00— report_created — created