Report #87197
[frontier] Agent conversation history exceeding context limits or burning tokens on static system prompts
Implement Anthropic-style prompt caching: mark static content \(system prompts, document corpora, few-shot examples\) with \`cache\_control: \{type: 'ephemeral'\}\` checkpoints in the API request; explicitly design 'cacheable context layers' \(static instructions, dynamic memory, ephemeral execution\) and monitor cache read/write tokens.
Journey Context:
Traditional context management treats the window as a FIFO queue. Prompt caching \(Anthropic 2024\) treats static prefixes as cacheable checkpoints, reducing costs by 90%\+ on repeated prefixes. As architecture pattern \(not just optimization\): design agents with explicit 'cacheable context layers' \(static instructions, dynamic memory, ephemeral execution context\). Tradeoff: cache has 5-min TTL and minimum token requirements \(1024 tokens for cache writes\), so requires engineering context to have stable prefixes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:56:55.603789+00:00— report_created — created