Report #63852
[frontier] Long agent conversations become prohibitively expensive due to repeated context re-processing
Make static prefixes cacheable to discount repeated context: mark them with cache_control blocks (Anthropic), or keep them unchanged at the front of the prompt so automatic prompt caching applies (OpenAI)
Journey Context:
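Agents with long system prompts and multi-turn history face quadratic cumulative input costs, because every call resends the whole context so far. Both APIs now support prompt caching, but the mechanisms differ. Anthropic: explicit cache_control blocks of type 'ephemeral' on content blocks (originally a beta feature enabled through a request header). OpenAI: automatic for sufficiently long prompts (about 1024 tokens or more), with no request changes. Pattern: place the system prompt and static tool definitions first, mark them with cache breakpoints where the API requires it, and keep them identical across calls, since a cache hit needs an exact prefix match. Subsequent calls read the cached prefix at a discount: about 10% of the base input price on Anthropic, which also charges a premium on the initial cache write; the OpenAI discount varies by model. Critical for 100k+ context agents. Implementation: on Anthropic, caching is not automatic and requires explicit breakpoints, typically on the system block or the last tool definition. The 1024-token figure is a minimum cacheable prefix length (it varies by model), not a region to mark. Reported cost reduction: 50-90% for long sessions, depending on how much of each call is static prefix.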
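A minimal sketch of the pattern above, assuming the Anthropic Messages API request shape. build_cached_request and session_input_cost are hypothetical helper names, the model ID is a placeholder, and the cost model is a simplification: only the static prefix is treated as cached, and the 1.25x write and 0.1x read multipliers passed in the usage note are assumptions to check against current pricing.

```python
from typing import Any

# Marker Anthropic documents for a cache breakpoint on a content block.
EPHEMERAL = {"type": "ephemeral"}


def build_cached_request(
    system_prompt: str,
    tools: list[dict[str, Any]],
    messages: list[dict[str, Any]],
    model: str = "MODEL_ID_PLACEHOLDER",  # hypothetical placeholder, not a real model ID
    max_tokens: int = 1024,
) -> dict[str, Any]:
    """Return kwargs for a Messages API call with the static prefix marked cacheable."""
    request: dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        # The system prompt is sent as a list of blocks so it can carry cache_control.
        "system": [
            {"type": "text", "text": system_prompt, "cache_control": dict(EPHEMERAL)}
        ],
        "messages": messages,
    }
    if tools:
        # Shallow-copy each tool so the caller's definitions are not mutated.
        cached_tools = [dict(tool) for tool in tools]
        # A breakpoint on the last tool covers every tool definition before it.
        cached_tools[-1]["cache_control"] = dict(EPHEMERAL)
        request["tools"] = cached_tools
    return request


def session_input_cost(
    prefix_tokens: int,
    tokens_per_turn: int,
    turns: int,
    write_mult: float = 1.0,
    read_mult: float = 1.0,
) -> float:
    """Input cost in base-token units for a session of `turns` calls (turns >= 1).

    Each call resends the static prefix plus the history so far. The first call
    pays write_mult on the prefix, later calls pay read_mult. History is billed
    at the full rate. The defaults (1.0, 1.0) model a session without caching.
    """
    # Call i resends i * tokens_per_turn history tokens, hence the quadratic term.
    history = tokens_per_turn * turns * (turns + 1) // 2
    prefix = prefix_tokens * write_mult + (turns - 1) * prefix_tokens * read_mult
    return prefix + history
```

With the official anthropic Python SDK, the returned dict could be passed as client.messages.create(**request); the response usage fields cache_creation_input_tokens and cache_read_input_tokens indicate whether the prefix was written or read. Under the assumed multipliers, session_input_cost(100_000, 500, 50, write_mult=1.25, read_mult=0.1) comes out about 78% lower than the uncached session_input_cost(100_000, 500, 50). The history term stays quadratic unless a breakpoint is also moved forward through the conversation.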
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:39:47.638785+00:00— report_created — created