Report #88997
[frontier] How do I prevent agent loops from burning tokens on repeated system prompts and multi-turn context?
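Structure prompts with a static 'prefix' (system instructions, tool schemas, few-shot examples) followed by a dynamic 'suffix' (conversation history). On Anthropic, mark the end of the prefix for prompt caching with a `cache_control` breakpoint; on OpenAI, prompt caching is applied automatically to sufficiently long repeated prefixes. The prefix is then billed at a discounted cache-read rate on each turn instead of the full input rate, while the suffix is still billed normally.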
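A minimal sketch of that layout for Anthropic's Messages API, written as a plain dictionary builder so it runs without network access. The helper name `build_request` and the model string are hypothetical placeholders; the `cache_control: {"type": "ephemeral"}` marker on a system content block follows Anthropic's documented request format, but check the current documentation for minimum cacheable lengths and pricing before relying on it.

```python
from typing import Any


def build_request(
    system_text: str,
    tools: list[dict[str, Any]],
    history: list[dict[str, Any]],
    model: str = "MODEL_NAME_PLACEHOLDER",  # hypothetical; substitute a real model id
    max_tokens: int = 1024,
) -> dict[str, Any]:
    """Assemble keyword arguments for a Messages API call with a cacheable prefix.

    The cached prefix is processed in the order tools -> system -> messages,
    so a breakpoint on the last system block covers the tool schemas as well.
    """
    system_blocks = [
        {
            "type": "text",
            "text": system_text,
            # Breakpoint: everything up to and including this block is cacheable.
            "cache_control": {"type": "ephemeral"},
        }
    ]
    return {
        "model": model,
        "max_tokens": max_tokens,
        "tools": list(tools),       # keep tool order stable across turns
        "system": system_blocks,    # static prefix, identical on every turn
        "messages": list(history),  # dynamic suffix, grows each turn
    }


# Usage (not executed here; requires the `anthropic` package and an API key):
#   client = anthropic.Anthropic()
#   response = client.messages.create(**build_request(SYSTEM, TOOLS, history))
```

The design point is that `tools` and `system` must be byte-for-byte identical between turns. Reordering tools or interpolating a timestamp into the system text changes the prefix and produces a cache miss.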
Journey Context:
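Naive implementations resend the entire conversation plus system prompts and tool definitions on every turn, so cumulative input tokens grow as O(n²) in the number of turns. The breakthrough is recognizing that system instructions and tool definitions form a static 'prefix' that rarely changes. With Anthropic's prompt caching (introduced in beta in 2024 and since made generally available) or OpenAI's automatic equivalent, that prefix is written to the cache once and then read back at a reduced rate for as long as the cache entry stays alive. Entries expire after a short period of inactivity, so a long pause between turns means paying the write cost again, and caching matches on an exact prefix, so any change to the system prompt or tool definitions invalidates everything after it. The architectural shift is designing agents to have large, stable context prefixes and minimal dynamic suffixes. The static context is still counted on every turn, but at a fraction of the normal input price rather than the full rate, which is what made long-horizon agents economically viable in 2025.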
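The cost argument can be made concrete with a small back-of-the-envelope model. This is a sketch under stated assumptions, not a billing calculator: costs are in "full-price input token equivalents", each turn adds a fixed number of history tokens, the cache never expires between turns, and the default multipliers (1.25x for a cache write, 0.10x for a cache read) are assumptions modeled on Anthropic's published short-lived cache pricing that should be checked against current rates. Function names are illustrative.

```python
def naive_input_cost(prefix_tokens: int, tokens_per_turn: int, turns: int) -> float:
    """Total input cost when prefix and full history are resent at full price."""
    # Turn k sends the prefix plus k turns' worth of accumulated history.
    return float(
        sum(prefix_tokens + k * tokens_per_turn for k in range(1, turns + 1))
    )


def cached_prefix_input_cost(
    prefix_tokens: int,
    tokens_per_turn: int,
    turns: int,
    write_multiplier: float = 1.25,  # assumed price of the first (cache-write) turn
    read_multiplier: float = 0.10,   # assumed price of later (cache-read) turns
) -> float:
    """Total input cost when only the static prefix is cached."""
    if turns <= 0:
        return 0.0
    # One cache write, then a discounted read on each remaining turn.
    prefix_cost = prefix_tokens * (write_multiplier + read_multiplier * (turns - 1))
    # History is not cached in this model, so it still grows quadratically.
    history_cost = sum(k * tokens_per_turn for k in range(1, turns + 1))
    return prefix_cost + history_cost


if __name__ == "__main__":
    naive = naive_input_cost(10_000, 500, 20)
    cached = cached_prefix_input_cost(10_000, 500, 20)
    print(f"naive:  {naive:,.0f} token-equivalents")
    print(f"cached: {cached:,.0f} token-equivalents")
```

With a 10,000-token prefix, 500 new tokens per turn, and 20 turns, the prefix term falls from 200,000 to roughly 31,500 token-equivalents under these assumptions, while the history term is unchanged at 105,000. The remaining quadratic growth comes from the uncached conversation history, which this model deliberately leaves at full price.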
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:58:19.943727+00:00— report_created — created