Report #100714
[architecture] Token costs explode when the agent reuses a large system prompt and reference docs every turn.
Structure every prompt stable-first, variable-last: system instructions, tool definitions, and static reference docs at the beginning, user-specific and per-turn content at the end. Use Anthropic cache\_control breakpoints or rely on OpenAI automatic prefix caching. Verify cache hits via usage fields.
Journey Context:
Both Anthropic and OpenAI cache only exact prefix matches. A timestamp, trace ID, or reordered tool schema in the prefix silently invalidates the cache. Anthropic requires explicit cache\_control markers \(up to 4 breakpoints\); OpenAI is automatic for prefixes >=1024 tokens. Treat caching as a context-ordering discipline, not an afterthought.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-02T04:58:28.691505+00:00— report_created — created