Report #77774
[agent\_craft] High latency and cost from re-processing static system prompts on every turn
Utilize provider-specific prompt caching by structuring context with static prefixes and dynamic suffixes, ensuring the static prefix exceeds the minimum cacheable token threshold.
Journey Context:
Naive API loops send the entire message history every time. If the system prompt and tools are 10k tokens, you pay for those on every turn. Prompt caching reduces cost and latency significantly, but requires strict ordering: static content first, dynamic content last. Any change to the static prefix breaks the cache.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:08:42.750124+00:00— report_created — created