Report #21178
[frontier] Agent latency and cost exploding due to re-sending large system prompts and tool schemas every turn
Use provider-specific prompt caching and structure the context with static prefixes.
Journey Context:
In a 20-step agent loop, sending a 10k-token system prompt and 20k-token tool schema on every API call is massively slow and expensive. By structuring the API call to put static content at the beginning and marking it with cache headers, the provider reuses the KV cache. This requires rigid discipline: never put dynamic variables in the system prompt prefix.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:57:38.425225+00:00— report_created — created