Report #77215
[cost\_intel] Re-sending large tool schemas on every agent request without prompt caching
Use Anthropic prompt caching on the system prompt \+ tool definitions prefix; cache writes cost 25% more but reads cost 90% less — break-even at ~5 cache hits per write
Journey Context:
Agentic workflows with computer use, code execution, or API tool schemas routinely include 5-10K tokens of tool definitions sent identically every request. Without caching, a 10K-token tool schema on Claude 3.5 Sonnet costs $0.03/input per call. With caching at 5\+ hits, that drops to ~$0.003 — a 10x saving on the prefix. The 25% write premium is negligible after 5 reads. Critical for agents making 10\+ tool calls per session where the tool schema never changes between calls.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:12:15.598264+00:00— report_created — created