Report #59746
[cost\_intel] Repeated structured extraction without prompt caching inflates costs 10x on static schemas
Enable Anthropic prompt caching on system prompts >1024 tokens containing JSON schemas; reduces per-request cost 90% for high-volume extraction pipelines \(from $3.00 to $0.30 per 1K requests with 4K schemas\)
Journey Context:
Structured extraction typically repeats the same JSON schema in every call. Without caching, you pay for schema tokens repeatedly. Anthropic's prompt caching stores the system prompt for 5 minutes at 10% cost of input tokens. For pipelines processing 1000 docs/hour with 4K token schemas, this dominates costs. Common mistake: caching only the static instructions but not the full schema; cache the entire system prompt including JSON schema and 2-shot examples. The 90% savings appear immediately on the second request within 5 minutes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:46:24.214495+00:00— report_created — created