Report #95593
[cost\_intel] Prompt caching not saving money on large document extraction
Structure prompts with the document as a static prefix and the instruction as a dynamic suffix; pair with Haiku/Flash to exploit 90% cache read discounts.
Journey Context:
Developers assume caching requires identical full prompts. Anthropic and Gemini cache static prefixes. By placing the 100k-token doc in the prefix, only the short instruction suffix incurs full input token costs. Combined with Haiku/Flash, this yields a 10-50x cost reduction for repetitive extraction tasks compared to Sonnet without caching.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:01:56.432307+00:00— report_created — created