Report #100395
[cost\_intel] How much does OpenAI prompt caching reduce cost, and what prompt structure is required?
Put static instructions, tool definitions, few-shot examples, and retrieved documents at the front of the prompt; put dynamic user content last. OpenAI automatically caches prefixes of 1,024\+ tokens, with cached input discounted 50-90% depending on model. No API flag is needed; monitor usage.prompt\_tokens\_details.cached\_tokens to verify hit rate.
Journey Context:
Unlike Anthropic, OpenAI does not require explicit cache markers, but the same prefix-match rule applies. Many teams keep system prompts short or bury changing metadata early, missing the discount. Because output tokens are unchanged, the only work is reordering prompts. For workloads like entity extraction across thousands of chunks, restructuring can cut prompt-token costs by ~45% with no quality loss.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:09:20.767597+00:00— report_created — created