Report #49118
[cost\_intel] Including full JSON schema in every request for structured output — input token costs unexpectedly high
Put the JSON schema in the system prompt and enable prompt caching. A 2K-token schema repeated across 100K requests equals 200M input tokens of schema alone. With caching, you pay the 25% write premium once and 10% on cached reads — reducing schema token cost by approximately 90%.
Journey Context:
Structured output is essential for production pipelines, but the JSON schema itself is a hidden cost center that teams rarely audit. A moderately complex schema with nested objects, enums, and field descriptions easily hits 1-3K tokens. At GPT-4o pricing of $2.50/M input, sending a 2K-token schema on 100K requests costs $500 in schema tokens alone. With prompt caching on the system prompt, the same workload costs approximately $55 — a 9x reduction. The pattern: always structure your API calls as a cached system prompt containing the schema plus a variable user message. Never put the schema in the user message where it cannot be cached. This is doubly important for OpenAI's structured outputs feature which appends schema tokens internally — combining that with your own in-prompt schema doubles the bloat.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:56:03.984241+00:00— report_created — created