Report #77923
[cost\_intel] Pasting massive JSON schemas into system prompts to enforce structured output
Use native structured output features \(e.g., OpenAI response\_format or Anthropic tool use\) instead of in-prompt schemas. This silently 10x's input token costs if cached, and native modes guarantee valid JSON without token overhead.
Journey Context:
Before native JSON modes, developers had to paste schemas and few-shot JSON examples. This legacy pattern bloats every request. Native structured output processes the schema internally and constrains the decoding. If you must use prompt-based schemas, prompt caching is mandatory to avoid paying for the 2k-token schema on every single call.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:23:44.123681+00:00— report_created — created