Report #56442
[cost_intel] Describing JSON schemas in prompts instead of using native structured output
Use native structured output features (Anthropic tool_use, OpenAI function calling / structured outputs) instead of writing JSON schema descriptions in system prompts. This removes 500-2000 tokens of hand-written schema description per request and, where the provider enforces the schema during decoding, cuts format-related retry rates from 5-15% to near zero.
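As a rough illustration of what moving the schema out of the prompt looks like, the sketch below builds the request parameters for both providers without making any network call. The schema, its field names, and the helper function names are hypothetical and chosen only for this example. The parameter shapes (`response_format` with `json_schema` and `strict` for OpenAI, `tools` with `input_schema` plus a forced `tool_choice` for Anthropic) follow the providers' documented request formats as understood at the time of writing, so check them against current documentation before relying on them.

```python
# Hypothetical schema for illustration; field names are not from the report.
# OpenAI's strict mode expects every property to be listed in "required"
# and "additionalProperties" to be False.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["billing", "bug", "other"]},
        "priority": {"type": "integer"},
        "summary": {"type": "string"},
    },
    "required": ["category", "priority", "summary"],
    "additionalProperties": False,
}


def openai_request_kwargs(schema: dict, name: str) -> dict:
    """Extra keyword arguments for a Chat Completions call using structured outputs."""
    return {
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": name, "strict": True, "schema": schema},
        }
    }


def anthropic_request_kwargs(schema: dict, name: str, description: str) -> dict:
    """Extra keyword arguments for a Messages call that forces a single tool."""
    return {
        "tools": [
            {"name": name, "description": description, "input_schema": schema}
        ],
        # Forcing the tool means the reply arrives as a tool_use block whose
        # "input" field is already a parsed object rather than free text.
        "tool_choice": {"type": "tool", "name": name},
    }
```

These dictionaries would be merged into the usual SDK call alongside the model name and messages, with the system prompt reduced to task instructions only.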
Journey Context:
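A common pattern is pasting a full JSON schema into the system prompt: field names, types, descriptions, enums, required flags. For a moderate schema (20 fields with descriptions), this is 1000-2000 tokens, paid on every request. But the bigger cost is retries: models occasionally produce invalid JSON (missing fields, wrong types, trailing commas), requiring 1-3 retries on 5-15% of requests. Native structured output moves the schema out of the prompt and into an API parameter. OpenAI's structured outputs (response_format with json_schema in strict mode) constrain generation to the schema at the token level. Anthropic's tool_use with a forced tool choice returns the arguments as an already-parsed object, though how strictly the schema is enforced there depends on the mode the provider offers. The token savings on input are straightforward, with one caveat: schema definitions passed through the API are typically still counted as input tokens, so what goes away is the surrounding prose and formatting instructions. The retry savings are the hidden win: if 10% of requests need 1 retry, you are paying roughly 10% more for compute, and each retried request takes about twice as long, which shows up in tail latency (around p90 and above) rather than in p50. With strict enforcement, malformed JSON cannot be emitted, although refusals and truncation at the token limit still need handling. For high-volume pipelines, this alone justifies the API migration effort.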
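The cost argument above can be made concrete with a back-of-the-envelope model. This is a minimal sketch under simplifying assumptions: every retry resends the full request, retries are independent of request size, and the function names and input figures are illustrative rather than measured.

```python
def cost_multiplier(retry_rate: float, retries_per_failure: float = 1.0) -> float:
    """Expected API calls per logical request under the simple retry model."""
    return 1.0 + retry_rate * retries_per_failure


def overhead_tokens(
    requests: int,
    schema_tokens: int,
    retry_rate: float,
    tokens_per_request: int,
    retries_per_failure: float = 1.0,
) -> dict:
    """Tokens spent on in-prompt schema text and on retried requests."""
    return {
        # Schema description repeated in the system prompt of every request.
        "schema_prompt_tokens": requests * schema_tokens,
        # Each retry is assumed to resend the whole request.
        "retry_tokens": round(
            requests * retry_rate * retries_per_failure * tokens_per_request
        ),
    }


if __name__ == "__main__":
    # Illustrative inputs: 1M requests, 1500-token schema description,
    # 10% of requests retried once, 3000 tokens per full request.
    print(cost_multiplier(0.10))
    print(overhead_tokens(1_000_000, 1500, 0.10, 3000))
```

With these illustrative inputs, the in-prompt schema accounts for 1.5 billion tokens and retries for another 300 million. Substitute measured retry rates and token counts before drawing conclusions for a real pipeline.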
Lifecycle
2026-06-20T01:13:43.210084+00:00— report_created — created