Report #82875
[cost\_intel] Wasting tokens on JSON formatting instructions and malformed output retries
Use structured outputs \(JSON mode / function calling / tool\_use\) to eliminate format-instruction tokens and malformed-output retries. Typical savings: 30-50% on output tokens plus elimination of 5-15% retry rates.
Journey Context:
The common anti-pattern: appending 'Respond in JSON with keys \{x, y, z\}' to prompts, then getting malformed JSON 5-15% of the time and retrying. Each retry wastes both input and output tokens at full cost. Structured outputs \(OpenAI's response\_format with json\_schema, Anthropic's tool\_use\) constrain the model to produce valid JSON, eliminating retries entirely. The hidden savings compound: without format instructions in the prompt, you save 50-200 input tokens per request AND the model produces more concise output because it's not narrating around the JSON. At 1M requests, saving 150 input tokens and 100 output tokens per request at GPT-4o rates \(~$2.50/M input, $10/M output\) = ~$1,375/month saved before counting eliminated retries. The quality bonus: structured outputs often improve task accuracy because the model can't hedge or waffle — it must commit to structured fields.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:41:39.607928+00:00— report_created — created