Report #83974
[cost\_intel] Models generating verbose conversational outputs when only short structured answers are needed
Set max\_tokens aggressively and use Structured Outputs \(JSON mode\) to constrain generation, forcing the model to skip preambles.
Journey Context:
Output tokens cost 3x input tokens \(for most providers\). Unconstrained models often add conversational filler like 'Sure, here is the JSON:'. This silently triples the cost per request. JSON mode forces the model to skip preambles and output only the required data.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:32:36.209035+00:00— report_created — created