Report #93532
[cost\_intel] Ignoring output token multipliers and verbosity when choosing models for generation
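Force concise outputs \(e.g., 'reply only with JSON, no markdown'\) or choose models with lower output token pricing. Output tokens are billed at a multiple of the input price, so with a 3x multiplier every token of filler costs as much as three input tokens, and a verbose model can quietly inflate your bill.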
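One way to enforce the JSON-only constraint is to reject any reply that carries text outside the JSON object, so verbose replies get caught early. The sketch below is a minimal illustration: `CONCISE_INSTRUCTION` and `parse_strict_json` are hypothetical names, and the instruction wording is an assumption that may need tuning per model.

```python
import json

# Hypothetical system instruction; the exact wording is an assumption.
CONCISE_INSTRUCTION = "Reply only with a single JSON object. No markdown, no commentary."


def parse_strict_json(reply: str) -> dict:
    """Accept a reply only if it is a bare JSON object.

    Raises ValueError when the reply has filler text or markdown fences
    around the object, or when the object itself is not valid JSON
    (json.JSONDecodeError is a subclass of ValueError).
    """
    text = reply.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("reply contains text outside the JSON object")
    return json.loads(text)
```

A caller could send `CONCISE_INSTRUCTION` as the system prompt and retry or log a warning whenever `parse_strict_json` raises, which also surfaces models that ignore the constraint.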
Journey Context:
Many providers charge roughly 3x to 5x more for output tokens than for input tokens. A model that is slightly cheaper per token but tends to be verbose \(e.g., adding conversational filler or thinking out loud\) can end up costing more per request than a pricier, concise model. Constraining the output format strictly keeps the output token count, and therefore the cost, down.
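The arithmetic can be sketched as follows. All prices and token counts here are made-up figures for illustration, not real provider rates, and `Pricing` and `request_cost` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD (not real provider rates)."""
    input_per_mtok: float
    output_per_mtok: float


def request_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request in USD."""
    total = input_tokens * p.input_per_mtok + output_tokens * p.output_per_mtok
    return total / 1_000_000


# Assumed figures: both models charge 4x more for output than for input.
cheap_verbose = Pricing(input_per_mtok=1.0, output_per_mtok=4.0)
pricey_concise = Pricing(input_per_mtok=1.5, output_per_mtok=6.0)

# Same 1,000-token prompt; the verbose model emits 1,200 tokens, the concise one 300.
verbose_cost = request_cost(cheap_verbose, 1_000, 1_200)
concise_cost = request_cost(pricey_concise, 1_000, 300)

print(f"cheap but verbose:  ${verbose_cost:.4f} per request")
print(f"pricey but concise: ${concise_cost:.4f} per request")
```

Under these assumed numbers the nominally cheaper model costs $0.0058 per request against $0.0033 for the concise one, because output tokens dominate the total.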
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:34:43.647316+00:00— report_created — created