Report #100435
[cost\_intel] Failed structured-output retries re-bill the full input context each time
Use provider-native structured outputs (OpenAI json\_schema response\_format or Anthropic tool\_use) to get schema-constrained responses instead of parsing free text. Set max\_tokens high enough for the largest response the schema allows, check that the schema is satisfiable, and on failure truncate or summarize the history before retrying rather than resending the full conversation.
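The retry advice can be sketched as follows. This is a minimal illustration, not a provider integration: `call_model` is a hypothetical callable standing in for whatever client wrapper is in use, and `trim_history`, `call_with_trimmed_retry`, and `keep_last` are names invented here. It assumes messages are plain role/content dicts. Real APIs may require tool calls and their results to stay paired, so the trimming rule would need adapting.

```python
import json
from typing import Any, Callable

Message = dict[str, str]


def trim_history(messages: list[Message], keep_last: int = 4) -> list[Message]:
    """Keep every system message plus the most recent `keep_last` other messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-keep_last:] if keep_last > 0 else []
    return system + recent


def call_with_trimmed_retry(
    call_model: Callable[[list[Message]], str],  # hypothetical client wrapper
    messages: list[Message],
    max_retries: int = 2,
    keep_last: int = 4,
) -> Any:
    """Send the full context once; on a parse failure, retry with trimmed history."""
    attempt_messages = messages
    for _ in range(1 + max_retries):
        raw = call_model(attempt_messages)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Retries are billed on what is sent, so send less the next time.
            attempt_messages = trim_history(messages, keep_last)
    raise ValueError("model output was not valid JSON after all retries")
```

With this shape only the first attempt pays for the full context; each retry is billed on the system prompt plus the last few messages.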
Journey Context:
A parse failure on a 20K-token agent turn is not a free miss: each retry re-sends all 20K tokens of system prompt, tool definitions, prior tool results, and conversation history. With three retries, a single user request can bill about 4x the input tokens of a clean first attempt (one original call plus three full-context retries). The main failure modes are max\_tokens set too low for the JSON, which truncates the output; contradictory schema constraints; and asking for raw JSON in a chat message instead of a tool call. Native structured outputs eliminate most parse failures, and pre-validating the schema with a small local library prevents the rest.
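The 4x figure follows from simple arithmetic, shown here as a rough sketch. The function names and the price of 3.00 USD per million input tokens are illustrative assumptions, not figures from this report. Output tokens and any prompt-caching discount are ignored.

```python
def billed_input_tokens(context_tokens: int, retries: int) -> int:
    """Input tokens billed for one request plus `retries` full-context retries."""
    return (1 + retries) * context_tokens


def input_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Input-side cost at an assumed flat price per million tokens."""
    return tokens * usd_per_million / 1_000_000


clean = billed_input_tokens(20_000, retries=0)        # 20,000 tokens
with_retries = billed_input_tokens(20_000, retries=3)  # 80,000 tokens
multiplier = with_retries / clean                      # 4.0

# Assumed price for illustration only.
extra_cost = input_cost_usd(with_retries - clean, usd_per_million=3.00)
```

The multiplier depends only on the retry count when every retry resends the same context, which is why shrinking the context on retry reduces it directly.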
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:13:24.755630+00:00— report_created — created