Report #660
[architecture] How do I make LLM tool calls and structured outputs reliable in production?
Use provider-native Structured Outputs with strict JSON schemas (or strict function schemas) for shape guarantees, but still validate the result in code with Pydantic or Zod. Add explicit retries, max-iteration limits, and circuit breakers, and handle refusals and incomplete responses. Do not rely on JSON mode or polite prompting alone.
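The retry, iteration-cap, and circuit-breaker advice above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not a definitive implementation: `call_model`, `validate_ticket`, `get_ticket`, `CircuitBreaker`, and the ticket fields are hypothetical names invented for the example, and the hand-written validator is a stand-in for the check a Pydantic or Zod model would perform.

```python
import json
import time
from typing import Callable


class ValidationFailed(Exception):
    """Raised when the model output does not match the expected shape."""


def validate_ticket(raw: str) -> dict:
    # Stand-in for Pydantic/Zod validation: parse, then check fields and types.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValidationFailed(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValidationFailed("expected a JSON object")
    if not isinstance(data.get("title"), str):
        raise ValidationFailed("'title' must be a string")
    if data.get("priority") not in {"low", "medium", "high"}:
        raise ValidationFailed("'priority' must be low, medium, or high")
    return data


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; stays open for `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            # Cooldown elapsed: close the breaker and start counting again.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()


def get_ticket(call_model: Callable[[str], str], prompt: str,
               breaker: CircuitBreaker, max_attempts: int = 3) -> dict:
    """Call the model at most `max_attempts` times, validating every reply."""
    last_error = None
    for _ in range(max_attempts):
        if not breaker.allow():
            raise RuntimeError("circuit open: skipping model call")
        try:
            ticket = validate_ticket(call_model(prompt))
        except ValidationFailed as exc:
            breaker.record(ok=False)
            last_error = exc
            continue
        breaker.record(ok=True)
        return ticket
    raise RuntimeError(f"gave up after {max_attempts} attempts: {last_error}")
```

Here `call_model` is any function that takes a prompt and returns the raw model text, so the same loop can wrap whichever provider SDK is in use. Sharing one `CircuitBreaker` across requests lets repeated validation failures stop further calls for the cooldown period rather than failing one request at a time.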
Journey Context:
OpenAI's docs explicitly distinguish JSON mode, which only guarantees valid JSON, from Structured Outputs, which enforces schema adherence via constrained decoding. Even with Structured Outputs, refusals, content filters, and max_tokens truncation can break the contract. Production systems therefore treat the model response as an untrusted API boundary: enforce the schema at the provider when possible, re-validate at runtime, and cap loops so a bad tool call cannot burn tokens forever.
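One way to treat the response as an untrusted boundary is to classify it before any parsing happens. The sketch below assumes a plain dict shaped like a single Chat Completions choice, with `finish_reason`, `message.refusal`, and `message.content` fields; those names are modelled on OpenAI's response format and should be checked against the SDK version and provider actually in use. `Outcome` and `classify_choice` are hypothetical names for this example.

```python
import json
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    REFUSED = "refused"
    TRUNCATED = "truncated"
    FILTERED = "filtered"
    INVALID = "invalid"


def classify_choice(choice: dict) -> tuple:
    """Return (Outcome, payload) for one response choice.

    Assumed shape: {"finish_reason": str, "message": {"content": str, "refusal": str}}.
    """
    message = choice.get("message") or {}
    finish_reason = choice.get("finish_reason")

    # A refusal replaces the structured payload, so check it first.
    if message.get("refusal"):
        return Outcome.REFUSED, message["refusal"]
    # "length" means the token limit cut the output off; the JSON is likely incomplete.
    if finish_reason == "length":
        return Outcome.TRUNCATED, None
    if finish_reason == "content_filter":
        return Outcome.FILTERED, None
    try:
        return Outcome.OK, json.loads(message.get("content") or "")
    except (json.JSONDecodeError, TypeError):
        return Outcome.INVALID, None
```

Only the `Outcome.OK` payload should go on to schema validation. The other outcomes call for different handling: for instance, a truncated response may be retried with a larger token budget, while a refusal is usually surfaced to the caller rather than retried.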
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T10:57:43.823343+00:00— report_created — created