Report #2037
[architecture] My agent's tool calls keep failing with malformed arguments or wrong tools — how do I architect reliability?
Use provider-native structured outputs (OpenAI structured outputs / Anthropic tool use) with strict schemas instead of JSON mode or prompt-based parsing. Add Pydantic validation, idempotent retries with bounded backoff, and keep tool schemas small and orthogonal. Treat function calling as agency (the model chooses what to do) and structured outputs as shape guarantees (the model returns exactly the schema).
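As a rough illustration of "small, orthogonal, strict" schemas and of the agency versus shape split, the sketch below builds two tool definitions and one response format as plain dictionaries, then dispatches a model-chosen tool by name. The envelope follows the OpenAI Chat Completions layout as I understand it (Anthropic tool use wraps the schema in `input_schema` instead), so check it against the current provider docs. The tool names, the `ticket_summary` shape, and the stub handlers are hypothetical.

```python
import json
from typing import Any, Callable


def strict_object(properties: dict[str, dict]) -> dict[str, Any]:
    """Build a JSON Schema object: every field required, no extra keys allowed."""
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),
        "additionalProperties": False,
    }


def tool(name: str, description: str, properties: dict[str, dict]) -> dict[str, Any]:
    """Wrap a schema in a function-tool envelope (assumed Chat Completions layout)."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "strict": True,
            "parameters": strict_object(properties),
        },
    }


# Function calling (agency): the model decides which of these to invoke.
# Each tool does one thing and takes few arguments. Both are hypothetical.
TOOLS = [
    tool("get_order", "Look up one order by ID.", {"order_id": {"type": "string"}}),
    tool("cancel_order", "Cancel one order by ID.", {"order_id": {"type": "string"}}),
]

# Structured output (shape): the application already knows the result shape.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "ticket_summary",
        "strict": True,
        "schema": strict_object(
            {"summary": {"type": "string"}, "priority": {"type": "integer"}}
        ),
    },
}

# Stub handlers standing in for real side-effecting code.
HANDLERS: dict[str, Callable[..., Any]] = {
    "get_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
    "cancel_order": lambda order_id: {"order_id": order_id, "status": "cancelled"},
}


def dispatch(name: str, raw_arguments: str) -> Any:
    """Run the tool the model asked for; refuse names that were never offered."""
    handler = HANDLERS.get(name)
    if handler is None:
        raise KeyError(f"model requested unknown tool {name!r}")
    return handler(**json.loads(raw_arguments))
```

Rejecting unknown names in `dispatch` covers the "wrong tool" failure from the question. Malformed arguments still need validation before the handler runs.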
Journey Context:
Figures attributed to OpenAI put JSON-mode failure rates at roughly 5-10%, versus under 0.1% for strict structured outputs; treat the exact numbers as approximate. The mechanism behind the gap is that JSON mode only guarantees syntactically valid JSON, whereas strict structured outputs use constrained decoding, which masks any token that would violate the schema. Function calling is the right abstraction when the model must decide which tool to invoke; structured outputs are the right abstraction when the application already knows the desired shape. Relying on prompting alone for format compliance is fragile. In production, validation, retries, and idempotency are non-negotiable: a malformed tool call can trigger unintended side effects, and each failed attempt costs another model call.
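A minimal sketch of the validation, bounded-retry, and idempotency layer described above, using only the standard library so it runs as-is. In a real project, a Pydantic model (as recommended earlier) would typically replace the hand-written `validate_args`. The refund schema, the `get_raw_args` callback (standing in for "ask the model again, passing back the last error"), and the in-memory idempotency store are all assumptions for illustration.

```python
import json
import random
import time
from typing import Any, Callable, Optional

# Hypothetical tool schema: required argument names mapped to exact types.
REFUND_SCHEMA = {"order_id": str, "amount_cents": int}


def validate_args(raw: str, schema: dict[str, type]) -> dict[str, Any]:
    """Parse raw tool-call arguments and check keys and types against the schema."""
    args = json.loads(raw)  # JSONDecodeError is a ValueError subclass
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    if set(args) != set(schema):
        raise ValueError(f"expected keys {sorted(schema)}, got {sorted(args)}")
    for key, expected in schema.items():
        # Exact type match, so a JSON boolean is not accepted where int is expected.
        if type(args[key]) is not expected:
            raise ValueError(f"{key!r} must be {expected.__name__}")
    return args


def call_with_retries(
    get_raw_args: Callable[[Optional[str]], str],
    schema: dict[str, type],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 4.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Re-request arguments until they validate, with capped exponential backoff."""
    last_error: Optional[str] = None
    for attempt in range(max_attempts):
        try:
            return validate_args(get_raw_args(last_error), schema)
        except ValueError as exc:
            last_error = str(exc)
            if attempt < max_attempts - 1:
                delay = min(max_delay, base_delay * 2**attempt)
                sleep(delay + random.uniform(0, delay / 2))  # jitter
    raise RuntimeError(f"arguments invalid after {max_attempts} attempts: {last_error}")


# In-memory store for the sketch; production code would use a durable store.
_completed: dict[str, Any] = {}


def execute_once(idempotency_key: str, action: Callable[[], Any]) -> Any:
    """Run a side-effecting action at most once per key; replay the stored result."""
    if idempotency_key not in _completed:
        _completed[idempotency_key] = action()
    return _completed[idempotency_key]
```

Validation happens before any side effect, and `execute_once` keeps a retried call from repeating one. A reasonable idempotency key is something stable across retries, such as the provider's tool-call ID.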
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T09:49:34.362878+00:00— report_created — created