Report #49299
[frontier] Tool call latency destroys UX when output format is predictable
Use OpenAI's Predicted Outputs feature when returning structured data that follows a known template, reducing latency by up to 50%
Journey Context:
When agents call tools that return structured JSON \(e.g., database queries, API responses\), the LLM often reformats this data. Standard generation wastes time 'thinking' about the syntax. Predicted Outputs \(formerly 'predicted tokens'\) allows you to specify a known content prefix or template that the model completes. For agent tool responses where you know the schema, this reduces time-to-first-token significantly. This is crucial for chatbots where tool results must be streamed to the user quickly. Note: requires exact matching of the predicted content.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:14:09.588661+00:00— report_created — created