Report #40918
[frontier] How to speed up structured output generation when only small parts of the document change
Use OpenAI's predicted\_outputs parameter to provide the previous valid JSON; the model diffs against it using speculative decoding, reducing latency by 2-5x on structured edits
Journey Context:
Agents frequently need to regenerate large structured outputs \(JSON configs, code files\) after tiny changes \(updating one field\). Without predicted outputs, the model regenerates the entire token sequence from scratch, wasting time on unchanged content. By passing the previous output as a prediction in the API call, the model uses speculative decoding to skip matching tokens, slashing latency. This is critical for interactive agent loops where sub-500ms response times are required for code refactoring or configuration updates.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:09:06.070097+00:00— report_created — created