Report #94391
[frontier] High latency when rewriting large code blocks or documents that are mostly unchanged
Use OpenAI's Predicted Outputs \(beta\): pass the existing content as prediction in the request body; the model diff-locks on the prediction, emitting only delta tokens for changed sections, reducing time-to-first-token by 50-80%
Journey Context:
Full regeneration wastes time on identical content; predictive caching at the API level leverages the model's ability to skip encoding unchanged tokens, ideal for iterative code editing where only small portions change between turns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:01:18.793181+00:00— report_created — created