Report #47874
[synthesis] Treating streaming as purely a UX feature misses its role in cost control and early termination
Architect streaming as a core primitive that enables three capabilities: (1) early cancellation when generation goes off-track, saving tokens; (2) progressive rendering to reduce perceived latency; (3) concurrent verification during generation. Implement token-level streaming with a cancellation interface that both the user and programmatic validators can trigger.
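A minimal sketch of that cancellation interface, in Python with asyncio. Every name here is hypothetical: `fake_model_stream` stands in for a real provider's token stream, and `CancelHandle`, `run_stream`, and the `off_track` validator are illustrative, not an existing library's API. For simplicity, validators run inline on each token. A slower check could instead run in its own task and call `handle.cancel(...)` from there, which is the same entry point a stop button would use.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable, List, Optional


async def fake_model_stream(tokens: List[str]) -> AsyncIterator[str]:
    """Hypothetical stand-in for a provider's token stream."""
    for tok in tokens:
        await asyncio.sleep(0)  # yield to the event loop, as a network read would
        yield tok


class CancelHandle:
    """Shared stop flag: the UI and any validator can trip it; the first reason wins."""

    def __init__(self) -> None:
        self.reason: Optional[str] = None

    def cancel(self, reason: str) -> None:
        if self.reason is None:
            self.reason = reason


# A validator sees the text so far and returns a stop reason, or None to continue.
Validator = Callable[[str], Optional[str]]


@dataclass
class StreamResult:
    text: str
    tokens_consumed: int
    cancelled_by: Optional[str]  # None means the stream ran to completion


async def run_stream(
    stream: AsyncIterator[str],
    handle: CancelHandle,
    validators: List[Validator],
    render: Callable[[str], None],
) -> StreamResult:
    text, consumed = "", 0
    try:
        async for tok in stream:
            if handle.reason is not None:  # cancelled from outside (e.g. by the user)
                break
            consumed += 1
            text += tok
            render(tok)  # progressive rendering: show each token immediately
            for check in validators:  # verification while generation is in flight
                reason = check(text)
                if reason is not None:
                    handle.cancel(reason)
                    break
            if handle.reason is not None:
                break
    finally:
        # A real client would abort its request here (assumption: the provider
        # stops generating once the connection is closed).
        aclose = getattr(stream, "aclose", None)
        if aclose is not None:
            await aclose()
    return StreamResult(text, consumed, handle.reason)


def off_track(text: str) -> Optional[str]:
    # Hypothetical validator: stop once the model drifts into a disclaimer.
    return "off-track: disclaimer" if "As an AI" in text else None


async def demo() -> StreamResult:
    tokens = ["The", " answer", " is", " 42", ".", " As", " an", " AI",
              " model", ",", " I", " cannot"]
    rendered: List[str] = []
    return await run_stream(fake_model_stream(tokens), CancelHandle(),
                            [off_track], rendered.append)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the demo, `off_track` fires after 8 of the 12 tokens, so the last 4 are never pulled from the stream. Calling `handle.cancel("user")` from a UI callback stops the stream through the same path.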
Journey Context:
Cursor's tab completion streams and cancels mid-generation when the user types. This isn't just UX; it's a cost mechanism that avoids spending inference on stale completions. Perplexity streams citations inline, so the UI can render sources while generation continues and the user can judge their relevance before generation completes. The synthesis, invisible from any single product: streaming is an architectural pattern that decouples generation time from value-delivery time. Products that generate-then-display must finish the full generation before delivering any value, and they cannot cancel bad generations early. This is why production AI products generally stream, even when the UX could tolerate waiting.
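The cancel-on-typing behaviour described above can be sketched as "at most one in-flight completion per session, where each new keystroke cancels the previous task." This is not Cursor's implementation. `CompletionSession`, the 10 ms per-token latency, the 20-token completion length, and the `<tokN>` placeholder output are all assumptions, chosen so the token accounting is visible.

```python
import asyncio
from typing import Optional, Tuple


class CompletionSession:
    """Hypothetical editor session: a new keystroke supersedes the in-flight completion."""

    def __init__(self, token_latency: float = 0.01) -> None:
        self._task: Optional["asyncio.Task[str]"] = None
        self.token_latency = token_latency  # assumed seconds per generated token
        self.tokens_generated = 0  # tokens actually produced (i.e. paid for)
        self.cancelled = 0  # completions abandoned because their prefix went stale

    async def _generate(self, prefix: str, n_tokens: int) -> str:
        out = []
        for i in range(n_tokens):
            await asyncio.sleep(self.token_latency)  # stand-in for model decode time
            self.tokens_generated += 1
            out.append(f"<tok{i}>")
        return prefix + "".join(out)

    def on_keystroke(self, prefix: str, n_tokens: int = 20) -> "asyncio.Task[str]":
        # Cancel the stale completion before starting one for the new prefix.
        if self._task is not None and not self._task.done():
            self._task.cancel()
            self.cancelled += 1
        self._task = asyncio.create_task(self._generate(prefix, n_tokens))
        return self._task


async def demo() -> Tuple[str, int, int]:
    session = CompletionSession()
    for prefix in ["d", "de", "def"]:
        task = session.on_keystroke(prefix)
        await asyncio.sleep(0.03)  # the user types the next character
    completion = await task  # only the completion for the latest prefix survives
    return completion, session.tokens_generated, session.cancelled


if __name__ == "__main__":
    completion, generated, cancelled = asyncio.run(demo())
    # Without cancellation, all three requests would run to 20 tokens (60 total).
    print(f"{cancelled} stale completions cancelled; {generated} of 60 tokens generated")
```

The two stale requests stop after a few tokens each, so the total lands well below 60. Exact counts depend on scheduling, but the saving comes entirely from cancelling work whose result could no longer be shown.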
Lifecycle
2026-06-19T10:49:58.218871+00:00 - report_created - created