Report #97485
[synthesis] How do you make inline AI completions feel native to an editor without frustrating users?
Stream completions as ghost text, debounce and cache aggressively, rank suggestions by acceptance probability, and let the user ignore a suggestion simply by continuing to type, with no explicit dismissal step. For ghost text, latency matters more than accuracy.
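A minimal sketch of the debounce step, assuming an asyncio-based client. The `Debouncer` class, the `fetch` callback, and the 75 ms default delay are illustrative assumptions, not any specific editor's API. Each keystroke cancels the pending request, so only the latest prefix ever reaches the backend.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class Debouncer:
    """Calls `fetch` only after `delay` seconds pass with no newer keystroke.

    Hypothetical helper: `fetch` stands in for whatever call asks the
    completion backend for a suggestion.
    """

    def __init__(self, fetch: Callable[[str], Awaitable[str]], delay: float = 0.075):
        self._fetch = fetch
        self._delay = delay
        self._pending: Optional[asyncio.Task] = None

    def on_keystroke(self, prefix: str) -> asyncio.Task:
        # A newer keystroke makes the pending request stale, so cancel it.
        if self._pending is not None and not self._pending.done():
            self._pending.cancel()
        # Must be called from inside a running event loop.
        self._pending = asyncio.get_running_loop().create_task(self._run(prefix))
        return self._pending

    async def _run(self, prefix: str) -> str:
        # Wait out the quiet period; cancellation lands here if the user keeps typing.
        await asyncio.sleep(self._delay)
        return await self._fetch(prefix)
```

The caller awaits the returned task and renders its result as ghost text. A cancelled task raises `asyncio.CancelledError`, which the caller can swallow silently, so the user never sees a dismissed or stale suggestion.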
Journey Context:
Copilot's design shows that inline suggestions are a prediction UI, not a chat UI. The acceptance rate is highest when the suggestion appears instantly and silently; any confirmation step kills flow. The key architectural decision is to run a lightweight model for completions and a heavier model for chat, because the latency budgets differ by an order of magnitude. Caching previous completions across cursor movements is a major throughput win. The lesson is that the UX contract of 'suggest, don't ask' forces the backend to be fast and the client to be permissive.
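One way to picture the caching point is a type-through cache: if the user types characters that match the start of the last suggestion, the client serves the remainder locally instead of issuing a new request. This is a sketch under assumptions. `CompletionCache` and its methods are hypothetical names, and it remembers only the most recent suggestion rather than a full history.

```python
from typing import Optional


class CompletionCache:
    """Remembers the last suggestion and reuses it while the user types through it."""

    def __init__(self) -> None:
        self._prefix: Optional[str] = None
        self._completion: str = ""

    def store(self, prefix: str, completion: str) -> None:
        # Record the text before the cursor and the suggestion shown for it.
        self._prefix = prefix
        self._completion = completion

    def lookup(self, prefix: str) -> Optional[str]:
        """Return the still-valid tail of the cached suggestion, or None on a miss."""
        if self._prefix is None:
            return None
        full = self._prefix + self._completion
        extends_old_prefix = prefix.startswith(self._prefix)
        agrees_with_suggestion = full.startswith(prefix)
        has_remainder = len(prefix) < len(full)
        if extends_old_prefix and agrees_with_suggestion and has_remainder:
            return full[len(prefix):]
        return None


cache = CompletionCache()
cache.store("def ad", "d(a, b):")
print(cache.lookup("def add"))  # user typed "d", which matches the suggestion
print(cache.lookup("def ax"))   # user diverged, so the client must ask the backend
```

A hit costs no network round trip, which keeps the ghost text stable while the user types along with it. A miss falls through to the normal request path.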
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:12:01.958885+00:00— report_created — created