Report #26950
[cost_intel] Using reasoning models for real-time collaborative editing or cursor prediction
Hard constraint: Use reasoning models only for background tasks that can tolerate >30s latency; use small instruct models (Llama 3.2 3B deployed at the edge, or GPT-4o mini) for <100ms prediction.
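The constraint can be expressed as a routing rule keyed on each task's latency budget. This is a minimal Python sketch, not a prescribed implementation: the tier names, the hypothetical `choose_tier` function, and the handling of the 100 ms to 30 s band (which the constraint addresses only by excluding reasoning models) are assumptions for illustration.

```python
from enum import Enum

# Thresholds come from the constraint; the tiers and the middle band are assumptions.
REASONING_MIN_BUDGET_S = 30.0  # reasoning models only for tasks that can wait >30 s
REALTIME_BUDGET_S = 0.1        # prediction paths must answer in <100 ms


class Tier(Enum):
    REASONING_ASYNC = "reasoning model via background job queue"
    SMALL_INSTRUCT_EDGE = "small instruct model at the edge (e.g. Llama 3.2 3B)"
    SMALL_INSTRUCT = "small instruct model"


def choose_tier(latency_budget_s: float) -> Tier:
    """Pick a model tier from the caller's latency budget, in seconds."""
    if latency_budget_s <= 0:
        raise ValueError("latency budget must be positive")
    if latency_budget_s > REASONING_MIN_BUDGET_S:
        return Tier.REASONING_ASYNC
    if latency_budget_s < REALTIME_BUDGET_S:
        return Tier.SMALL_INSTRUCT_EDGE
    # Assumption: between 100 ms and 30 s the rule only forbids reasoning models.
    return Tier.SMALL_INSTRUCT


if __name__ == "__main__":
    for task, budget_s in [("cursor prediction", 0.05), ("doc generation", 120.0)]:
        print(task, "->", choose_tier(budget_s).value)
```

Keying the decision on the budget rather than on the feature name keeps the rule enforceable in one place: a new real-time feature cannot reach a reasoning model unless its declared budget exceeds 30 s.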
Journey Context:
Collaborative editing requires sub-100ms latency for cursor sync and conflict resolution, while reasoning models typically take 10-60 seconds per response. Attempting to 'batch' reasoning calls for real-time features creates race conditions and UX freezes. The architectural boundary is clear: reasoning belongs in async job queues (code review, documentation generation), while real-time features require distilled instruct models or even classical algorithms (OT/CRDT).
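A minimal Python sketch of that boundary, assuming `asyncio.Queue` as a stand-in for a real job queue and a short `asyncio.sleep` in place of an actual reasoning-model call. The real-time path resolves two concurrent inserts with a single operational-transformation rule (insert vs. insert, with a site-id tie-break), while the document snapshot is handed to a background worker without awaiting its result. `Insert`, `transform`, `apply`, and `reasoning_worker` are hypothetical names, not any library's API.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # character offset in the shared document
    text: str
    site: int  # client id; only used to break ties deterministically


def transform(op: Insert, against: Insert) -> Insert:
    """Insert/insert OT rule: shift `op` right when the concurrent `against`
    insert lands earlier, or at the same offset from a lower site id."""
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


async def reasoning_worker(queue: asyncio.Queue, results: list) -> None:
    """Background consumer; the sleep stands in for a 10-60 s reasoning call."""
    while True:
        snapshot = await queue.get()
        await asyncio.sleep(0.05)  # simulated latency, not a real model call
        results.append(f"review of {len(snapshot)} chars")
        queue.task_done()


async def main() -> tuple:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    worker = asyncio.create_task(reasoning_worker(queue, results))

    doc = "hello world"
    a = Insert(5, ",", site=1)   # concurrent edit from site 1
    b = Insert(11, "!", site=2)  # concurrent edit from site 2

    # Real-time path: each site applies its own op, then the transformed remote op.
    start = time.perf_counter()
    site1 = apply(apply(doc, a), transform(b, a))
    site2 = apply(apply(doc, b), transform(a, b))
    realtime_ms = (time.perf_counter() - start) * 1000

    # Async path: hand the snapshot to the background queue; never block on it.
    queue.put_nowait(site1)

    await queue.join()  # demo only: wait so the background result can be shown
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return site1, site2, realtime_ms, results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both sites converge on `"hello, world!"` through the cheap transform alone; the reasoning result arrives whenever the worker finishes and could be surfaced as a non-blocking suggestion. This covers only concurrent inserts; a real editor needs the full set of transform rules (deletes, longer operation histories) or a CRDT.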
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:38:10.420887+00:00— report_created — created