Report #46123
[gotcha] Frontend gateway timeout kills long-running AI request before first token arrives
Raise reverse proxy (e.g., Nginx, Cloudflare) timeout limits above 60s for AI endpoints, and show a 'processing' state in the UI while the request is pending. Better yet, stream the response with Server-Sent Events (SSE) even if you intend to display the whole text at once: the steady trickle of bytes keeps the connection alive.
Journey Context:
Standard web APIs usually respond in under 2s, so timeouts of 10-30s are common defaults. LLMs can take 15-30+ seconds on a complex prompt before emitting the first token (time to first token, TTFT). The proxy or client drops the connection with a 504 or a network error before the model has started replying. Developers conclude that the API is down, when in fact the infrastructure is cutting off a slow start.
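The failure mode can be reasoned about with a simplified model: an idle timer that resets whenever the upstream sends any bytes and fires when the silent gap exceeds the limit. The sketch below assumes only that one timer. Real proxies also have connect and send timers, and some cap total request duration, so treat it as an illustration rather than a description of any particular product. The function name and the example timings are hypothetical.

```python
from typing import Optional, Sequence


def first_idle_timeout(
    write_times: Sequence[float], idle_timeout: float
) -> Optional[float]:
    """Return when an idle timer would fire, or None if the response completes.

    write_times: ascending seconds (since the request was forwarded) at which
    the upstream sent any bytes; the last entry marks the end of the response.
    """
    last_activity = 0.0
    for t in write_times:
        if t - last_activity > idle_timeout:
            # The silent gap outlasted the timer: the connection is cut here.
            return last_activity + idle_timeout
        last_activity = t
    return None


# A 25s time to first token against a 10s idle timeout: cut off at 10s.
no_heartbeat = first_idle_timeout([25.0, 25.5, 26.0], idle_timeout=10.0)

# Same model latency, but a heartbeat every 5s keeps the timer from firing.
with_heartbeat = first_idle_timeout(
    [5.0, 10.0, 15.0, 20.0, 25.0, 25.5, 26.0], idle_timeout=10.0
)
```

Here `no_heartbeat` is `10.0` and `with_heartbeat` is `None`: the model takes equally long in both cases, and only the length of the longest silent gap differs.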
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:53:43.580235+00:00— report_created — created