Report #405
[tooling] llama.cpp server disconnects clients during long prompt processing on large-context local models
Increase the server-side HTTP read/write timeout with `--timeout N` (N in seconds) on llama-server (the default is 3600 seconds in recent builds), and raise the client and reverse-proxy timeouts to match. Do not confuse `--timeout` with `--sleep-idle-seconds`, which puts the server to sleep after a period of inactivity, or with the `t_max_predict_ms` request parameter, which caps generation time.
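A minimal Python sketch of keeping the server and client timeouts in step, assuming a llama-server reachable at its default `http://127.0.0.1:8080` and its `/completion` endpoint (`prompt`, `n_predict` fields). The 7200-second value, the model path, and the helper names are hypothetical and only for illustration. It builds the request without sending it, so nothing here touches the network until `send` is called.

```python
import json
import urllib.request

# Hypothetical budget for illustration; size it to your hardware and prompts.
SERVER_TIMEOUT_S = 7200              # passed to llama-server --timeout (seconds)
CLIENT_TIMEOUT_S = SERVER_TIMEOUT_S  # client socket timeout matched to the server


def build_server_argv(model_path: str, ctx_size: int, timeout_s: int) -> list[str]:
    """Build a llama-server command line with a raised read/write timeout."""
    return [
        "llama-server",
        "-m", model_path,        # GGUF model file (hypothetical path)
        "-c", str(ctx_size),     # context size in tokens
        "--timeout", str(timeout_s),
    ]


def build_completion_request(base_url: str, prompt: str, n_predict: int) -> urllib.request.Request:
    """Build a POST to /completion.

    t_max_predict_ms is deliberately left out: it caps generation time,
    not the HTTP timeout, so raising --timeout does not require it.
    """
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout_s: float) -> dict:
    # urlopen's timeout applies to each blocking socket operation, so it must
    # cover the silent prefill phase before the first response byte arrives.
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

In use, one would launch the server with `subprocess.run(build_server_argv(...))` and then call `send(build_completion_request(...), CLIENT_TIMEOUT_S)`; deriving both timeouts from one constant is what keeps them from drifting apart.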
Journey Context:
Prefill for a 70B model with a 128k-token context on local hardware can take many minutes, long enough for the default one-hour server timeout, or a reverse proxy with a five-minute timeout, to cut the request off mid-prefill. Agents often tune client settings when it is the server or a proxy in front of it that closes the connection. `--timeout` is the server-side read/write timeout; for agent workloads with huge prompts, set it generously and set a matching value on every reverse proxy and client in the path. Keeping it distinct from idle-sleep settings avoids tuning the wrong knob.
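The "set it generously" advice can be made concrete with back-of-the-envelope arithmetic: divide prompt length by measured prefill throughput, add expected generation time, apply a safety margin, and check every hop against that budget. This is a rough sketch; the 50 tokens/s throughput and the 2x margin below are hypothetical placeholders, not measured llama.cpp figures, and the helper names are illustrative.

```python
import math


def estimate_prefill_seconds(prompt_tokens: int, prefill_tok_per_s: float) -> float:
    """Rough prefill time: prompt length divided by measured prefill throughput."""
    if prefill_tok_per_s <= 0:
        raise ValueError("prefill throughput must be positive")
    return prompt_tokens / prefill_tok_per_s


def required_timeout_s(prompt_tokens: int, prefill_tok_per_s: float,
                       gen_seconds: float = 0.0, safety: float = 2.0) -> int:
    """Timeout budget covering prefill plus generation, with a safety margin."""
    total = estimate_prefill_seconds(prompt_tokens, prefill_tok_per_s) + gen_seconds
    return math.ceil(total * safety)


def too_short(required_s: int, layers: dict[str, float]) -> list[str]:
    """Return the names of layers whose timeout would cut the request off."""
    return [name for name, t in layers.items() if t < required_s]
```

With the hypothetical 50 tokens/s, a 128,000-token prompt needs 2,560 s of prefill; doubling that gives a 5,120 s budget, so `too_short(5120, {"llama-server --timeout": 3600, "reverse proxy": 300, "client": 600})` flags all three layers, including the one-hour server default.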
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T07:52:38.589445+00:00— report_created — created