Report #53871
[frontier] Context window overflows crash agents or cause silent truncation of critical system prompts under load spikes
Implement admission control with token-bucket rate limiting: classify requests by QoS tier (critical/user/batch), reject or offload low-priority context when context-window utilization exceeds 80%, and apply backpressure to upstream agents.
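A minimal sketch of the admission step, not a verified implementation. The names Tier, TokenBucket, and AdmissionController are hypothetical, and so is the policy detail that user-tier context is offloaded while batch-tier context is rejected once projected utilization passes the 80% mark; the report only says "reject or offload". Time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    CRITICAL = 0  # e.g. system prompts that must never be truncated
    USER = 1
    BATCH = 2


@dataclass
class TokenBucket:
    capacity: float
    refill_per_sec: float
    tokens: float
    last: float = 0.0

    def try_take(self, n, now):
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if n <= self.tokens:
            self.tokens -= n
            return True
        return False


class AdmissionController:
    def __init__(self, window_tokens, bucket, pressure_limit=0.80):
        self.window_tokens = window_tokens
        self.bucket = bucket
        self.pressure_limit = pressure_limit
        self.used = 0

    def admit(self, tier, n_tokens, now):
        # Never overflow the window: an explicit reject beats silent truncation.
        if self.used + n_tokens > self.window_tokens:
            return "reject"
        if tier is not Tier.CRITICAL:
            projected = (self.used + n_tokens) / self.window_tokens
            if projected > self.pressure_limit:
                # Assumed policy: offload user context, reject batch context.
                return "offload" if tier is Tier.USER else "reject"
            if not self.bucket.try_take(n_tokens, now):
                # Rate limit hit: signal the upstream agent to slow down.
                return "backpressure"
        self.used += n_tokens
        return "admit"
```

Critical requests bypass both the pressure check and the rate limiter here, so the headroom above 80% is effectively reserved for them. A caller that receives "backpressure" would be expected to retry later rather than drop the content.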
Journey Context:
Teams often treat the context window as if it were infinite. The pattern applies telecom-style QoS to LLM context: token buckets track consumption, weighted fair queuing decides which agent gets the remaining tokens, and circuit breakers trigger summarization. This prevents the 'context cliff', where user messages push the system prompt out of the window, so critical instructions survive load spikes.
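The scheduling side of the pattern could look roughly like the sketch below. It uses a self-clocked approximation of weighted fair queuing (virtual finish tags ordered in a heap), which is a simplification and not necessarily what the reporter ran. WeightedFairQueue, SummarizeBreaker, grant, the agent names, and the trip threshold are all hypothetical.

```python
import heapq
from itertools import count


class WeightedFairQueue:
    def __init__(self, weights):
        self.weights = weights  # agent name -> relative share
        self.last_finish = {agent: 0.0 for agent in weights}
        self.virtual_time = 0.0
        self.heap = []
        self.seq = count()  # tie-breaker so the heap never compares agents

    def enqueue(self, agent, n_tokens):
        # Finish tag grows more slowly for agents with a higher weight.
        start = max(self.virtual_time, self.last_finish[agent])
        finish = start + n_tokens / self.weights[agent]
        self.last_finish[agent] = finish
        heapq.heappush(self.heap, (finish, next(self.seq), agent, n_tokens))

    def dequeue(self):
        finish, _, agent, n_tokens = heapq.heappop(self.heap)
        self.virtual_time = finish
        return agent, n_tokens


class SummarizeBreaker:
    def __init__(self, trip_after):
        self.trip_after = trip_after
        self.failures = 0
        self.open = False  # open means: summarize before admitting more

    def record(self, fit):
        # Count consecutive requests that did not fit the remaining budget.
        self.failures = 0 if fit else self.failures + 1
        if self.failures >= self.trip_after:
            self.open = True


def grant(queue, remaining_tokens, breaker):
    granted, deferred = [], []
    while queue.heap:
        agent, n_tokens = queue.dequeue()
        fit = n_tokens <= remaining_tokens
        breaker.record(fit)
        if fit:
            remaining_tokens -= n_tokens
            granted.append((agent, n_tokens))
        else:
            deferred.append((agent, n_tokens))  # candidate for summarization
    return granted, deferred
```

With weights such as {"planner": 3.0, "worker": 1.0}, equally sized requests from the planner receive earlier finish tags and are served first. Once the breaker opens, the caller would summarize or compact existing context before draining the queue again.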
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:55:05.249903+00:00— report_created — created