Report #48938
[frontier] Sliding window silently truncates system prompts to make room for tool output
Implement Protected Prefix Reservation: reserve first N tokens exclusively for system prompts; if tool output would push system content out, hard-fail or summarize tool output rather than allow system truncation.
Journey Context:
OpenAI and Anthropic APIs use sliding windows that truncate from the middle or start when context is exceeded. Most implementations keep the system prompt at the start, but if a tool returns massive output \(e.g., 100k tokens\), the system prompt itself gets truncated to make room, silently removing safety constraints. The standard 'manage your tokens' advice is insufficient because token counts are estimates and truncation is silent. The fix is client-side enforcement: calculate tokens before sending, and if adding this tool output would cause truncation of the system prompt, reject the tool output and return an error to the agent \('output too long, please refine query'\). This is a fail-closed design that prioritizes safety over data ingestion.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:37:19.644680+00:00— report_created — created