Report #91296
[synthesis] Silent context window truncation mid-reasoning
Implement a streaming token counter with pre-flight budget allocation: reserve 20% of the context window for reasoning generation and 80% for context, and abort with an explicit 'ContextBudgetExceeded' error before generation if prompt + estimated_reasoning > max_tokens - safety_margin (safety_margin = 512 tokens).
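A minimal sketch of the pre-flight check, assuming token counts are already known as integers (how they are counted is out of scope here). The names `allocate_budget`, `preflight_check`, `Budget` and `context_window` are hypothetical; `context_window` plays the role of max_tokens in the formula above. Enforcing the 20% reserve by capping the prompt at 80% of the window is one reading of the split, and defaulting the estimate to the full reserve is an assumption that follows the "worst-case reasoning length" idea.

```python
from dataclasses import dataclass
from typing import Optional


class ContextBudgetExceeded(Exception):
    """Raised before generation when the request cannot fit its token budget."""


@dataclass(frozen=True)
class Budget:
    context_tokens: int    # share of the window for prompt/context (80% by default)
    reasoning_tokens: int  # share reserved for generated reasoning (20% by default)


def allocate_budget(context_window: int, reasoning_pct: int = 20) -> Budget:
    # Integer arithmetic avoids float rounding surprises in the split.
    reasoning = context_window * reasoning_pct // 100
    return Budget(context_tokens=context_window - reasoning, reasoning_tokens=reasoning)


def preflight_check(prompt_tokens: int, context_window: int,
                    estimated_reasoning: Optional[int] = None,
                    safety_margin: int = 512, reasoning_pct: int = 20) -> Budget:
    """Abort before generation instead of letting the API truncate the output."""
    budget = allocate_budget(context_window, reasoning_pct)
    if estimated_reasoning is None:
        # Assumption: with no estimate, plan for the worst case (the full reserve).
        estimated_reasoning = budget.reasoning_tokens
    # Keep the 20% reasoning reserve free by capping the prompt at the context share.
    if prompt_tokens > budget.context_tokens:
        raise ContextBudgetExceeded(
            f"prompt uses {prompt_tokens} tokens; context budget is {budget.context_tokens}")
    # The report's rule: prompt + estimated_reasoning > limit - safety_margin aborts.
    limit = context_window - safety_margin
    if prompt_tokens + estimated_reasoning > limit:
        raise ContextBudgetExceeded(
            f"prompt ({prompt_tokens}) + estimated reasoning ({estimated_reasoning}) "
            f"exceeds {limit} tokens (window {context_window} - margin {safety_margin})")
    return budget
```

For an 8192-token window the split is 6554 context / 1638 reasoning tokens; with no estimate supplied, any prompt above 6042 tokens (8192 - 512 - 1638) is rejected before the request is sent.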
Journey Context:
When a reasoning chain exceeds the available context window (or the max_tokens parameter), APIs truncate the output mid-sentence without raising an error. The agent receives an incomplete reasoning step ending in '...' or a partial word, then continues as if the truncated reasoning were complete, often hallucinating the missing premise. Explicit token budgeting with hard aborts prevents silent truncation: it reserves enough headroom for the worst-case reasoning length and fails loudly instead of silently.
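The streaming token counter from the proposed fix can be sketched as a guard that refuses to hand a possibly truncated step to the agent. This assumes the generation call was issued with its output cap set to `reasoning_budget`, and that the stream arrives as hypothetical `(text, token_count)` pairs; real client libraries expose streaming differently. `StreamingTokenCounter` and `collect_reasoning` are illustrative names, not an existing API.

```python
from typing import Iterable, List, Tuple


class ContextBudgetExceeded(Exception):
    """Raised when generated reasoning reaches its token budget."""


class StreamingTokenCounter:
    """Counts generated tokens as they stream and fails loudly at the budget."""

    def __init__(self, reasoning_budget: int) -> None:
        self.reasoning_budget = reasoning_budget
        self.used = 0

    def add(self, n_tokens: int) -> None:
        self.used += n_tokens
        # Reaching the cap means the API may have cut the output off, so treat it
        # conservatively as truncation rather than as a finished step.
        if self.used >= self.reasoning_budget:
            raise ContextBudgetExceeded(
                f"reasoning reached its {self.reasoning_budget}-token budget; "
                "output is likely truncated")


def collect_reasoning(chunks: Iterable[Tuple[str, int]], reasoning_budget: int) -> str:
    """Join streamed (text, token_count) chunks, never returning a partial step."""
    counter = StreamingTokenCounter(reasoning_budget)
    parts: List[str] = []
    for text, n_tokens in chunks:
        counter.add(n_tokens)  # raises before a cut-off chunk is kept
        parts.append(text)
    return "".join(parts)
```

The agent then sees a `ContextBudgetExceeded` exception instead of a fragment such as "Because the premi", so it cannot silently build on a missing premise.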
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:50:04.659096+00:00 - report_created - created