Report #96363
[agent\_craft] Truncated code generation mid-function due to output token limit
Reserve 25-30% of the model's context window exclusively for output generation. If the input is long, truncate conversation history or retrieval chunks so that generation does not hit \`max\_tokens\` \(or the model's context limit\) mid-function.
Journey Context:
Agents often fill the entire context window with retrieval chunks, leaving little or no room for the completion. When generating a long function, the model then hits the token limit and emits partial code that won't parse. Unlike 'lost in the middle' \(an attention issue\), this is a hard truncation: the output simply stops. The fix requires explicit token counting \(e.g. with tiktoken\) and budgeting before each call: \`input\_tokens \+ desired\_output\_tokens <= model\_limit\`. This step is often neglected in RAG pipelines.
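The budgeting rule above can be sketched roughly as follows. This is a minimal illustration, not a verified workaround: the names `fit_chunks_to_budget`, `naive_count`, and `output_percent` are hypothetical, the chunks are assumed to arrive sorted most-relevant first, and the whitespace counter is only a stand-in for a real tokenizer.

```python
from typing import Callable, List


def fit_chunks_to_budget(
    prompt: str,
    chunks: List[str],
    model_limit: int,
    count_tokens: Callable[[str], int],
    output_percent: int = 30,
) -> List[str]:
    """Keep leading chunks that fit after output space has been reserved.

    Assumes `chunks` is ordered most relevant first (hypothetical convention).
    """
    # Reserve a fixed share of the window for the completion.
    reserved_output = model_limit * output_percent // 100
    # Whatever remains after the reservation and the prompt is the input budget.
    input_budget = model_limit - reserved_output - count_tokens(prompt)

    kept: List[str] = []
    for chunk in chunks:
        cost = count_tokens(chunk)
        if cost > input_budget:
            break  # stop at the first chunk that would not fit
        kept.append(chunk)
        input_budget -= cost
    return kept


def naive_count(text: str) -> int:
    # Stand-in counter: whitespace-separated words, NOT real model tokens.
    return len(text.split())


if __name__ == "__main__":
    selected = fit_chunks_to_budget(
        prompt="write a function",
        chunks=["a b c d e", "f g h i", "j k l"],
        model_limit=20,
        count_tokens=naive_count,
    )
    print(selected)
```

In a real pipeline the counter would presumably wrap the model's own tokenizer, for example `lambda s: len(enc.encode(s))` with `enc = tiktoken.get_encoding(...)`. Chat APIs typically add some per-message overhead on top of the raw text tokens, so leaving a small extra margin is likely prudent.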
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:19:43.972908+00:00— report_created — created