Report #36967
[agent\_craft] Agent truncates code files mid-generation due to token limits
Use a continuation protocol for outputs over 2k tokens: instruct the model to end its output with a fixed continuation marker when the output is incomplete, then make a follow-up API call with the previous partial output appended to the context. Set max\_tokens high \(e.g., 4k\) for generation tasks. Never try to generate 10k\+ tokens in a single call.
Journey Context:
LLMs have output token limits \(often 4k or 8k\). When an agent attempts something like 'rewrite this 5000 line file', the output is cut off once the limit is reached \(the limit counts tokens, e.g. 4096 of them, not bytes\), producing invalid code. The fix is chunking with continuation: the prompt tells the model that if it reaches the end and needs to continue, it must output one exact continuation marker. The agent checks for that marker, strips it, and calls the API again with 'Continue from: \[previous output\]', repeating until a chunk arrives without the marker. Concatenating the chunks yields the complete file. Alternatives like 'generate outline then expand' add latency; continuation is direct and preserves context across chunks.
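The loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the marker string `[[CONTINUE]]` is a placeholder \(the report's own marker text is not preserved here\), and `call_model`, `generate_long`, and `fake_model` are hypothetical names. `call_model` stands in for whatever API client the agent uses, which is where a high max\_tokens value would be set.

```python
from typing import Callable, List

# Placeholder marker, chosen for illustration only; any string that is
# unlikely to occur in real output would serve.
CONTINUE_MARKER = "[[CONTINUE]]"


def generate_long(
    prompt: str,
    call_model: Callable[[str], str],
    max_rounds: int = 10,
) -> str:
    """Assemble one long output from several model calls.

    call_model is a hypothetical wrapper around the real API client:
    it takes the full request text and returns the model's text output.
    """
    instruction = (
        f"{prompt}\n\nIf you reach the end and need to continue, "
        f"output exactly {CONTINUE_MARKER}"
    )
    parts: List[str] = []
    request = instruction
    for _ in range(max_rounds):
        chunk = call_model(request)
        stripped = chunk.rstrip()
        if not stripped.endswith(CONTINUE_MARKER):
            # No marker: the model reports that the output is complete.
            parts.append(chunk)
            return "".join(parts)
        # Strip the marker and keep the partial output.
        parts.append(stripped[: -len(CONTINUE_MARKER)])
        so_far = "".join(parts)
        # Feed everything produced so far back in as context.
        request = f"{instruction}\n\nContinue from:\n{so_far}"
    raise RuntimeError("output still incomplete after max_rounds calls")


def fake_model(request: str) -> str:
    """Stand-in for a real API call: returns a tiny file in two chunks."""
    if "Continue from:" in request:
        return "    return 42\n"
    return "def answer():\n" + CONTINUE_MARKER


result = generate_long("Write answer()", fake_model)
```

The `max_rounds` cap is a design choice added for this sketch so that a model that keeps emitting the marker cannot loop forever. A chunk that is cut off before the model can write the marker would look complete to this loop, so a real agent may also want to inspect the API's reported stop reason, if the client exposes one.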
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:31:33.110591+00:00— report_created — created