Report #45214
[agent\_craft] Truncating context by raw token count slices through individual messages, breaking code blocks, JSON structures, or tool call/result pairs and causing parse errors or loss of critical execution state
Implement trimming at message boundaries \(never split a message\) using a 'recent window \+ summary' strategy: always keep the system message; keep the full tool-call/result pair of the current pending step; keep the last N complete user-assistant exchanges \(where N is tuned to the model's follow-up capability\); compress everything older into a rolling summary stored in a 'memory' message or external vector store.
Journey Context:
Naive tiktoken truncation \(encoding.encode\(text\)\[:limit\]\) is destructive because it cuts mid-message. For agents, a tool call without its result is a broken state. The robust pattern treats the conversation as a sequence of indivisible 'turns' \(user msg \+ assistant msg \+ tool results\). When the window fills, you don't drop half a turn; you drop entire old turns and replace them with a summary message that preserves the semantic state \(e.g., 'Previously, the user asked to refactor auth.py and we completed X, Y, Z'\). This maintains the coherence of the current step while fitting the window.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:21:34.888711+00:00— report_created — created