Report #39285
[research] Sub-agents fail silently or return truncated outputs when their context window overflows during long tool-use chains
Add telemetry to track tokens_used / context_window_limit per agent step. Set a hard eval assertion that fails any trace where a sub-agent exceeds 85% context utilization without explicit summarization.
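A minimal sketch of the eval assertion described above, assuming each agent step is logged as a record carrying tokens_used and context_window_limit. The StepRecord type, the summarized flag, and the function names are hypothetical and not taken from any particular tracing library; how a step is marked as summarized depends on your own instrumentation.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Threshold from the report: 85% context utilization.
MAX_UTILIZATION = 0.85


@dataclass
class StepRecord:
    """One telemetry record per agent step (hypothetical schema)."""
    agent_id: str
    step: int
    tokens_used: int
    context_window_limit: int
    summarized: bool = False  # assumption: set by the agent when it summarizes

    @property
    def utilization(self) -> float:
        return self.tokens_used / self.context_window_limit


def find_violations(trace: Iterable[StepRecord],
                    threshold: float = MAX_UTILIZATION) -> List[StepRecord]:
    """Return steps that exceed the threshold without a summarization step."""
    return [r for r in trace if r.utilization > threshold and not r.summarized]


def assert_trace_within_budget(trace: Iterable[StepRecord],
                               threshold: float = MAX_UTILIZATION) -> None:
    """Fail the eval if any step is over budget and was not summarized."""
    violations = find_violations(trace, threshold)
    if violations:
        details = ", ".join(
            f"{r.agent_id}#{r.step}={r.utilization:.0%}" for r in violations
        )
        raise AssertionError(
            f"context utilization above {threshold:.0%} "
            f"without summarization: {details}"
        )
```

In an eval harness this would run once per trace, for example `assert_trace_within_budget(records_for_trace)`, so that a single over-budget step fails the whole trace.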
Journey Context:
Agents often handle API errors gracefully but fail silently on context limits: the sub-agent returns truncated text, and the parent agent misinterprets it as a valid, complete response. The result is silent data loss. Observability must therefore track token counts per span. In evals, a sub-agent that hits the limit should be treated as a critical failure, which prompts the insertion of a summarization step in the agent's logic.
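One way to make the failure loud instead of silent is to have the parent check each sub-agent result and raise, then retry after summarizing. The sketch below is illustrative only: SubAgentResult, its stop_reason values ("complete" / "length"), ContextOverflowError, and the run_step and summarize callables are all hypothetical names, and real model APIs report truncation in different ways.

```python
from dataclasses import dataclass
from typing import Callable, List


class ContextOverflowError(RuntimeError):
    """Raised when a sub-agent result looks truncated by the context limit."""


@dataclass
class SubAgentResult:
    text: str
    tokens_used: int
    context_window_limit: int
    stop_reason: str  # assumption: "complete" or "length"


def check_result(result: SubAgentResult) -> str:
    """Return the text only if the result is complete; otherwise raise."""
    truncated = result.stop_reason != "complete"
    at_limit = result.tokens_used >= result.context_window_limit
    if truncated or at_limit:
        raise ContextOverflowError(
            f"sub-agent used {result.tokens_used}/"
            f"{result.context_window_limit} tokens "
            f"(stop_reason={result.stop_reason!r})"
        )
    return result.text


def run_with_summarization(
    run_step: Callable[[List[str]], SubAgentResult],
    summarize: Callable[[List[str]], List[str]],
    history: List[str],
    max_retries: int = 1,
) -> str:
    """Run a step; on overflow, summarize the history and retry."""
    for attempt in range(max_retries + 1):
        try:
            return check_result(run_step(history))
        except ContextOverflowError:
            if attempt == max_retries:
                raise  # surface as a critical failure, never as a valid answer
            history = summarize(history)
    raise ValueError("max_retries must be >= 0")
```

The design choice here is that truncated text never reaches the parent as a return value: it either becomes a complete response after summarization or an exception that evals can count as a critical failure.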
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:24:38.956721+00:00— report_created — created