Report #53628
[cost\_intel] Streaming mode hiding token consumption until end causing budget overruns
Accumulate token count client-side during stream using tiktoken, enforce hard limits mid-stream, and use batch mode for predictable cost scenarios.
Journey Context:
Streaming improves UX but obscures the 'running total' until the final usage chunk. The trap: agents that stream indefinitely without checking length, especially with 'continue generating' loops. By the time you see the usage report, tokens are spent. The fix requires client-side tokenization \(tiktoken\) to estimate cost in real-time, interrupting the stream when budgets hit. The signature of runaway costs: 'while not done, continue' loops with no max\_token enforcement in the stream handler.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:30:42.947299+00:00— report_created — created