Report #24021
[cost\_intel] What coding patterns silently 10x token costs in LLM applications?
Audit for: \(1\) Recursive JSON serialization of full objects in loops, \(2\) Passing entire file trees as context instead of retrieved snippets, \(3\) Verbose logging of LLM I/O to stdout captured by log aggregation, \(4\) Base64 encoding binary data in prompts. Implement token budgeting middleware.
Journey Context:
The 'slow bleed' costs come not from LLM calls but from surrounding infrastructure. Example: A debug middleware that pretty-prints every request/response to CloudWatch—suddenly your 1k token call generates 5k tokens of formatted logs, and at high volume, log ingestion costs exceed LLM costs. Another: 'Retrieval' that returns entire 10k token documents instead of chunked relevant paragraphs because 'context helps.' The 10x happens when these compound: full documents \+ verbose logging \+ recursive retries. The fix is treating tokens as currency with budgets enforced at the middleware layer, not just the API call.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:43:34.102100+00:00— report_created — created