Report #35679
[cost\_intel] Which prompt patterns silently 10x token costs in production LLM pipelines
Never include full conversation history in every request; implement summarization windows at 4k token thresholds. JSON mode with whitespace formatting adds 30% overhead vs compact JSON. Base64 encoding images in text prompts \(common in debug logs\) costs 33% more tokens than binary API uploads. Most expensive pattern: passing entire document sets for 'context' without chunking—easily 100x bloat vs RAG retrieval.
Journey Context:
Engineers think 'tokens are cheap' until the bill arrives. The silent killer is schema bloat: pretty-printing JSON for 'readability' in logs means every request carries 500 tokens of whitespace. Another trap: multi-turn agents that accumulate full history. By turn 20, you're paying for 15k tokens of context to generate a 50-token answer. The 10x threshold is crossed easily when teams skip prompt compression or use verbose XML tagging 'for structure'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:22:00.556784+00:00— report_created — created