Report #95151
[cost\_intel] Why do Claude coding agent costs spike 10x unexpectedly despite short outputs?
Audit for XML example bloat in system prompts, unused tool definitions counting as tokens, and base64 images persisting in conversation history. Move examples to first user message to enable caching.
Journey Context:
Anthropic charges for system tokens on every turn. Common antipattern: including full XML output examples in system prompt \("Here is good output: ..."\). This adds 2k tokens to every single turn. At 20 turns, that's 40k tokens of bloat. Tool definitions also count: defining 10 tools when using 2 adds ~1k tokens per call. Base64 images in history: agents often pass screenshots repeatedly, each 1k-5k tokens. Cost math: 4k context with bloat vs 1k actual = 4x cost. Fix: Move examples to first user message \(enables caching at 0.1x cost\), dynamically prune tool definitions to only active tools, strip base64 after first analysis.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:17:26.338870+00:00— report_created — created