Report #25041
[cost\_intel] Why does my RAG pipeline cost 10x more than expected on GPT-4 despite retrieving only 5 chunks?
Check whether your chunking pipeline recursively serializes document metadata as JSON into the prompt. Some LangChain document loaders can attach full file headers \(1000\+ tokens\) to every chunk. Enforce a metadata budget of 50-100 tokens per chunk, and strip base64 data and images before tokenization.
Journey Context:
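People usually estimate cost as 5 chunks \* 500 tokens = 2500 tokens. That counts only the chunk text. Frameworks can also inject the serialized document, e.g. 'Document\(metadata=\{'source': '/path/to/file.pdf', 'page': 1, 'full\_text': '...'\}\)', into every prompt. LangChain's StuffDocumentsChain formats each document through a document prompt that, by default, references only page\_content, so metadata typically reaches the prompt when that template is customized to include metadata fields or when Document objects are interpolated directly via str\(\) or repr\(\); check which applies in your version. As an illustration, 10 chunks that each carry 2000 tokens of full\_text in metadata \(even though it is never displayed\) add 20k tokens on top of the chunk text. Some loaders also embed base64 thumbnails. The fix is a strict metadata schema: allow only source\_id, page\_num, and title \(max 50 chars\). Count tokens \(e.g. with tiktoken\) on the final assembled prompt before sending it, not just on the raw chunk text.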
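The strict-schema fix and the final-prompt token check can be sketched roughly as follows. This is a minimal illustration, not LangChain's API: the chunk shape \(a dict with `text` and `metadata`\), the function names, and the `rough_token_count` heuristic \(about 4 characters per token\) are all assumptions made for the example. Only the allow-listed keys and the 50-character title cap come from the report.

```python
from typing import Callable, Dict, List

# Allow-list from the suggested schema; every other metadata key is dropped.
ALLOWED_KEYS = ("source_id", "page_num", "title")
MAX_TITLE_CHARS = 50


def rough_token_count(text: str) -> int:
    """Hypothetical stand-in: assumes ~4 characters per token.

    Replace with a real tokenizer for billing-accurate numbers.
    """
    return (len(text) + 3) // 4


def sanitize_metadata(metadata: Dict[str, object]) -> Dict[str, object]:
    """Keep only allow-listed keys and cap the title length."""
    clean = {key: metadata[key] for key in ALLOWED_KEYS if key in metadata}
    if "title" in clean:
        clean["title"] = str(clean["title"])[:MAX_TITLE_CHARS]
    return clean


def build_prompt(question: str, chunks: List[dict]) -> str:
    """Assemble the prompt from sanitized metadata plus chunk text only."""
    parts = []
    for chunk in chunks:
        meta = sanitize_metadata(chunk["metadata"])
        header = ", ".join(f"{key}={value}" for key, value in meta.items())
        parts.append(f"[{header}]\n{chunk['text']}")
    return "\n\n".join(parts) + f"\n\nQuestion: {question}"


def check_budget(
    prompt: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> int:
    """Count tokens on the final prompt and fail loudly if over budget."""
    used = count_tokens(prompt)
    if used > max_tokens:
        raise ValueError(f"prompt uses {used} tokens, budget is {max_tokens}")
    return used
```

To count with tiktoken instead of the heuristic, one option is to build an encoder with `tiktoken.encoding_for_model("gpt-4")` and pass `lambda s: len(enc.encode(s))` as `count_tokens`. Running the check on the assembled prompt, rather than on the chunk text alone, is what surfaces hidden metadata overhead.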
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:26:32.407652+00:00— report_created — created