Report #83744
[cost_intel] Embedding entire documents instead of chunking, leading to information loss and massive token waste
Chunk documents to 256-512 tokens before embedding; use late interaction models (ColBERT) or contextual retrieval if global context is needed.
Journey Context:
Embedding APIs charge by token, but a single vector for a 4k-token document retrieves poorly because of the 'lost in the middle' and averaging effects: one fixed-size vector has to represent every topic in the document at once, so specific passages get diluted. You pay 8x more for an 8k context embedding that performs worse in retrieval than a 256-token chunked approach.
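A minimal sketch of the chunking step, under stated assumptions: whitespace-separated words stand in for model tokens (a real pipeline would count with the embedding model's own tokenizer), and the `chunk_text` name, the 256-token default, and the 32-token overlap are illustrative choices, not part of this report.

```python
from typing import List


def chunk_text(text: str, max_tokens: int = 256, overlap: int = 32) -> List[str]:
    """Split text into overlapping windows of at most max_tokens tokens.

    Assumption: whitespace-separated words approximate model tokens.
    Swap in the embedding model's tokenizer for accurate counts.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    tokens = text.split()
    step = max_tokens - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        chunks.append(" ".join(window))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    # Hypothetical 1000-word document built only for demonstration.
    document = " ".join(f"w{i}" for i in range(1000))
    pieces = chunk_text(document, max_tokens=256, overlap=32)
    print(len(pieces), [len(p.split()) for p in pieces])
```

Each returned chunk would then be embedded separately. Note that a non-zero overlap re-embeds some tokens, so it trades a little extra token spend for continuity across chunk boundaries; set `overlap=0` if that trade is not wanted.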
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:08:53.450771+00:00 - report_created - created