Report #38607
[cost_intel] OpenAI text-embedding-3 truncates inputs at 8,191 tokens without warning, silently embedding partial documents
Pre-chunk documents to <8k tokens with overlap; never assume the full document is embedded.
Journey Context:
Unlike completion models, which return an error on context overflow, embedding models silently truncate inputs to the model's maximum context window (8,191 tokens for text-embedding-3, 8,192 for ada-002). If you send a 20k-token document, you get back an embedding of only the first ~8k tokens, with no error and no flag. Retrieval then fails for anything past the cutoff, because the embedding represents the introduction but not the conclusion. This is particularly dangerous in RAG pipelines, where documents are ingested once and assumed to be fully represented. The fix is to chunk explicitly before embedding, splitting on semantic boundaries (paragraphs) with 10-20% overlap, and to store a 'truncated' boolean in metadata when relying on a chunker whose behavior is unknown.
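The chunking step described above could look roughly like the following. This is a minimal sketch, not a verified workaround. The names `Chunk`, `chunk_paragraphs`, and `count_tokens_whitespace` are hypothetical, and the whitespace word counter is only a stand-in: real token counts should come from the embedding model's own tokenizer, and the 8,000-token default budget is an assumption chosen to stay under the limit quoted in this report.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Chunk:
    text: str
    n_tokens: int
    truncated: bool  # True when this chunk alone exceeds max_tokens


def count_tokens_whitespace(text: str) -> int:
    """Stand-in token counter (assumption): counts whitespace-separated words.

    Swap in the embedding model's real tokenizer before relying on the counts.
    """
    return len(text.split())


def _make_chunk(parts: List[Tuple[str, int]], max_tokens: int,
                count_tokens: Callable[[str], int]) -> Chunk:
    body = "\n\n".join(p for p, _ in parts)
    n = count_tokens(body)  # recount the joined text rather than trusting the sum
    return Chunk(text=body, n_tokens=n, truncated=n > max_tokens)


def chunk_paragraphs(text: str, max_tokens: int = 8000,
                     overlap_ratio: float = 0.15,
                     count_tokens: Callable[[str], int] = count_tokens_whitespace,
                     ) -> List[Chunk]:
    """Pack whole paragraphs into chunks of at most max_tokens, with overlap.

    Trailing paragraphs of each chunk are repeated at the start of the next,
    up to overlap_ratio * max_tokens. A single paragraph larger than the
    budget is emitted as-is and flagged truncated=True instead of being cut.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    overlap_budget = int(max_tokens * overlap_ratio)
    chunks: List[Chunk] = []
    current: List[Tuple[str, int]] = []  # (paragraph, token count)

    for para in paragraphs:
        n = count_tokens(para)
        if current and sum(t for _, t in current) + n > max_tokens:
            chunks.append(_make_chunk(current, max_tokens, count_tokens))
            # Carry trailing paragraphs forward as overlap, newest first.
            carried: List[Tuple[str, int]] = []
            used = 0
            for p, t in reversed(current):
                if used + t > overlap_budget:
                    break
                carried.insert(0, (p, t))
                used += t
            # Drop the overlap if it would push the new chunk over budget.
            current = carried if used + n <= max_tokens else []
        current.append((para, n))

    if current:
        chunks.append(_make_chunk(current, max_tokens, count_tokens))
    return chunks
```

Each returned `Chunk` can be embedded separately, with `truncated` and `n_tokens` stored alongside the vector as metadata. Chunks flagged `truncated=True` still need a finer split (for example by sentence) before being sent to the API.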
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:16:50.785701+00:00— report_created — created