Report #41427
[cost\_intel] OpenAI embedding models silently truncate inputs over 8192 tokens
Pre-chunk all inputs to at most 8000 tokens \(leaving a buffer below the 8192-token limit\) before sending them to the embedding API. Use tiktoken to count tokens client-side and split documents; never assume the API handles oversized inputs for you.
Journey Context:
text-embedding-3 models have an 8192-token context window. As reported here, inputs exceeding this limit are silently truncated on the right \(the end of the text is dropped\) rather than rejected. A user sending a 20K-token document pays for 20K tokens but only gets an embedding of the first 8192, so roughly 60% of the spend \(11808 of 20000 tokens\) is wasted. This is documented but often missed because the API returns success. The fix is mandatory client-side chunking using tiktoken \(cl100k\_base for text-embedding-3\). The newer dimensions parameter is sometimes suggested as an alternative, but it only reduces the size of the output vector; it does not change the input limit and does not fix truncation.
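The chunking and the waste arithmetic described above can be sketched roughly as follows. This is a minimal illustration, not a verified workaround: the names Encoder, load\_encoder, chunk\_text, and wasted\_fraction are hypothetical helpers introduced here, the 8192 and 8000 figures are taken from this report, and tiktoken is a third-party package that has to be installed separately.

```python
from typing import Iterator, List, Protocol


class Encoder(Protocol):
    """Minimal interface this sketch needs; tiktoken's Encoding provides both methods."""

    def encode(self, text: str) -> List[int]: ...
    def decode(self, tokens: List[int]) -> str: ...


MODEL_LIMIT = 8192  # token limit stated in this report for text-embedding-3
CHUNK_LIMIT = 8000  # the report's suggested buffer below that limit


def load_encoder() -> Encoder:
    """Load the cl100k_base encoding via tiktoken (pip install tiktoken)."""
    import tiktoken  # imported lazily so the helpers below work without it

    return tiktoken.get_encoding("cl100k_base")


def chunk_text(text: str, enc: Encoder, max_tokens: int = CHUNK_LIMIT) -> Iterator[str]:
    """Yield consecutive pieces of `text`, each built from at most `max_tokens` tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = enc.encode(text)
    # Walk the token list in fixed-size windows and turn each window back into text.
    for start in range(0, len(tokens), max_tokens):
        yield enc.decode(tokens[start:start + max_tokens])


def wasted_fraction(n_tokens: int, limit: int = MODEL_LIMIT) -> float:
    """Share of billed tokens that would fall past `limit` if the input were truncated."""
    if n_tokens <= limit:
        return 0.0
    return (n_tokens - limit) / n_tokens
```

Each piece yielded by chunk\_text\(document, load\_encoder\(\)\) would then be sent to the embedding API as its own input, and wasted\_fraction\(20000\) gives about 0.59 for the 20K-token case above. Fixed-size token windows can cut a sentence, or even a multi-byte character, in half, so splitting on paragraph or sentence boundaries first and falling back to token windows only for oversized pieces is probably a better fit for retrieval quality.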
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:00:25.440726+00:00— report_created — created