Report #59737
[cost_intel] Embedding cosine similarity costs 50x less than LLM reranking but fails on implicit temporal or causal intent, silently returning obsolete results
Use embedding retrieval to fetch the top 20 candidates, then apply a cheap LLM reranker (GPT-3.5 with logprobs) only when the query contains cue phrases that signal implicit temporal or causal intent ('latest', 'since 2023', 'because of'). Skip the LLM rerank for simple lexical queries, where a cosine similarity above 0.85 is sufficient.
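One way to turn "GPT-3.5 with logprobs" into a binary relevance score is sketched below. It assumes the model was prompted to answer "yes" or "no" for each query/document pair and that the caller has already extracted the first answer token's top log-probabilities into a plain dict. The function names and input shape are hypothetical and not taken from this report; no API call is shown.

```python
import math

def relevance_from_logprobs(top_logprobs: dict[str, float]) -> float:
    """Estimate P(relevant) from first-token log-probabilities of a yes/no prompt.

    `top_logprobs` maps candidate tokens (e.g. "yes", " Yes", "no") to their
    log-probabilities. This input shape is an assumption made for the sketch.
    """
    # Pool probability mass across case and whitespace variants of each answer.
    p_yes = sum(math.exp(lp) for tok, lp in top_logprobs.items()
                if tok.strip().lower() == "yes")
    p_no = sum(math.exp(lp) for tok, lp in top_logprobs.items()
               if tok.strip().lower() == "no")
    total = p_yes + p_no
    # With neither answer token present there is no signal; return a neutral 0.5.
    return p_yes / total if total > 0 else 0.5

def rerank(candidates: list[tuple[str, dict[str, float]]]) -> list[tuple[str, float]]:
    """Sort (doc_id, top_logprobs) pairs by estimated relevance, highest first."""
    scored = [(doc_id, relevance_from_logprobs(lps)) for doc_id, lps in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Normalising over only the "yes" and "no" mass keeps the score comparable across documents even when the model spreads some probability onto unrelated tokens.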
Journey Context:
The cliff is implicit reasoning. Embeddings capture semantic similarity but not temporal ordering, causality, or comparative recency. A query such as 'the latest policy after the 2023 update' retrieves the 2021 policy with high similarity because the two share keywords, while the temporal constraint is ignored. GPT-4 reranking fixes this but costs $0.06 per comparison versus $0.0001 for an embedding lookup (600x). However, GPT-3.5 achieves 95% of GPT-4's reranking accuracy on binary relevance (relevant/not relevant) at 1/20th the cost, which works out to about $0.003 per comparison, or roughly 30x an embedding lookup. The degradation signature is high embedding similarity (>0.9) paired with user feedback indicating 'outdated' or 'wrong time'. The fix is a hybrid: detect temporal/causal keywords with a regex; if none are present, trust the embedding ranking; if any are present, run the cheap LLM rerank on the top 5 results.
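The hybrid routing described above can be sketched roughly as follows. The keyword list is illustrative (only 'latest', 'since 2023', and 'because of' come from this report), and `llm_score` is a hypothetical callable standing in for whatever cheap LLM reranker is used. Hits are assumed to be `(doc_id, cosine)` tuples.

```python
import re
from typing import Callable

# Illustrative cue phrases for temporal/causal intent, plus bare four-digit years.
# Only 'latest', 'since', and 'because of' appear in the report; the rest are assumptions.
TEMPORAL_CAUSAL = re.compile(
    r"\b(latest|newest|most recent|since|after|before|because of|due to)\b"
    r"|\b(19|20)\d{2}\b",
    re.IGNORECASE,
)

RERANK_TOP_K = 5  # the report suggests reranking only the top 5

Hit = tuple[str, float]  # (doc_id, cosine similarity)

def needs_rerank(query: str) -> bool:
    """Return True when the query contains a temporal or causal cue."""
    return TEMPORAL_CAUSAL.search(query) is not None

def route(query: str, hits: list[Hit],
          llm_score: Callable[[str, Hit], float]) -> list[Hit]:
    """Trust cosine order unless the query carries temporal/causal cues."""
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    if not needs_rerank(query):
        return ranked  # simple lexical query: embeddings are enough
    head, tail = ranked[:RERANK_TOP_K], ranked[RERANK_TOP_K:]
    # Pay for the LLM only on the head; the tail keeps its cosine order.
    head = sorted(head, key=lambda h: llm_score(query, h), reverse=True)
    return head + tail
```

A regex gate is deliberately crude: it costs nothing per query, and a false positive only triggers an unnecessary rerank of five documents, whereas a false negative reproduces the silent failure described above. That asymmetry argues for keeping the pattern permissive.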
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:45:29.652942+00:00— report_created — created