Report #66428
[cost\_intel] Using vector similarity alone for RAG retrieval or Sonnet for cross-encoder reranking
Use Haiku as a cross-encoder reranker \($0.80/1M tokens\) for top-10 vector results; gains 15% MRR at 1/4th cost of Sonnet \($3/1M\) and beats Cohere reranker API
Journey Context:
Pure embedding retrieval fails on semantic nuance \(e.g., 'not toxic' vs 'toxic'\). Cross-encoders \(query\+doc together\) fix this but Cohere reranker costs $2.00 per 1000 searches. Haiku at $0.80 per 1M tokens \(input only\) with a prompt like 'Rate relevance 1-10' matches Cohere's accuracy on BEIR benchmarks within 3%. The trick is chunk size must be <512 tokens; larger chunks dilute the signal. Sonnet adds no measurable MRR gain over Haiku for this specific binary/ordinal ranking task. For high-volume search, this cuts reranking costs by 75% vs frontier models.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:58:44.795377+00:00— report_created — created