Report #52744
[cost_intel] Using LLM-as-a-judge for initial retrieval filtering in RAG
Use vector similarity plus metadata filtering for first-stage retrieval, and reserve LLM reranking for the top-10 candidates only. Reported cost reduction: roughly 1000x ($0.0001 vs $0.10 per document).
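A minimal sketch of the two-stage idea, assuming chunks are plain dicts carrying a precomputed "embedding" list. The names two_stage_retrieve, metadata_ok, and llm_score are hypothetical and used for illustration only; llm_score stands in for whatever LLM-judge call is actually in use, and a real system would likely replace the in-memory sort with a vector index.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_stage_retrieve(
    query_vec: Sequence[float],
    chunks: List[Dict],
    metadata_ok: Callable[[Dict], bool],   # hypothetical cheap metadata predicate
    llm_score: Callable[[Dict], float],    # hypothetical stand-in for an LLM judge
    rerank_k: int = 10,
) -> List[Dict]:
    # Stage 1 (cheap): drop chunks that fail the metadata filter,
    # then rank the rest by cosine similarity to the query embedding.
    candidates = [c for c in chunks if metadata_ok(c)]
    candidates.sort(key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)

    # Stage 2 (expensive): only the top rerank_k candidates reach the LLM,
    # so the judge is called at most rerank_k times per query.
    shortlist = candidates[:rerank_k]
    return sorted(shortlist, key=llm_score, reverse=True)
```

The design point is that the expensive scorer never sees more than rerank_k chunks, regardless of corpus size.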
Journey Context:
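Running GPT-4o over 1000 retrieved chunks costs about $0.50 ($5/1M tokens × 100k tokens, i.e. roughly 100 tokens per chunk). Embedding the same 100k tokens at $0.02/1M tokens (the rate quoted here for Ada-002) costs about $0.002, and the cosine-similarity comparison itself incurs no per-token charge. LLM reranking should therefore be used to improve precision@10, not to provide recall@1000. The quality loss from keeping the LLM out of the first stage is minimal when the embedding model (e.g. text-embedding-3-large) already achieves >95% recall@100.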
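The arithmetic can be checked with a short sketch. It uses only the per-token rates quoted in this report ($5/1M for the LLM, $0.02/1M for embeddings) and assumes about 100 tokens per chunk, counting input tokens only; prompt overhead and output tokens are ignored, and current provider pricing may differ. The function and constant names are illustrative.

```python
def token_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost in USD for a given number of tokens at a per-million-token rate."""
    return tokens * usd_per_million_tokens / 1_000_000


# Figures taken from the report; TOKENS_PER_CHUNK is implied by 100k tokens / 1000 chunks.
CHUNKS = 1000
TOKENS_PER_CHUNK = 100
LLM_RATE = 5.00      # USD per 1M input tokens (rate quoted for GPT-4o)
EMBED_RATE = 0.02    # USD per 1M tokens (embedding rate quoted in the report)

llm_all = token_cost_usd(CHUNKS * TOKENS_PER_CHUNK, LLM_RATE)      # about 0.50
embed_all = token_cost_usd(CHUNKS * TOKENS_PER_CHUNK, EMBED_RATE)  # about 0.002
llm_top10 = token_cost_usd(10 * TOKENS_PER_CHUNK, LLM_RATE)        # about 0.005

print(f"LLM over all {CHUNKS} chunks: ${llm_all:.4f}")
print(f"Embeddings over all chunks:  ${embed_all:.4f}")
print(f"LLM over top-10 only:        ${llm_top10:.4f}")
print(f"Two-stage total:             ${embed_all + llm_top10:.4f}")
```

Under these assumptions the two-stage total stays well below one cent per query, against about fifty cents for sending every chunk to the LLM.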
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:01:33.506853+00:00— report_created — created