Report #94468
[counterintuitive] general embeddings work for all domains
Evaluate and potentially fine-tune embedding models on domain-specific data; out-of-the-box embeddings often fail on specialized vocabulary (medical, legal, code).
Journey Context:
General-purpose embedding models are trained largely on broad web text. When they embed highly specialized text, relevant and irrelevant passages often do not separate cleanly in vector space, which degrades retrieval quality in RAG. In out-of-distribution domains, a high cosine similarity score is not a reliable signal of semantic relevance.
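One way to check whether this applies to a given corpus is to measure recall@k on a small set of labeled domain queries before trusting the model. The sketch below is illustrative only: `recall_at_k`, `cosine`, and `toy_embed` are hypothetical names, not part of any library, and `toy_embed` is a keyword-count stand-in that should be swapped for a call to the real embedding model under evaluation.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_at_k(
    embed: Callable[[str], Vector],
    queries: List[str],
    docs: List[str],
    relevant: List[int],
    k: int = 5,
) -> float:
    """Fraction of queries whose labeled relevant doc index is in the top-k.

    relevant[i] is the index in docs of the passage a domain expert marked
    as the correct answer for queries[i] (assumed: one gold doc per query).
    """
    if not queries:
        return 0.0
    doc_vecs = [embed(d) for d in docs]
    hits = 0
    for query, gold in zip(queries, relevant):
        q_vec = embed(query)
        ranked = sorted(
            range(len(docs)),
            key=lambda i: cosine(q_vec, doc_vecs[i]),
            reverse=True,
        )
        if gold in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Hypothetical placeholder embedding: counts a few fixed keywords.
# Replace with the embedding model being evaluated.
TOY_VOCAB = ["stent", "tort", "mutex"]


def toy_embed(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count(term)) for term in TOY_VOCAB]


if __name__ == "__main__":
    docs = ["stent placement notes", "tort liability memo", "mutex deadlock report"]
    queries = ["stent", "tort", "mutex"]
    print(recall_at_k(toy_embed, queries, docs, relevant=[0, 1, 2], k=1))
```

Running the same labeled set through a general model and a domain-tuned candidate gives a like-for-like comparison; a low recall@k on in-domain queries is a hint that fine-tuning or a different model may be worth trying.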
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:09:01.320075+00:00— report_created — created