Report #63779
[counterintuitive] general embeddings work for all domains
Fine-tune embeddings on domain-specific data or use domain-adapted embedding models before building a RAG pipeline for specialized fields.
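One common way to fine-tune an embedding model on domain data is a contrastive objective over (query, relevant passage) pairs drawn from your own corpus, where the other passages in each batch act as negatives. The sketch below only computes that loss in plain Python, to show what the training signal rewards. It assumes you already have embedding vectors. The function names and the temperature value are hypothetical choices for illustration, and real fine-tuning would compute this inside a framework with automatic differentiation and update the model's weights.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def in_batch_contrastive_loss(
    queries: List[Vector], passages: List[Vector], temperature: float = 0.05
) -> float:
    """Mean InfoNCE loss: passages[i] is the positive for queries[i],
    and every other passage in the batch serves as a negative.
    The temperature of 0.05 is an assumed example value."""
    if len(queries) != len(passages):
        raise ValueError("need exactly one positive passage per query")
    total = 0.0
    for i, q in enumerate(queries):
        # Scaled similarity of this query to every passage in the batch.
        logits = [cosine(q, p) / temperature for p in passages]
        # Numerically stable log-sum-exp, then subtract the positive's logit
        # to get the negative log-softmax probability of the correct passage.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_norm - logits[i]
    return total / len(queries)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]   # each query points at its own passage
    swapped = [[0.0, 1.0], [1.0, 0.0]]   # each query points at the wrong passage
    print(in_batch_contrastive_loss(queries, aligned))
    print(in_batch_contrastive_loss(queries, swapped))
```

When each query already sits closest to its own passage the loss is near zero, and when the positives are mismatched it grows large. Training on domain pairs pushes the model toward the first case for domain-specific synonyms.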
Journey Context:
Developers use off-the-shelf embeddings (e.g., OpenAI's text-embedding-3) for highly specialized domains (legal, medical, internal corporate jargon) and then wonder why RAG retrieval is poor. General-purpose embedding models learn their vector space mostly from common web text, so they often fail on out-of-distribution vocabulary, such as a corpus where 'cancellation' means terminating a policy rather than 'cancel culture'. As a result, the vector distances between domain-specific synonyms are often too large for accurate retrieval.
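Before fine-tuning, it can help to measure how retrieval actually performs on your domain. The sketch below computes recall@k over a small hand-labeled set of (query, relevant document) pairs. `toy_embed` is a hypothetical hashed bag-of-words stand-in so the example runs without dependencies; in practice you would pass a function that calls your off-the-shelf or domain-adapted model and compare the scores. The sample documents and query are invented for illustration only.

```python
import math
import zlib
from typing import Callable, Dict, List, Sequence, Tuple

Embed = Callable[[str], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def toy_embed(text: str, dim: int = 256) -> List[float]:
    """Hypothetical stand-in embedder: hashed bag-of-words.
    Replace with a call to the embedding model you are evaluating."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def recall_at_k(
    embed: Embed, docs: Dict[str, str], labeled: List[Tuple[str, str]], k: int = 3
) -> float:
    """Fraction of (query, relevant_doc_id) pairs whose relevant document
    appears among the k documents most cosine-similar to the query."""
    doc_vecs = {doc_id: embed(text) for doc_id, text in docs.items()}
    hits = 0
    for query, relevant_id in labeled:
        q = embed(query)
        ranked = sorted(doc_vecs, key=lambda d: cosine(q, doc_vecs[d]), reverse=True)
        hits += relevant_id in ranked[:k]
    return hits / len(labeled)


if __name__ == "__main__":
    docs = {
        "d1": "policy termination procedure for lapsed premiums",
        "d2": "event cancellation refunds for concerts",
    }
    labeled = [("how do i process a cancellation", "d1")]
    print(recall_at_k(toy_embed, docs, labeled, k=1))
```

Running the same labeled set through a general model and a domain-adapted one gives a concrete number to compare, instead of judging retrieval quality by eye.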
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:32:31.889470+00:00— report_created — created