Report #12962
[tooling] llama.cpp server returning wrong embedding dimensions or NaN values with nomic-embed-text
Launch the server with explicit `--embedding --pooling mean` flags (use `cls` for BERT-style models). For nomic-embed-text-v1.5 specifically, ensure the GGUF was built with the correct config or pass `--rope-freq-base 10000` if using legacy conversions. Verify via the `/embedding` endpoint with `{"input": "test", "encoding_format": "float"}` and check for 768-dim output without NaNs.
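The verification step can be scripted. The following is a minimal sketch, not a definitive client: the helper names `check_embedding` and `fetch_embedding` are hypothetical, the server address `localhost:8080` is an assumption, and the request goes to the OpenAI-compatible `/v1/embeddings` route on the assumption that the response carries the vector at `data[0].embedding`. The native `/embedding` route may return a different shape depending on the server version, so adjust the parsing if you use it.

```python
import json
import math
import urllib.request


def check_embedding(vector, expected_dim=768):
    """Return a list of problems found in one embedding vector (empty if none)."""
    problems = []
    if len(vector) != expected_dim:
        problems.append(f"expected {expected_dim} dims, got {len(vector)}")
    # NaN may arrive as a float NaN or as JSON null (None), so test for both.
    bad = sum(1 for x in vector if x is None or not math.isfinite(x))
    if bad:
        problems.append(f"{bad} non-finite values (NaN/inf/null)")
    return problems


def fetch_embedding(text, url="http://localhost:8080/v1/embeddings"):
    """Request one embedding. URL and response shape are assumptions."""
    payload = json.dumps({"input": text, "encoding_format": "float"}).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Assumed OpenAI-style shape: {"data": [{"embedding": [...]}]}
    return body["data"][0]["embedding"]
```

With the server running, `check_embedding(fetch_embedding("test"))` should return an empty list when the output is a 768-dim vector of finite values.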
Journey Context:
The llama.cpp server's pooling mode often defaults to `none` or `cls` depending on build flags, causing nomic models (which expect mean pooling) to output 768-dimensional vectors of NaNs or garbage. Users often assume the GGUF is corrupted. The `--pooling` flag was added specifically to handle the diversity of embedding architectures (nomic, e5, bge, BERT). Additionally, nomic-embed-text-v1.5 uses Matryoshka representation learning: to get a smaller embedding, you must truncate the output vector client-side to your target dimension (e.g., 512 or 256) or use specific GGUFs built for those dims. Critical: never use the `/completion` endpoint for embeddings; it ignores pooling config and uses the causal LM head, producing nonsensical vectors.
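Client-side Matryoshka truncation can be sketched as follows. This is an illustration under assumptions: `truncate_embedding` is a hypothetical helper, and re-normalizing to unit length after truncation is a common convention for cosine similarity rather than something this report specifies. Check the model card for any normalization it recommends before truncating.

```python
import math


def truncate_embedding(vector, target_dim):
    """Keep the first target_dim components and L2-normalize the result."""
    if not 0 < target_dim <= len(vector):
        raise ValueError("target_dim must be between 1 and len(vector)")
    head = vector[:target_dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]
```

For example, `truncate_embedding(vec, 256)` reduces a 768-dim vector to 256 dims. Use the same target dimension for documents and queries, since vectors of different lengths cannot be compared.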
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T17:23:04.248423+00:00— report_created — created