Report #2841
[architecture] How do I put structured tables into a vector RAG pipeline?
Do not flatten individual rows into chunks. Instead, index table metadata (schema, column descriptions, summary statistics) and natural-language summaries of row groups. Route precise, analytical, or filter-heavy questions through a text-to-SQL or structured query layer rather than pure vector search.
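As a rough illustration of the routing step, here is a minimal sketch that sends analytical or filter-heavy questions to a SQL path and everything else to vector search. The function name `route_question` and the cue list are hypothetical, and the keyword heuristic is only a stand-in; a real router would more likely use a classifier or an LLM call.

```python
import re

# Hypothetical cue list for illustration only: aggregation words,
# comparison phrases, and comparison operators.
ANALYTIC_CUES = re.compile(
    r"\b(average|avg|sum|total|count|how many|max|min|top \d+"
    r"|greater than|less than|between|group by)\b"
    r"|[<>=]",
    re.IGNORECASE,
)


def route_question(question: str) -> str:
    """Return "sql" for analytical or filter-heavy questions, else "vector"."""
    return "sql" if ANALYTIC_CUES.search(question) else "vector"
```

For example, "How many orders exceeded 500 in March?" would take the SQL path, while "Which tables describe customer churn?" would go to vector search over the schema and summary documents.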
Journey Context:
Flattening rows into text destroys relational semantics, and embeddings of such text handle numeric comparisons, aggregations, and filtering poorly. Vector search is a similarity tool, not a database. The reliable pattern is: (1) embed schema and column-level descriptions for discovery, (2) generate group-level summaries for semantic retrieval, and (3) execute precise lookups via SQL/structured retrieval. Many retrieval frameworks provide a structured-SQL retriever for exactly this reason.
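The three-part pattern above can be sketched with the standard-library `sqlite3` module. Everything here is an assumption made for illustration: the `orders` table, its columns, the column notes, and the helper names are hypothetical. The embedding step is left out; the strings produced by `schema_document` and `group_summaries` are what would be embedded, while `total_for_region` stands in for the precise SQL path.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory table (hypothetical data) to demonstrate on."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("EU", 120.0), ("EU", 80.0), ("US", 200.0)],
    )
    return conn


def schema_document(conn: sqlite3.Connection, table: str, column_notes: dict) -> str:
    """(1) Text describing the schema; this is what gets embedded for discovery."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    lines = [f"Table {table}"]
    for _cid, name, col_type, *_rest in cols:
        note = column_notes.get(name, "no description")
        lines.append(f"- {name} ({col_type}): {note}")
    return "\n".join(lines)


def group_summaries(conn: sqlite3.Connection) -> list:
    """(2) One natural-language summary per row group instead of one chunk per row."""
    rows = conn.execute(
        "SELECT region, COUNT(*), SUM(amount), AVG(amount) "
        "FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    return [
        f"Region {region}: {n} orders, total amount {total:.2f}, average {avg:.2f}."
        for region, n, total, avg in rows
    ]


def total_for_region(conn: sqlite3.Connection, region: str) -> float:
    """(3) Precise lookup runs as parameterized SQL, not as a similarity search."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE region = ?", (region,)
    ).fetchone()
    return total
```

The design point is that exact numbers only ever come from the database: the summaries help retrieval find the right table or group, and the SQL query produces the answer.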
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T14:29:02.989570+00:00— report_created — created