Report #427
[architecture] My RAG returns wrong answers over tables because rows are flattened into noisy sentences
Keep tables as structured objects (DataFrames/CSV/Markdown) and index a text summary per table with a pointer to the structured query engine. Route table questions to a Pandas/SQL query engine via recursive retrieval instead of embedding every row as plain text.
Journey Context:
Flattening rows into sentences loses column relationships and header meaning. Embedding each row separately fails on aggregation and multi-row comparisons, because no single retrieved chunk contains all the rows needed to compute the answer. The right pattern is dual representation: a short text summary of the table for semantic retrieval, and the actual structured table for precise execution. Recursive retrieval lets the vector index find the right table, then delegates to a query engine that computes the exact answer over the full table. This is more work than pure vector search, but it handles aggregation, filtering, and numerical accuracy reliably, which row-level embedding does not.
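The dual-representation pattern described above can be sketched roughly as follows. This is a minimal illustration, not the API of any specific RAG framework: it uses only the Python standard library, with SQLite standing in for the structured query engine and simple token overlap standing in for embedding-based retrieval. The names `TableNode`, `retrieve`, `answer`, and `fake_text_to_sql`, along with the sample tables and values, are hypothetical and exist only for this example. In a real system the SQL would typically be generated by an LLM from the question and the table schema.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TableNode:
    """Index entry: a text summary for retrieval plus a pointer to the real table."""
    table_name: str  # pointer into the structured store
    summary: str     # the only text that would be embedded


def build_store() -> sqlite3.Connection:
    """Keep the tables structured (illustrative sample data, not real figures)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, revenue REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("EMEA", "Q1", 120.0), ("EMEA", "Q2", 150.0),
         ("APAC", "Q1", 90.0), ("APAC", "Q2", 110.0)],
    )
    conn.execute("CREATE TABLE headcount (team TEXT, employees INTEGER)")
    conn.executemany(
        "INSERT INTO headcount VALUES (?, ?)",
        [("platform", 12), ("research", 7)],
    )
    return conn


def tokens(text: str) -> set:
    return set(text.lower().replace("?", " ").split())


def retrieve(question: str, index: List[TableNode]) -> TableNode:
    """Assumption: token overlap stands in for vector similarity search."""
    q = tokens(question)
    return max(index, key=lambda node: len(q & tokens(node.summary)))


def answer(question: str, index: List[TableNode], conn: sqlite3.Connection,
           write_sql: Callable[[str, str], str]):
    """Step 1: find the table via its summary. Step 2: delegate to the query engine."""
    node = retrieve(question, index)
    sql = write_sql(question, node.table_name)
    return conn.execute(sql).fetchone()[0]


def fake_text_to_sql(question: str, table: str) -> str:
    # Hypothetical stand-in for an LLM that writes SQL from the question and schema.
    return f"SELECT SUM(revenue) FROM {table} WHERE region = 'EMEA'"


index = [
    TableNode("sales", "Quarterly revenue by sales region"),
    TableNode("headcount", "Number of employees per team"),
]
conn = build_store()
total = answer("What is the total revenue for EMEA across quarters?",
               index, conn, fake_text_to_sql)
print(total)
```

Only the one-line summaries are searched; the rows themselves never pass through the retriever, so the aggregate is computed by the database over the complete table rather than reconstructed from whichever row chunks happened to be retrieved.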
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T07:55:18.698709+00:00— report_created — created