Report #3549
[architecture] Flattening tables into plain text breaks column relationships and numeric reasoning
Preserve table structure: parse tables into HTML/Markdown or structured row elements, store them as atomic chunks with schema metadata, and route analytical questions to a text-to-SQL or pandas agent instead of a generic vector retriever.
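A minimal sketch of the recommendation above, assuming Python with the standard library only. The names TableChunk, ANALYTIC_WORDS, and route are hypothetical and exist for illustration; the keyword router is a deliberately crude stand-in for whatever classifier or LLM call a real pipeline would use, and the sample rows are made up.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TableChunk:
    """One table kept whole, so every value stays under its header."""
    table_id: str
    headers: list[str]
    rows: list[list[str]]
    # Hypothetical schema metadata: column name -> type hint.
    schema: dict[str, str] = field(default_factory=dict)

    def to_markdown(self) -> str:
        """Render the table as Markdown instead of flattening it to sentences."""
        head = "| " + " | ".join(self.headers) + " |"
        sep = "| " + " | ".join("---" for _ in self.headers) + " |"
        body = ["| " + " | ".join(row) + " |" for row in self.rows]
        return "\n".join([head, sep, *body])


# Assumption: a small word list is enough to illustrate routing.
ANALYTIC_WORDS = {"sum", "total", "average", "mean", "max", "min", "count", "compare"}


def route(question: str) -> str:
    """Send analytical questions to a structured agent, the rest to retrieval."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return "structured_agent" if words & ANALYTIC_WORDS else "vector_retriever"


chunk = TableChunk(
    table_id="sales",
    headers=["region", "quarter", "revenue"],
    rows=[["EMEA", "Q1", "120"], ["APAC", "Q1", "95"]],  # illustrative values
    schema={"region": "text", "quarter": "text", "revenue": "integer"},
)

print(chunk.to_markdown())
print(route("What is the total revenue for Q1?"))
print(route("Which table describes regional sales?"))
```

The chunk is stored and embedded as a single unit, so retrieval returns the whole table with its schema rather than isolated rows.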
Journey Context:
Naive text splitting turns rows into disconnected sentences, so the LLM cannot reliably compare cells or compute aggregates. Structured table extraction keeps headers aligned with values. For aggregation-heavy questions, a SQL or CSV agent given explicit schema context is the better fit than semantic search, because it computes over every row rather than over whichever fragments happened to be retrieved. Vector retrieval is better suited to finding which table is relevant than to computing over it.
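To illustrate why computing over the table beats retrieving fragments of it, here is a hedged sketch using Python's built-in sqlite3 module. The table name, the rows, and the generated_sql string are all assumptions made for the example; in a real pipeline the SQL would be produced by a text-to-SQL agent that was shown schema_context.

```python
import sqlite3

# Illustrative rows only; a real pipeline would load the extracted table.
rows = [
    ("EMEA", "Q1", 120),
    ("EMEA", "Q2", 135),
    ("APAC", "Q1", 95),
    ("APAC", "Q2", 110),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Explicit schema context that would be placed in the agent's prompt.
schema_context = "Table sales(region TEXT, quarter TEXT, revenue INTEGER)"

# Hand-written stand-in for the agent's output to the question
# "What is the total revenue per region?"
generated_sql = (
    "SELECT region, SUM(revenue) FROM sales GROUP BY region ORDER BY region"
)

# The database aggregates over every row, not over a retrieved subset.
totals = dict(conn.execute(generated_sql).fetchall())
print(totals)
conn.close()
```

A vector retriever could still be used upstream to decide that the sales table is the relevant one before the query is generated.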
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T17:32:17.599936+00:00— report_created — created