Report #2027
[architecture] Flattening tables into plain text destroys row-to-header relationships and numeric comparability
Preserve tables as Markdown/HTML, chunk large tables by row groups while repeating headers, attach row/column metadata, and route analytical table questions to a structured retriever or text-to-SQL layer
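A minimal sketch of row-group chunking with repeated headers, using only the Python standard library. The names `chunk_table`, `TableChunk`, and `rows_per_chunk` are hypothetical. The metadata keys (`sheet_name`, `columns`, `row_start`, `row_end`) are an assumed schema, not the format of any particular vector store.

```python
from dataclasses import dataclass, field


@dataclass
class TableChunk:
    """One retrievable chunk: a Markdown table fragment plus its metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def to_markdown_row(cells):
    # Escape pipes so a cell value cannot break the Markdown column layout.
    return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"


def chunk_table(header, rows, sheet_name, rows_per_chunk=50):
    """Split a table into row groups, repeating the header in every chunk.

    Hypothetical helper: `sheet_name` and `rows_per_chunk` are illustrative
    parameters, not part of any specific library's API.
    """
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be >= 1")
    header_lines = [
        to_markdown_row(header),
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        group = rows[start:start + rows_per_chunk]
        body = [to_markdown_row(r) for r in group]
        chunks.append(TableChunk(
            # Every chunk carries the full header, so no value loses its column label.
            text="\n".join(header_lines + body),
            metadata={
                "sheet_name": sheet_name,
                "columns": list(header),
                # 1-based, inclusive range of data rows (header excluded).
                "row_start": start + 1,
                "row_end": start + len(group),
            },
        ))
    return chunks


if __name__ == "__main__":
    demo = chunk_table(
        ["quarter", "revenue"],
        [["Q1", 120], ["Q2", 150], ["Q3", 90]],
        sheet_name="Revenue",
        rows_per_chunk=2,
    )
    for chunk in demo:
        print(chunk.metadata)
        print(chunk.text)
```

With `rows_per_chunk=2`, a three-row table yields two chunks that both open with the same header and separator lines, so a value like `150` is never embedded without its `revenue` label. The row range in the metadata lets an answer cite where the value came from.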
Journey Context:
Standard recursive text splitters cut a table at arbitrary line boundaries, so a cell value can land in one chunk and its column header in another. The retriever then matches isolated numbers without the header that gives them meaning. Treating tables as structural elements (keeping the HTML/Markdown representation, repeating the header in each chunk, and adding metadata such as sheet name and row range) preserves that meaning. When the question is inherently an aggregation ('which quarter had the highest revenue?'), vector similarity is the wrong tool; query the source table or a cached dataframe directly.
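For the aggregation case, here is a sketch of routing plus a direct query. It uses the standard-library `sqlite3` module as a stand-in for the structured retriever or text-to-SQL layer. The keyword regex in `route` is a crude, hypothetical heuristic; a real system would more likely use a classifier or an LLM call. `highest` hard-codes the SQL that a text-to-SQL step would be expected to produce for a 'which X had the highest Y?' question. All function and table names here are illustrative.

```python
import re
import sqlite3

# Hypothetical cue words for aggregation-style questions.
AGGREGATE_CUES = re.compile(
    r"\b(highest|lowest|max(?:imum)?|min(?:imum)?|total|sum|average|mean|count|how many|top)\b",
    re.IGNORECASE,
)


def route(question: str) -> str:
    """Return 'structured' for aggregation-style questions, else 'vector'."""
    return "structured" if AGGREGATE_CUES.search(question) else "vector"


def load_table(conn, name, header, rows):
    # Load source rows into SQLite so comparisons run on stored values, not text.
    # Assumes table/column names are trusted and contain no double quotes.
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{name}" ({cols})')
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', rows)


def highest(conn, table, label_col, value_col):
    # The kind of query a text-to-SQL layer might emit for "which X had the highest Y?".
    sql = (
        f'SELECT "{label_col}", "{value_col}" FROM "{table}" '
        f'ORDER BY "{value_col}" DESC LIMIT 1'
    )
    return conn.execute(sql).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_table(conn, "sales", ["quarter", "revenue"], [("Q1", 120), ("Q2", 150), ("Q3", 90)])
    question = "Which quarter had the highest revenue?"
    if route(question) == "structured":
        print(highest(conn, "sales", "quarter", "revenue"))
```

The `ORDER BY` runs over stored numeric values, so the answer is exact instead of whichever chunk happens to embed closest to the question. Non-aggregation questions still fall through to the vector retriever.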
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T09:48:34.027190+00:00— report_created — created