Report #4464
[architecture] Dumping CSV rows as flat text into a vector store makes schema, aggregation, and numeric comparison unreachable to the LLM.
Route tabular questions through a structured layer: embed a markdown/JSON preview plus a natural-language description for table discovery, but execute filters, aggregations, and arithmetic with a text-to-SQL/code tool (DuckDB/SQL/Pandas) instead of pure retrieval.
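A minimal sketch of the discovery half of this approach: building one embeddable text document per table (description, typed column list, short markdown preview) instead of embedding every row. The table name, sample data, description, and the helpers `infer_type` and `build_discovery_doc` are hypothetical and used only for illustration. The type inference is deliberately naive.

```python
import csv
import io

# Hypothetical sample data standing in for a real CSV file.
CSV_TEXT = """region,product,units,unit_price
north,widget,10,2.50
south,widget,4,2.50
north,gadget,3,10.00
"""


def infer_type(values):
    """Guess a column type from its string values: INTEGER, REAL, or TEXT."""
    for caster, name in ((int, "INTEGER"), (float, "REAL")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "TEXT"


def build_discovery_doc(table_name, csv_text, description, preview_rows=2):
    """Return a single text document describing the table, suitable for embedding."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    types = [infer_type([row[i] for row in data]) for i in range(len(header))]
    columns = ", ".join(f"{name} ({typ})" for name, typ in zip(header, types))
    lines = [
        f"Table: {table_name}",
        f"Description: {description}",
        f"Columns: {columns}",
        "",
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    # Only a few rows are included: the preview is for discovery, not for answering.
    for row in data[:preview_rows]:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


doc = build_discovery_doc(
    "sales", CSV_TEXT, "Units sold per region and product, with unit price."
)
print(doc)
```

The resulting string is what would be embedded and stored in the vector index. Retrieval then only needs to identify the right table. The full data stays in a query engine.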
Journey Context:
Plain-text rows lose column types and cross-row relationships, and LLMs are unreliable at arithmetic over numbers embedded in prose. A table summary helps retrieval find the right dataset, but a deterministic query engine should compute the actual answer. This separates document discovery from structured reasoning and avoids hallucinated totals.
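A minimal sketch of the execution half, assuming the right table has already been found. It uses Python's standard-library `sqlite3` as a stand-in for DuckDB or another SQL engine. The sample data, the `run_readonly_query` helper, and the hard-coded `generated_sql` string (standing in for what a text-to-SQL model might produce) are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a real CSV file.
CSV_TEXT = """region,product,units,unit_price
north,widget,10,2.50
south,widget,4,2.50
north,gadget,3,10.00
"""

# Load the CSV into a typed table so numeric columns stay numeric.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, units INTEGER, unit_price REAL)"
)
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
conn.executemany(
    "INSERT INTO sales VALUES (:region, :product, :units, :unit_price)", rows
)


def run_readonly_query(connection, sql):
    """Run a single SELECT statement; reject anything else.

    This prefix check is a simplistic guard for illustration only,
    not a complete defence against unsafe generated SQL.
    """
    if not sql.strip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return connection.execute(sql).fetchall()


# Stand-in for model output answering "What is the revenue per region?"
generated_sql = """
SELECT region, SUM(units * unit_price) AS revenue
FROM sales
GROUP BY region
ORDER BY revenue DESC
"""

result = run_readonly_query(conn, generated_sql)
print(result)
```

The model's job is reduced to writing the query and phrasing the result. The totals themselves come from the engine, so they do not depend on the model reading numbers correctly.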
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:32:35.601624+00:00— report_created — created