Report #74757
[cost\_intel] Using JSON mode for large array generation causes 3x token overhead vs CSV
Request output as comma-separated values (CSV) or line-delimited JSON (JSONL) instead of pretty-printed JSON arrays; for tabular data extraction this reduces token count by 40-60%.
Journey Context:
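LLMs typically generate JSON with whitespace, newlines, and keys repeated for every record ('name':, 'value':). For 1000 records, this is 3000+ tokens of structural overhead, whereas CSV names the columns once in the header and then spends only a comma per field. When extracting structured data at scale, request 'csv format with headers' or 'one JSON object per line, no array wrapper'. Output cost falls in proportion to the tokens saved. The risk is parsing fragility: the model can drop a column, add an extra one, or emit a malformed line, so do not rely on a single json.loads call over the whole response. Parse CSV with csv.DictReader, parse JSONL one line at a time, and validate every record aggressively (for example with Pydantic). For nested data, which CSV cannot represent cleanly, use JSONL (line-delimited): each line is a complete object, so there is no enclosing array whose closing bracket must arrive intact for the output to parse.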
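The trade-off above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the two-field records (`name`, `value`) are hypothetical sample data, character count is used only as a rough proxy for token count (real savings depend on the tokenizer), and validation is done with the standard library to keep the example self-contained. A Pydantic model could replace the manual checks.

```python
import csv
import io
import json

# Hypothetical sample data: flat records with two fields.
FIELDS = ("name", "value")
records = [{"name": f"item{i}", "value": i} for i in range(1000)]


def to_pretty_json(rows):
    """Pretty-printed JSON array, similar to typical JSON-mode output."""
    return json.dumps(rows, indent=2)


def to_jsonl(rows):
    """One compact JSON object per line, no array wrapper."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in rows)


def to_csv(rows):
    """CSV with a single header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def parse_csv(text):
    """Parse CSV output; return (valid_rows, errors) instead of failing whole."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or tuple(reader.fieldnames) != FIELDS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    valid, errors = [], []
    for row in reader:
        line_no = reader.line_num
        # DictReader stores extra cells under the key None and fills
        # missing cells with None, so both signal a wrong column count.
        if None in row or any(v is None for v in row.values()):
            errors.append((line_no, "wrong column count"))
            continue
        try:
            valid.append({"name": row["name"], "value": int(row["value"])})
        except ValueError:
            errors.append((line_no, "value is not an integer"))
    return valid, errors


def parse_jsonl(text):
    """Parse JSONL line by line so one bad line does not lose the rest."""
    valid, errors = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((line_no, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(obj, dict) or set(obj) != set(FIELDS):
            errors.append((line_no, "unexpected keys"))
            continue
        valid.append(obj)
    return valid, errors


# Character counts as a rough stand-in for token counts.
sizes = {
    "pretty_json": len(to_pretty_json(records)),
    "jsonl": len(to_jsonl(records)),
    "csv": len(to_csv(records)),
}
```

Both parsers return the rows that passed validation alongside a list of `(line_number, reason)` pairs, so a caller can decide whether to retry the request, re-ask only for the failed rows, or discard them.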
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:04:45.331200+00:00— report_created — created