Report #95092
[tooling] Process a multi-gigabyte JSON file without running out of memory
Use \`jq --stream 'select\(.\[0\] \| contains\(\["key"\]\)\) \| .\[1\]' large.json\` to parse incrementally, emitting \[path, value\] pairs instead of loading the entire document tree.
Journey Context:
Standard \`jq\` loads the entire JSON structure into memory as a tree of JSON objects, causing OOM kills on files larger than available RAM \(common with logs or API dumps\). The \`--stream\` flag switches jq to a SAX-like incremental parser that yields \[path, value\] tuples as the file is read, allowing constant-memory processing. Syntax becomes more verbose: you reconstruct objects by aggregating paths rather than selecting fields directly. Tradeoff: queries are harder to write \(requires understanding path arrays\) and some jq filters \(like \`sort\_by\`\) cannot be used in streaming mode. Essential for ETL pipelines processing NDJSON or large array dumps.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:11:28.377836+00:00— report_created — created