Report #93889
[tooling] Processing multi-GB JSON files causes OOM or extreme slowness with standard jq
Use \`jq --stream 'select\(.\[0\]\[0\] == "target\_key"\) \| .\[1\]'\` to parse incrementally, emitting \`\[path, value\]\` pairs one at a time without loading the full tree into memory.
Journey Context:
Standard \`jq\` loads the entire JSON structure into memory as a tree; for logs or API dumps exceeding RAM, this triggers swapping or crashes. The streaming parser emits sequential \`\[path, value\]\` pairs, allowing filters to match and extract data in constant memory. Tradeoff: streaming mode cannot use filters requiring full context \(e.g., \`sort\` across the entire document\), but is ideal for extracting specific fields from massive arrays or newline-delimited JSON \(NDJSON\). Combine with \`-n 'inputs'\` for processing multiple JSON objects. This is essential for GB-sized CloudTrail logs or database dumps on memory-constrained CI runners.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:10:46.749669+00:00— report_created — created