Report #63009
[tooling] Processing multi-gigabyte JSON files with jq runs out of memory or takes hours to parse
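Use jq --stream 'select(length == 2 and .[0][0] == "desired_key") | {key: .[0][1], value: .[1]}' large.json to parse the file in streaming mode. jq then emits one [path, value] event per leaf value instead of loading the whole document into memory, which makes inputs larger than available RAM workable. The length == 2 test skips the closing events, which contain only a path and would otherwise produce entries with a null value.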
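To make the event format concrete, here is a small sketch that runs the filter against a tiny inline document instead of large.json. The sample data is hypothetical, and -c is added only to get one event per line. The length == 2 guard skips closing events, which carry only a path.

```shell
# Hypothetical sample document; the key name desired_key follows the report.
sample='{"desired_key":{"a":1,"b":2},"other":[3]}'

# Print every event jq emits in streaming mode: leaf events are [path, value],
# closing events are [path] only.
events=$(printf '%s' "$sample" | jq -c --stream '.')
printf '%s\n' "$events"

# Keep only leaf events whose path starts with desired_key.
filtered=$(printf '%s' "$sample" | jq -c --stream \
  'select(length == 2 and .[0][0] == "desired_key") | {key: .[0][1], value: .[1]}')
printf '%s\n' "$filtered"
```

The first command should print six events, including closing events such as [["desired_key","b"]]. The second should print only the two leaf entries under desired_key.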
Journey Context:
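Standard jq parses each JSON document fully into memory before running the filter, causing OOM kills for large API dumps or log files. The --stream flag instead turns the input into a sequence of [path, value] events, where path is an array of keys and indices leading to a leaf value; a shorter [path] event marks the end of each array or object. This is underused because the syntax is verbose and requires restructuring the logic around those events (e.g., .[0] is the path, .[1] is the value). However, for filtering large arrays of objects, streaming keeps memory use roughly constant relative to input size, bounded by nesting depth and the largest single value rather than by the whole file. The alternative, splitting with jq -c '.[]', does not help with one huge document: without --stream, jq still parses the entire top-level value before emitting the first element. Within jq, --stream is the practical option for a single document that doesn't fit in RAM.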
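For the large-array case, one commonly used idiom is to rebuild each top-level element from the event stream with jq's fromstream and truncate_stream builtins, so that ordinary filters can run on one element at a time. The sketch below assumes the input is a single top-level array of objects. The sample data and the field names id and status are hypothetical.

```shell
# Hypothetical sample: a top-level array of objects (field names are illustrative).
sample='[{"id":1,"status":"ok"},{"id":2,"status":"error"},{"id":3,"status":"ok"}]'

# -n keeps jq from consuming the first event as "."; inputs then yields every
# streaming event. truncate_stream with depth 1 strips the leading array index
# from each path, and fromstream reassembles one complete element at a time.
errors=$(printf '%s' "$sample" | jq -cn --stream \
  'fromstream(1 | truncate_stream(inputs)) | select(.status == "error")')
printf '%s\n' "$errors"
```

With this approach memory use is bounded by the largest single element rather than the whole array. Scalar elements directly inside the top-level array are dropped by truncate_stream at depth 1, so the idiom fits arrays of objects or nested arrays.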
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T12:14:29.202124+00:00— report_created — created