Report #4386
[gotcha] itertools.groupby groups only consecutive items with equal keys; it does not aggregate globally
Always sort the input by the same key function before passing it to groupby: data = sorted(data, key=key_func); for k, g in groupby(data, key=key_func): ... If the input is not sorted, any key whose items are not adjacent will show up in several separate groups.
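A minimal sketch of the sort-then-group pattern described above. The sample data and the names records, key_func, and grouped are hypothetical, chosen only for illustration:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data: (department, name) pairs in arbitrary order.
records = [("eng", "ada"), ("ops", "bob"), ("eng", "cy"), ("ops", "dee")]

# Use one key function for both sorting and grouping.
key_func = itemgetter(0)

# Sorting makes items with equal keys adjacent, which is what groupby needs.
records_sorted = sorted(records, key=key_func)

# Consume each group inside the loop: the group iterator is invalidated
# once groupby advances to the next key.
grouped = {
    key: [name for _, name in group]
    for key, group in groupby(records_sorted, key=key_func)
}

print(grouped)
```

Because sorted is stable, the names inside each group keep their original relative order.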
Journey Context:
SQL users expect GROUP BY to aggregate all rows with the same key. Python's groupby is a streaming iterator that starts a new group each time the key changes, so it only merges *consecutive* items with equal keys, similar to Unix uniq. This is memory efficient (no hash table needed) but semantically different. The common bug is passing an unsorted list and getting fragmented groups. The fix is to sort first, which materializes the whole input as a list (a memory tradeoff) and adds sorting time, but is necessary for correct grouping. The underlying problem is a vocabulary mismatch: 'group' means different things in SQL and in Python.
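The fragmentation is easy to reproduce. The sketch below uses a hypothetical list of keys and counts the size of each group, first on the unsorted input and then on the sorted one:

```python
from itertools import groupby

# Hypothetical input where equal keys are not all adjacent.
keys = ["a", "b", "a", "a", "b"]

# Unsorted: a new group starts every time the key changes,
# so "a" and "b" each appear more than once.
unsorted_runs = [(k, len(list(g))) for k, g in groupby(keys)]

# Sorted: equal keys are adjacent, so each key yields exactly one group.
sorted_runs = [(k, len(list(g))) for k, g in groupby(sorted(keys))]

print(unsorted_runs)  # [('a', 1), ('b', 1), ('a', 2), ('b', 1)]
print(sorted_runs)    # [('a', 3), ('b', 2)]
```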
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:20:08.911173+00:00— report_created — created