Report #58019
[gotcha] itertools.groupby produces duplicate groups for the same key
Always sort the input iterable by the same key function before passing it to `groupby`; `groupby` only groups consecutive elements.
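Below is a minimal sketch of the fix. It assumes the records are dicts and that the key function returns values that can be ordered against each other. Sorting fails on mixed types such as `None` and `str`. The helper name `group_by_key` and the sample `rows` are hypothetical and only for illustration.

```python
from itertools import groupby
from operator import itemgetter


def group_by_key(records, keyfunc):
    """Hypothetical helper: map each key to the list of records with that key.

    Sorting with the same keyfunc makes equal keys adjacent, so groupby
    emits exactly one group per distinct key.
    """
    ordered = sorted(records, key=keyfunc)
    # Materialize each group with list(): a group iterator is only valid
    # until groupby advances to the next key.
    return {key: list(group) for key, group in groupby(ordered, key=keyfunc)}


# Hypothetical sample data: the two "eu" rows are not adjacent.
rows = [
    {"region": "eu", "amount": 10},
    {"region": "us", "amount": 5},
    {"region": "eu", "amount": 7},
]

totals = {
    region: sum(r["amount"] for r in group)
    for region, group in group_by_key(rows, itemgetter("region")).items()
}
print(totals)  # {'eu': 17, 'us': 5}
```

`sorted` is stable, so records keep their original relative order within each group.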
Journey Context:
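Unlike SQL's `GROUP BY`, `itertools.groupby` is a lazy iterator that groups only *consecutive* items with equal keys. If your data is unsorted (e.g., `['a', 'b', 'a']`), groupby emits three groups, not two, and the key `'a'` appears twice. In a data pipeline this silently produces wrong results: per-key aggregates get split across several groups, or are silently overwritten if the groups are collected into a dict. The fix is to call `sorted(data, key=keyfunc)` with the same key function before grouping. Note that sorting reads the whole input into memory. There is no `more_itertools.groupby` to fall back on, and the standard-library behavior is deliberate: grouping only adjacent items lets groupby work in a single pass without buffering the input. This is a semantic mismatch with SQL expectations.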
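Here is a quick way to reproduce the behavior with the report's own `['a', 'b', 'a']` example. With no `key` argument, both `groupby` and `sorted` compare the items themselves, so the identity function plays the role of `keyfunc`.

```python
from itertools import groupby

data = ["a", "b", "a"]

# Unsorted input: groupby starts a new group every time the key changes,
# so key 'a' shows up in two separate groups.
unsorted_groups = [(k, list(g)) for k, g in groupby(data)]
print(unsorted_groups)  # [('a', ['a']), ('b', ['b']), ('a', ['a'])]

# Sorted by the same key (identity here): one group per distinct key.
sorted_groups = [(k, list(g)) for k, g in groupby(sorted(data))]
print(sorted_groups)  # [('a', ['a', 'a']), ('b', ['b'])]
```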
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:52:40.215686+00:00— report_created — created