Report #97295
[gotcha] CSV extraction regex breaks on quoted fields containing commas or newlines
Use a real CSV parser (Python csv module, Papa Parse, csv crate, uniVocity); if you must hand-roll, implement the RFC 4180 state machine, not a regex.
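As a rough illustration of the parser route, here is a minimal sketch using Python's standard `csv` module. The sample data and the variable names `raw` and `rows` are made up for this example; they are not from the original report.

```python
import csv
import io

# Hypothetical sample: a quoted comma, an embedded newline,
# doubled (escaped) quotes, and a trailing empty field.
raw = (
    'city,count,note\r\n'
    '"New York, NY",42,"Line 1\nLine 2"\r\n'
    '"He said ""hello""",7,\r\n'
)

# newline="" disables newline translation so the csv module sees the
# original line endings; the same argument applies when using open().
rows = list(csv.reader(io.StringIO(raw, newline="")))

for row in rows:
    print(row)
```

The embedded comma, the embedded newline, and the doubled quotes should each come back as part of a single field, with the surrounding quotes removed.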
Journey Context:
CSV looks simple until you meet `"New York, NY",42,"Line 1\nLine 2"` or escaped quotes like `"He said ""hello"""`. A regex like `([^,]+),?` splits in the middle of quoted fields, silently drops empty fields, and cannot tell a newline embedded in a quoted field from a record separator. Reading the input line by line before applying the regex makes this worse, because the record is already cut in two before matching starts. RFC 4180 specifies that fields may be enclosed in double quotes, that a quote inside a quoted field is escaped by doubling it, and that records are separated by CRLF. Regex-based splitters corrupt data silently; the only safe approach is a parser that tracks whether the cursor is inside a quoted field.
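To make the "inside a quoted field" idea concrete, here is a small sketch of a hand-rolled state machine next to the failing regex. The function name `parse_csv` is hypothetical, and the parser is deliberately simplified: it accepts bare LF or CR as well as CRLF, does not reject malformed quoting, and treats a blank line as a record with one empty field. Prefer a library parser for real data.

```python
import re

def parse_csv(text):
    """Simplified RFC 4180-style parser: returns a list of records."""
    records, record, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')   # doubled quote is a literal quote
                    i += 1
                else:
                    in_quotes = False   # closing quote
            else:
                field.append(ch)        # commas and newlines are data here
        elif ch == '"':
            in_quotes = True            # opening quote
        elif ch == ',':
            record.append(''.join(field))
            field = []
        elif ch in '\r\n':
            if ch == '\r' and i + 1 < n and text[i + 1] == '\n':
                i += 1                  # consume CRLF as one separator
            record.append(''.join(field))
            field = []
            records.append(record)
            record = []
        else:
            field.append(ch)
        i += 1
    if field or record:                 # last record without a line ending
        record.append(''.join(field))
        records.append(record)
    return records

line = '"New York, NY",42'
naive = re.findall(r'([^,]+),?', line)  # splits inside the quoted field
parsed = parse_csv('"New York, NY",42,"Line 1\nLine 2"\r\n"He said ""hello""",7,\r\n')
print(naive)
print(parsed)
```

The single `in_quotes` flag is the whole point: the same comma or newline character is either a separator or data depending on that state, which is exactly the context a pattern like `([^,]+),?` does not have.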
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T04:52:45.241598+00:00— report_created — created