Report #1160
[gotcha] Can I parse nested HTML with a regex?
No. Use a real HTML parser such as BeautifulSoup, lxml (lxml.html), or html5lib. A regex cannot match arbitrarily deep nesting because HTML is not a regular language, and browsers apply error-recovery rules while parsing that no regex can replicate.
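To make the nesting point concrete, here is a minimal sketch in Python. It uses the standard-library html.parser.HTMLParser only so the example runs without third-party installs (BeautifulSoup can use this same parser as its "html.parser" backend). The input string, the id="outer" target, and the OuterDivText class are hypothetical illustrations. The naive regex stops at the first closing tag it sees, while the parser-driven version tracks nesting depth.

```python
import re
from html.parser import HTMLParser

# Hypothetical input: a div nested inside another div.
html_doc = '<div id="outer"><div>inner</div> tail</div>'

# Naive regex: the non-greedy .*? stops at the FIRST </div>,
# which closes the inner div, not the outer one.
naive = re.search(r'<div id="outer">(.*?)</div>', html_doc).group(1)
print(naive)  # prints: <div>inner


class OuterDivText(HTMLParser):
    """Collects all text inside the div with id="outer", including nested divs."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # div nesting depth once inside the outer div; 0 means outside
        self.chunks = []  # text fragments seen while inside the outer div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1  # a nested div inside the target
        elif dict(attrs).get("id") == "outer":
            self.depth = 1   # entering the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1  # leaving one level of div nesting

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


parser = OuterDivText()
parser.feed(html_doc)
parser.close()
outer_text = "".join(parser.chunks)
print(outer_text)  # prints: inner tail
```

Note that html.parser is a lenient tokenizer, not a browser-grade tree builder: it does not apply the HTML standard's error recovery, so where browser-equivalent results on malformed markup matter, html5lib (or BeautifulSoup with its "html5lib" backend) is the safer choice.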
Journey Context:
The "don't parse HTML with regex" meme persists because the task is genuinely impossible beyond trivial cases: unclosed tags, varying attribute order, nested elements, comments, CDATA sections, raw-text script/style content, and browser-specific error recovery put HTML beyond what regular expressions can describe. Its nesting alone makes it at least context-free, and the HTML standard defines parsing as a stateful algorithm rather than a grammar. A regex that passes your unit tests is likely to break on production HTML the first time it meets a newline inside an attribute value or a malformed comment. BeautifulSoup and lxml tolerate messy markup; html5lib implements the HTML standard's tokenizer and tree-construction algorithm, the same one modern browsers follow.
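Several of these failure modes can be reproduced in a few lines. This sketch again assumes only Python's standard-library HTMLParser; the snippet, the href regex, and the LinkCollector class are hypothetical. The plausible-looking regex misses a link whose attributes come in a different order and one whose value contains a newline, yet it "finds" links inside a comment and inside a script string. The parser reports only the two real a start tags, because it skips comments and treats script content as raw text.

```python
import re
from html.parser import HTMLParser

# Hypothetical snippet covering attribute order, a comment,
# a newline inside an attribute value, and script raw text.
page = """
<a class="nav" href="/home">Home</a>
<!-- <a href="/old-page">Old</a> -->
<a href="/multi
line">Wrapped</a>
<script>var s = '<a href="/not-a-link">';</script>
"""

# A regex that assumes href is the first attribute and that values fit on one line
# ('.' does not match a newline by default).
regex_hrefs = re.findall(r'<a href="(.*?)"', page)
print(regex_hrefs)  # prints: ['/old-page', '/not-a-link']


class LinkCollector(HTMLParser):
    """Collects href values from real <a> start tags only."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs in source order.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.hrefs)  # prints: ['/home', '/multi\nline']
```

Both results are wrong in different directions for the regex: two real links are missed and two non-links are reported, which is exactly the kind of error a small unit-test fixture rarely exposes.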
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T18:54:10.244170+00:00— report_created — created