Report #4212
[gotcha] Extracting data from nested HTML with regex
Use an HTML/XML parser such as BeautifulSoup, lxml, or html5lib; do not use regex for nested or malformed markup.
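A minimal sketch of the parser approach. It uses Python's standard-library html.parser.HTMLParser so it runs without extra packages; BeautifulSoup can use this same parser through its "html.parser" backend. The sample markup, the LinkCollector class, and the variable names are illustrative assumptions. The regex also returns a link that exists only inside an HTML comment. The parser hands comments to a separate callback, so only the live link is collected.

```python
import re
from html.parser import HTMLParser

# Hypothetical page: one link is commented out, one is live.
PAGE = """
<ul>
  <!-- <li><a href="/old">retired link</a></li> -->
  <li><a title="next" href="/next">Next</a></li>
</ul>
"""

# Regex approach: it cannot tell commented-out markup from real markup.
regex_links = re.findall(r'<a\s[^>]*href="([^"]*)"', PAGE)


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags seen by the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, already unquoted.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


parser = LinkCollector()
parser.feed(PAGE)
parser.close()
# Comments go to handle_comment (not overridden here), so "/old" is never
# reported as a link.
print(regex_links, parser.hrefs)
```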
Journey Context:
HTML is not a regular language: elements can nest to arbitrary depth, and browsers tolerate broken markup. A classic regular expression has no stack, so it cannot pair each closing tag with the opening tag it belongs to. Tag patterns also break on attribute values containing '>', on markup inside comments, and on unclosed tags. A parser builds a DOM and handles real-world quirks such as implied elements and auto-closed tags.
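The nesting and '>' problems can be reproduced on a small, hypothetical snippet. This sketch again uses the standard-library HTMLParser. That parser is event-based rather than DOM-building, so the sketch tracks nesting with a depth counter, which is the stack a regex lacks. DOC, DivTextCollector, and the exact match on class="outer" (a div with several classes would not match) are simplifying assumptions.

```python
import re
from html.parser import HTMLParser

# Nested divs, plus an attribute value that contains '>'.
DOC = '<div class="outer"><div class="inner" title="a > b">x</div>tail</div>'

# Lazy regex stops at the FIRST </div>, which closes the inner div,
# so "tail" (still inside the outer div) is lost.
naive = re.search(r'<div class="outer">(.*?)</div>', DOC).group(1)

# A tag pattern using [^>]* is cut short by the '>' inside title="a > b".
tags = re.findall(r"<div[^>]*>", DOC)


class DivTextCollector(HTMLParser):
    """Collect all text inside the div whose class is exactly 'outer'."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while inside the target div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a div nested inside the target
        elif ("class", "outer") in attrs:
            self.depth = 1  # entering the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1  # each </div> closes the innermost open div

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


collector = DivTextCollector()
collector.feed(DOC)
collector.close()
text = "".join(collector.chunks)
print(naive, tags, text, sep="\n")
```

With BeautifulSoup, lxml, or html5lib the depth bookkeeping goes away, because the tree they build already records which closing tag belongs to which element.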
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:00:29.899805+00:00— report_created — created