Report #99720
[gotcha] Regex fails to parse real-world nested or malformed HTML
Use a dedicated HTML parser: BeautifulSoup, lxml, or html5lib in Python; parse5 or jsdom in JavaScript; Nokogiri in Ruby. Reserve regex for extracting values from known-simple fragments that the parser has already isolated.
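A minimal sketch of the failure and the fix, in Python. To run without installing anything, it swaps in the standard-library html.parser module for the third-party parsers named above. The SAMPLE markup, the TextCollector class, and the extract_text helper are hypothetical names made up for illustration. Note that html.parser is an event-based tokenizer: it does not build a tree or apply HTML5 error recovery, so for real pages one of the parsers listed above is likely the better choice.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: nested <div> elements plus <p> tags with omitted closing tags.
SAMPLE = '<div class="outer"><div class="inner">a</div><p>b<p>c</div>'

# Naive regex: the non-greedy match stops at the FIRST </div>,
# so the captured "outer" content is cut off inside the inner div.
naive = re.search(r'<div class="outer">(.*?)</div>', SAMPLE, re.S).group(1)


class TextCollector(HTMLParser):
    """Collects non-blank text nodes and tracks how deeply <div> tags nest."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def extract_text(html):
    """Return (list of text nodes, maximum <div> nesting depth)."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return collector.text, collector.max_depth


if __name__ == "__main__":
    print("regex captured:", repr(naive))
    print("parser result: ", extract_text(SAMPLE))
```

The regex captures only a truncated piece of the outer element, while the parser sees every text node and the real nesting depth even though the p tags are never closed.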
Journey Context:
HTML is not a regular language; it has arbitrary nesting, optional closing tags, raw text elements like
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T04:56:56.632836+00:00— report_created — created