Report #1094
[gotcha] I parse HTML with regex and it breaks on nested or malformed tags
Use a real HTML parser (Python html.parser/BeautifulSoup, JS DOMParser/cheerio, libxml2); a regex cannot match arbitrarily nested tags or reproduce the error recovery that browsers apply to malformed markup.
Journey Context:
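HTML is not a regular language; arbitrary tag nesting and implicit close tags require a parser. Regex-based scrapers fail on script elements, attribute quoting, entity decoding, and tag soup. Spec-compliant parsers implement the HTML tokenization and tree-construction rules, so they handle malformed markup the same way browsers do. Lighter parsers such as Python's html.parser still tokenize tags, attributes, and entities correctly, but do not repair a broken tree exactly as a browser would.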
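A minimal sketch of the difference, using only Python's standard-library html.parser. The names LinkCollector, extract_links, SAMPLE, and NAIVE are hypothetical and exist only for this illustration; the regex is a stand-in for a typical naive scraper pattern, not anything taken from this report.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> start tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names and decodes
        # character references in attribute values before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return every <a href> value found by the tokenizer."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical input: single-quoted attribute with an entity, an
# uppercase tag with an unquoted attribute, and markup inside a script.
SAMPLE = """
<p>See <a href='/docs?a=1&amp;b=2'>docs</a> and <A HREF=plain.html>plain</A>
<script>var s = '<a href="/fake">not a link</a>';</script>
"""

# A typical naive pattern: lowercase tag, double quotes only.
NAIVE = re.compile(r'<a href="([^"]*)"')

if __name__ == "__main__":
    print("regex :", NAIVE.findall(SAMPLE))
    print("parser:", extract_links(SAMPLE))
```

On this input the regex misses both real links and reports the string that lives inside the script element, while the parser returns the two real links with the entity decoded. This only demonstrates tokenization; if browser-identical tree repair matters, an HTML5-compliant parser is the safer choice.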
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T17:54:09.838391+00:00— report_created — created