Report #29346
[counterintuitive] AI writes flawless regex for HTML/XML parsing where humans fail, but applies it to the wrong problem
Intercept AI attempts to parse HTML/XML with regex and require a proper DOM parser instead. Reserve the AI's regex capability for string extraction, not structural parsing.
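A minimal sketch of that division of labour, assuming Python's standard library. The sample document, the `ZIP_RE` pattern, and the `zips_by_order` helper are hypothetical and only illustrate the idea: the parser walks the structure, and regex touches only the text that the parser hands back.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the report.
DOC = """<orders>
  <order id="1"><note>Ship to ZIP 94107 by Friday</note></order>
  <order id="2"><note>Customer moved, new ZIP 10001</note></order>
</orders>"""

# Regex is used only for flat string extraction (five-digit ZIP codes).
ZIP_RE = re.compile(r"\b\d{5}\b")

def zips_by_order(xml_text):
    """Map each order id to the ZIP codes found in its note text."""
    root = ET.fromstring(xml_text)  # structure: handled by the parser
    result = {}
    for order in root.iter("order"):
        note = order.findtext("note", default="")
        result[order.get("id")] = ZIP_RE.findall(note)  # strings: handled by regex
    return result

print(zips_by_order(DOC))
```

For real-world HTML, which is often not well-formed XML, a tolerant HTML parser would likely be a better fit than `xml.etree.ElementTree`.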
Journey Context:
Given a complex string-matching task, AI will effortlessly generate dense, correct regex that would take a human hours to verify. This creates an illusion of superior capability. However, AI lacks the theoretical intuition that regular expressions cannot recognize non-regular languages, such as arbitrarily nested HTML tags. Humans remember the well-known "Zalgo" Stack Overflow answer that warns against parsing HTML with regex; AI just predicts the next token that looks like a valid extraction.
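The nesting limitation can be illustrated with a small sketch, assuming Python's standard library. The input string, the `OuterDivText` class, and the `outer_div_text` helper are hypothetical. A typical non-greedy pattern stops at the first closing tag it sees, while a parser that tracks depth handles the nested element.

```python
import re
from html.parser import HTMLParser

# Hypothetical input: a <div> nested inside another <div>.
HTML = "<div>outer <div>inner</div> tail</div>"

# A typical non-greedy "match a div" pattern stops at the FIRST closing tag,
# so it cuts the outer element short. (A greedy pattern fails the other way
# as soon as two sibling <div> elements appear.)
regex_result = re.search(r"<div>(.*?)</div>", HTML, re.S).group(1)

class OuterDivText(HTMLParser):
    """Collect all text inside any <div>, tracking nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)

def outer_div_text(html):
    parser = OuterDivText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)

print(repr(regex_result))          # 'outer <div>inner'
print(repr(outer_div_text(HTML)))  # 'outer inner tail'
```

The depth counter is the piece a formal regular expression cannot express: it needs unbounded memory of how many tags are currently open.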
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:38:54.472305+00:00— report_created — created