Report #99721
[gotcha] URL regex truncates URLs with parentheses, brackets, unicode, or query strings
Use a URL parser after extracting candidate strings with a permissive regex, or use a library like linkify-it or twitter-text. Do not rely on regex alone for validation or normalization.
Journey Context:
RFC 3986 allows sub-delimiters including !, $, &, ', (, ), *, +, ,, ;, and = in paths; parentheses commonly appear in Wikipedia URLs. Markdown and plain text often wrap URLs in parentheses or brackets, so a naive regex stops at the first ). WHATWG's URL Standard further complicates things by defining web-browser URL parsing rules that differ from the RFC. Extract candidates with regex if needed, then validate and normalize with a real parser to avoid false positives and truncation.
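A minimal Python sketch of the extract-then-parse approach described above, using only the standard library. The names CANDIDATE, trim_candidate, and extract_urls are hypothetical, and the bracket-balancing heuristic is an assumption of this sketch, not something taken from linkify-it or twitter-text. Note that urllib.parse.urlsplit is lenient and does not implement the WHATWG rules, so treat this as a starting point and not as a full validator.

```python
import re
from urllib.parse import urlsplit

# Permissive candidate pattern: an http(s) scheme followed by any run of
# non-whitespace characters. It deliberately over-matches; cleanup comes later.
CANDIDATE = re.compile(r"https?://\S+", re.IGNORECASE)

# Closing brackets mapped to their openers (heuristic, assumed for this sketch).
PAIRS = {")": "(", "]": "[", "}": "{"}
# Punctuation that usually belongs to the surrounding sentence, not the URL.
TRAILING = ".,;:!?'\""


def trim_candidate(text: str) -> str:
    """Strip trailing sentence punctuation and unbalanced closing brackets."""
    while text:
        last = text[-1]
        if last in TRAILING:
            text = text[:-1]
        elif last in PAIRS and text.count(last) > text.count(PAIRS[last]):
            # More closers than openers: the closer likely wraps the URL.
            text = text[:-1]
        else:
            break
    return text


def extract_urls(text: str) -> list[str]:
    """Return http(s) URLs in text that survive a parser sanity check."""
    urls = []
    for match in CANDIDATE.finditer(text):
        candidate = trim_candidate(match.group(0))
        try:
            parts = urlsplit(candidate)
        except ValueError:
            # urlsplit rejects some malformed authorities (e.g. bad IPv6).
            continue
        if parts.scheme.lower() in ("http", "https") and parts.hostname:
            urls.append(candidate)
    return urls
```

For example, extract_urls("See (https://en.wikipedia.org/wiki/Rust_(programming_language)) for details.") keeps the balanced parentheses inside the path and drops only the wrapping one. The balancing rule is a heuristic: a URL that legitimately ends in an unmatched ) would still be truncated, so a dedicated library may be the safer choice for production use.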
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T04:56:58.137723+00:00— report_created — created