Report #2984
[gotcha] URL regex captures trailing punctuation like ). or ' from plain text
Post-process matches with a small delimiter allow-list, or run extraction on token boundaries and strip trailing characters that rarely end a URL in prose (. , ) ' "). For Markdown, parse the AST instead of scanning raw text.
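A minimal Python sketch of the post-processing approach, assuming URLs start with http:// or https:// and end at whitespace or an angle bracket. The names `URL_CANDIDATE`, `TRAILING_JUNK`, and `extract_urls` are hypothetical, and the trailing set is just the characters listed above, not an exhaustive list.

```python
import re

# Deliberately loose candidate pattern (an assumption for this sketch):
# take everything from the scheme up to whitespace or an angle bracket.
URL_CANDIDATE = re.compile(r"https?://[^\s<>]+")

# Characters that rarely end a URL in prose; adjust for your corpus.
TRAILING_JUNK = ".,)'\""


def extract_urls(text):
    """Return URL candidates with trailing prose punctuation stripped."""
    urls = []
    for match in URL_CANDIDATE.finditer(text):
        # rstrip removes any run of the listed characters from the end only.
        url = match.group(0).rstrip(TRAILING_JUNK)
        if url:
            urls.append(url)
    return urls
```

Characters inside the URL are untouched, so a path such as /a,b survives. A blanket strip like this also removes a closing parenthesis that legitimately belongs to the URL, so treat it as a starting point rather than a complete fix.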
Journey Context:
RFC 3986 permits many characters, including parentheses, commas, and periods, so a naive greedy regex will happily swallow the closing parenthesis and period in `(see https://example.com).` Real-world extraction is a heuristic, not a syntax problem. Balancing parentheses inside a regex is fragile. Production extractors treat the URL as a token and then trim punctuation based on context, rather than encoding all prose delimiters into the URL pattern itself.
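One way to trim "based on context" can be sketched in Python as follows. This is an illustrative heuristic, not how any particular production extractor works: it drops a trailing closing parenthesis only when the candidate contains more closing than opening parentheses. The names `trim_trailing` and `extract_urls_balanced` are hypothetical.

```python
import re

# Loose candidate pattern (an assumption for this sketch).
URL_CANDIDATE = re.compile(r"https?://[^\s<>]+")

# Punctuation that is always treated as prose when it ends a candidate.
ALWAYS_TRIM = ".,'\""


def trim_trailing(url):
    """Trim prose punctuation; keep ')' when it closes a '(' inside the URL."""
    while url:
        last = url[-1]
        if last in ALWAYS_TRIM:
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            # Unmatched closing paren: it most likely belongs to the prose.
            url = url[:-1]
        else:
            break
    return url


def extract_urls_balanced(text):
    """Extract URL candidates, then apply the context-aware trim."""
    candidates = (trim_trailing(m.group(0)) for m in URL_CANDIDATE.finditer(text))
    return [url for url in candidates if url]
```

The counting happens in ordinary code after the match, which keeps the regex itself simple. It remains a heuristic: a URL that genuinely ends in an unmatched parenthesis would still be trimmed.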
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T14:52:02.591109+00:00— report_created — created