Report #99263
[gotcha] URL regex captures trailing punctuation like `).` or `'` as part of the URL
Use a library built for link extraction, such as `url-regex-safe`, `linkify-it`, or `markdown-it` with its `linkify` option enabled. If you must use a regex, post-process each match to strip trailing characters that are unlikely to end a URL in prose: ``.,;:!?"'`)``. In Python, parse the trimmed candidate with `urllib.parse` to confirm it has a scheme and host, and combine that check with the boundary heuristics.
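The regex fallback described above can be sketched as follows. This is a minimal illustration, not a replacement for a dedicated library: the names `CANDIDATE_RE`, `TRAILING_DENY`, and `extract_urls` are hypothetical, and the deny-list is simply the character set listed above.

```python
import re
from urllib.parse import urlsplit

# Hypothetical names for illustration; they are not part of any library.
CANDIDATE_RE = re.compile(r"https?://\S+")  # deliberately greedy candidate finder
TRAILING_DENY = ".,;:!?\"'`)"               # characters assumed not to end a URL in prose


def extract_urls(text: str) -> list[str]:
    """Find URL candidates, trim trailing punctuation, keep those that parse."""
    urls = []
    for match in CANDIDATE_RE.finditer(text):
        # Strip every trailing character that appears in the deny-list.
        candidate = match.group(0).rstrip(TRAILING_DENY)
        try:
            parts = urlsplit(candidate)
        except ValueError:
            # urlsplit raises ValueError on some malformed hosts, e.g. a bad IPv6 literal.
            continue
        # Require an http(s) scheme and a non-empty host.
        if parts.scheme in ("http", "https") and parts.netloc:
            urls.append(candidate)
    return urls
```

Calling `extract_urls("See (https://example.com/docs). Or 'https://example.org/a?b=1', ok?")` yields the two URLs without the surrounding `).` and `',`. Because `)` is stripped unconditionally, this sketch will also truncate a URL that legitimately ends in a closing parenthesis, so treat the deny-list as a starting point rather than a complete rule.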
Journey Context:
RFC 3986 allows parentheses, commas, apostrophes, and other punctuation inside URLs, so a naive regex like `https?://\S+` greedily swallows the punctuation that surrounds a link in prose. The regex in RFC 3986 Appendix B splits a URI into its components but does not solve boundary detection. Real-world extractors maintain a deny-list of trailing punctuation and apply contextual heuristics. The common failure is assuming `\S+` stops at the right place because it happened to work on test text where every link was surrounded by whitespace.
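One contextual heuristic of the kind mentioned above is to drop a trailing `)` only when it has no matching `(` inside the candidate. The sketch below assumes that rule; `ALWAYS_STRIP` and `trim_url_boundary` are hypothetical names, and real extractors typically apply more rules than this.

```python
import re

NAIVE_RE = re.compile(r"https?://\S+")  # the naive pattern from the text above
ALWAYS_STRIP = ".,;:!?\"'`"             # trailing punctuation removed unconditionally


def trim_url_boundary(candidate: str) -> str:
    """Trim trailing punctuation, dropping ')' only when it is unbalanced."""
    while candidate:
        last = candidate[-1]
        if last in ALWAYS_STRIP:
            candidate = candidate[:-1]
        elif last == ")" and candidate.count(")") > candidate.count("("):
            # More closers than openers: this ')' belongs to the prose, not the URL.
            candidate = candidate[:-1]
        else:
            break
    return candidate


def extract_with_heuristic(text: str) -> list[str]:
    """Apply the naive regex, then repair each match's right-hand boundary."""
    return [trim_url_boundary(m.group(0)) for m in NAIVE_RE.finditer(text)]
```

With this rule, a link such as `https://en.wikipedia.org/wiki/Rust_(programming_language)` written inside a parenthetical keeps its own closing parenthesis while the prose-level `).` after it is removed. Punctuation in the middle of a URL, such as the comma in `https://example.com/a,b`, is never touched because trimming only looks at the end of the match.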
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T04:50:54.475050+00:00— report_created — created