Can a URL contain a semicolon and still be valid?

故里飘歌 2020-11-30 04:47

I am using a regular expression to convert plain-text URLs into clickable links.

@(https?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.-]*(\\?\\S+)?)?)?)@

Ho

7 answers
  •  北荒 (original poster)
    2020-11-30 05:04

    Yes, semicolons are valid in URLs. However, if you're plucking URLs from relatively unstructured prose, it's probably safe to assume that a semicolon at the end of a URL is meant as sentence punctuation. The same goes for other sentence-punctuation characters such as periods, question marks, and quotes.

    If you're only interested in URLs with an explicit http[s] protocol, and your regex flavor supports lookbehinds, this regex should suffice:

    https?://[\w!#$%&'()*+,./:;=?@\[\]-]+(?

    After the protocol, it simply matches one or more characters that may be valid in a URL, without worrying about structure at all. But then it backs off as many positions as necessary until the final character is not something that might be sentence punctuation.
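    As a rough illustration of the back-off behaviour described above, here is a minimal Python sketch. The pattern shown earlier is cut off after `(?`, so the negative lookbehind used here, and in particular its set of trailing punctuation characters, is an assumption made for illustration and not necessarily the original. The names `URL_RE`, `find_urls`, and `linkify` are hypothetical helpers.

```python
import re
from html import escape

# Greedy run of characters that may appear in a URL, followed by a negative
# lookbehind that forces the match to end on a non-punctuation character.
# ASSUMPTION: the lookbehind character set below is illustrative only; the
# pattern in the answer is truncated before this part.
URL_RE = re.compile(
    r"""https?://[\w!#$%&'()*+,./:;=?@\[\]-]+(?<![!,.?;:"'()-])"""
)


def find_urls(text):
    """Return every URL-like match found in text."""
    return URL_RE.findall(text)


def linkify(text):
    """Wrap each matched URL in an <a> tag, HTML-escaping everything else."""
    parts = []
    last = 0
    for match in URL_RE.finditer(text):
        # Escape the prose between the previous match and this one.
        parts.append(escape(text[last:match.start()]))
        url = escape(match.group(0), quote=True)
        parts.append(f'<a href="{url}">{url}</a>')
        last = match.end()
    # Escape whatever follows the final match.
    parts.append(escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    sample = "See http://example.com/a;b=1; then https://example.org/path."
    print(find_urls(sample))
    print(linkify(sample))
```

    In this sketch a semicolon inside the URL (`a;b=1`) is kept, while a semicolon or period at the very end is left outside the link, because the regex engine backtracks until the lookbehind is satisfied.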
