Regular expression for recognizing URLs

你的背包 2020-12-20 06:56

I want to create a regex for URLs in order to extract all links from an input string. The regex should recognize the following URL formats:

  • http(s)://w
3 answers
  • 2020-12-20 07:07

    I've just written up a blog post on recognising URLs in the most commonly used formats, such as:

    www.google.com http://www.google.com mailto:somebody@google.com somebody@google.com www.url-with-querystring.com/?url=has-querystring

    The regular expression used is /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/. However, I would recommend you go to http://blog.mattheworiordan.com/post/13174566389/url-regular-expression-for-links-with-or-without-the to see a complete working example, along with an explanation of the regular expression in case you need to extend or tweak it.

  • 2020-12-20 07:12

    The regex you give doesn't work for www. addresses because it expects a URI scheme (the part at the start of the URL, like http://). The 'www.' part of your regular expression doesn't work because it would only match www.:// (which is meaningless).

    Try something like this instead:

    (((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+)|(www\.)[\w\d:#@%/;$()~_?\+-=\\\.&]*)
    

    This will match something with a valid URI scheme, or something beginning with 'www.'

  • 2020-12-20 07:18

    I don't know why your match result is only http://, but I cleaned up your regex a bit:

    ((?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)(?:www\.)?|www\.)[\w\d:#@%/;$()~_?\+,\-=\\.&]+)
    

    (?:) are non-capturing groups. That means there is only one capturing group left, and it contains the complete matched string.
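
To illustrate the point about the single remaining capturing group, here is a minimal sketch. It assumes Python's `re` module, since the thread does not name a language, and the names `URL_PATTERN`, `find_links` and `sample` are made up for this example. The pattern itself is the cleaned-up regex from this answer, copied unchanged.

```python
import re

# The cleaned-up pattern from this answer: one capturing group around
# everything, with (?:...) used for all inner grouping.
URL_PATTERN = re.compile(
    r"((?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)(?:www\.)?|www\.)"
    r"[\w\d:#@%/;$()~_?\+,\-=\\.&]+)"
)


def find_links(text):
    """Return every substring of `text` that the pattern matches (hypothetical helper)."""
    # With exactly one capturing group, findall returns a flat list of strings
    # rather than a list of tuples.
    return URL_PATTERN.findall(text)


sample = "Try www.google.com or https://www.example.com/search?q=regex today"
print(URL_PATTERN.groups)  # number of capturing groups in the pattern
print(find_links(sample))
```

Because the character class contains `.` and `,`, punctuation that directly follows a link in running text would become part of the match. Stripping it afterwards is one possible workaround.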

    (?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)(?:www\.)?|www\.) The link now has to start either with a scheme from the first list, followed by an optional www., or with www. alone.

    [\w\d:#@%/;$()~_?\+,\-=\\.&] I added a comma to the list (otherwise your long example does not match), escaped the - (you were creating a character range), and unescaped the . (escaping is not needed in a character class).
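
The character-range remark can be checked with a small sketch, again assuming Python's `re` module. The names `RANGE_CLASS` and `LITERAL_CLASS` are hypothetical, and the classes are cut down to the three characters involved so the effect is easy to see.

```python
import re

# Unescaped hyphen: "+-=" is read as the range from "+" (0x2B) to "=" (0x3D),
# which also covers characters such as the digits 0-9, ":" and ";".
RANGE_CLASS = re.compile(r"[+-=]")

# Escaped hyphen: only the three literal characters "+", "-" and "=".
LITERAL_CLASS = re.compile(r"[+\-=]")

for ch in ["+", "-", "=", "5", ":"]:
    in_range = RANGE_CLASS.fullmatch(ch) is not None
    in_literal = LITERAL_CLASS.fullmatch(ch) is not None
    print(repr(ch), in_range, in_literal)

# Inside a character class an unescaped dot is a literal dot, not "any character".
print(re.fullmatch(r"[.]", "a") is None)
```

The unescaped form quietly accepts `5` and `:`, while the escaped form accepts only the three intended characters.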

    You can try this on Regexr, a useful tool for testing regexes.

    But URL matching is not a simple task; please see this question here
