Regular expression for recognizing URLs


I want to create a regex for URLs in order to get all links from an input string. The regex should recognize the following URL formats:

http(s)://w


3 answers

再見小時候

2020-12-20 07:07

I've just written a blog post on recognising URLs in the most commonly used formats, such as:

- `www.google.com`
- `http://www.google.com`
- `mailto:somebody@google.com`
- `somebody@google.com`
- `www.url-with-querystring.com/?url=has-querystring`

The regular expression used is `/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/`; however, I would recommend you go to http://blog.mattheworiordan.com/post/13174566389/url-regular-expression-for-links-with-or-without-the to see a complete working example, along with an explanation of the regular expression in case you need to extend or tweak it.
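As a rough illustration, here is how the pattern above might be used from Python to pull the example formats out of a string. This is a sketch under assumptions: the slashes of the JavaScript-style literal are dropped, and `\w-_` in the path character class is reordered to `\w_-`, because Python's `re` module rejects `\w-_` as a bad character range. The helper name `find_links` is hypothetical.

```python
import re

# Transcription of the regex above for Python's re module (an assumption).
# The only change: "\w-_" in the path class becomes "\w_-", because Python
# raises "bad character range" for "\w-_".
URL_PATTERN = re.compile(
    r"((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+"
    r"|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)"
    r"((?:\/[\+~%\/.\w_-]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)"
)


def find_links(text):
    """Hypothetical helper: return the full text of every match."""
    return [m.group(0) for m in URL_PATTERN.finditer(text)]


sample = (
    "www.google.com http://www.google.com mailto:somebody@google.com "
    "somebody@google.com www.url-with-querystring.com/?url=has-querystring"
)

for link in find_links(sample):
    print(link)
```

Because `.` and `-` are allowed in the host part, trailing punctuation such as a sentence-ending period will be included in a match, so you may want to strip it afterwards.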

闹比i

2020-12-20 07:12
The regex you gave doesn't work for www. addresses because it expects a URI scheme (the part at the start of a URL, such as http://). The 'www.' part in your regular expression doesn't work because it would only match www.:// (which is meaningless).

Try something like this instead:
```
(((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+)|(www\.)[\w\d:#@%/;$()~_?\+-=\\\.&]*)
```
This will match something with a valid URI scheme or something beginning with `www.`.
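A quick Python check of how this pattern behaves (Python is an assumption here, since the answer does not name a language, and the sample text is made up). Because `|` has the lowest precedence, the trailing character class only follows the `(www\.)` branch, so for a link with a scheme the match stops right after `http://`. That may be why the next answer mentions getting only `http://` as the match.

```python
import re

# Pattern copied verbatim from the answer above; Python raw strings keep
# every backslash exactly as written.
ANSWER2_PATTERN = re.compile(
    r"(((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+)"
    r"|(www\.)[\w\d:#@%/;$()~_?\+-=\\\.&]*)"
)

# Made-up sample text with one scheme-prefixed link and one www. link.
text = "see http://example.com/page and www.google.com"

# group(0) is the whole match; the scheme branch has no character class
# after it, so that match ends right after "http://".
matches = [m.group(0) for m in ANSWER2_PATTERN.finditer(text)]
print(matches)
```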
半阙折子戏

2020-12-20 07:18
I don't know why your match result is only http://, but I cleaned up your regex a bit:
```
((?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)(?:www\.)?|www\.)[\w\d:#@%/;$()~_?\+,\-=\\.&]+)
```
`(?:)` groups are non-capturing, which means there is only one capturing group left, and it contains the complete matched string.
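To illustrate the point about non-capturing groups, here is a small Python sketch (the sample text is invented): with the cleaned-up pattern there is exactly one capturing group, so `re.findall` returns plain strings containing whole URLs, whereas a pattern with two capturing groups makes it return tuples.

```python
import re

# Cleaned-up pattern from the answer above: all inner groups are (?:...),
# so the outer parentheses form the only capturing group.
CLEANED_PATTERN = re.compile(
    r"((?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)(?:www\.)?|www\.)"
    r"[\w\d:#@%/;$()~_?\+,\-=\\.&]+)"
)

# Made-up sample text.
text = "see http://example.com/page and www.google.com or ftp://files.example.org"

print(CLEANED_PATTERN.groups)  # number of capturing groups in the pattern

# With exactly one capturing group, findall returns that group's text,
# which here is the whole URL.
urls = CLEANED_PATTERN.findall(text)
print(urls)

# For contrast: with two capturing groups, findall returns tuples instead.
pairs = re.findall(r"(https?)://([\w./]+)", text)
print(pairs)
```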

`(?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)(?:www\.)?|www\.)`: the link now has to start either with something from the first list, followed by an optional `www.`, or with `www.`.
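The prefix rule can be checked on its own. This Python sketch compiles only the prefix part of the pattern and tests a few made-up strings with `re.fullmatch`; `mailto:` is rejected, since `mailto` is not one of the listed schemes.

```python
import re

# Only the prefix part of the cleaned-up pattern, compiled on its own.
PREFIX = re.compile(
    r"(?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)(?:www\.)?|www\.)"
)

# Made-up candidates; "file:\\\\" in Python source is "file:" plus two backslashes.
candidates = ["http://", "https://www.", "file:\\\\", "www.", "mailto:", "example."]

# fullmatch requires the whole candidate to be a valid prefix.
results = {c: PREFIX.fullmatch(c) is not None for c in candidates}
for c, ok in results.items():
    print(repr(c), ok)
```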

`[\w\d:#@%/;$()~_?\+,\-=\\.&]`: I added a comma to the list (otherwise your long example does not match), escaped the `-` (you were creating a character range), and unescaped the `.` (escaping it is not needed in a character class).
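The range problem is easy to demonstrate. In this Python sketch, `[\+-=]` is a range from `+` (U+002B) to `=` (U+003D), so it also matches `,`, `-`, `.`, `/`, the digits, `:`, `;` and `<`, while `[\+,\-=]` matches only its four listed characters. Because the range already covered the comma, this is presumably why the comma had to be listed explicitly once the `-` was escaped. The last check shows that `.` inside a class is a literal dot.

```python
import re

# "\+-=" inside a class is a RANGE from "+" (U+002B) to "=" (U+003D),
# so it also covers "," "-" "." "/" "0"-"9" ":" ";" and "<".
range_class = re.compile(r"[\+-=]")
# "\+,\-=" lists exactly four literal characters: "+", ",", "-" and "=".
literal_class = re.compile(r"[\+,\-=]")

samples = ["+", ",", "-", "=", "5", "<", "a"]
table = {
    ch: (range_class.fullmatch(ch) is not None, literal_class.fullmatch(ch) is not None)
    for ch in samples
}
for ch, (in_range, in_literal) in table.items():
    print(repr(ch), in_range, in_literal)

# Inside a class "." is already a literal dot, so no escape is needed.
dot_class = re.compile(r"[.]")
dot_results = (dot_class.fullmatch(".") is not None, dot_class.fullmatch("x") is not None)
print(dot_results)
```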

See it here on Regexr, a useful tool for testing regexes.

But URL matching is not a simple task; please see this question here.