PHP / RegEx - Convert URLs to links by detecting .com/.net/.org/.edu etc

折月煮酒 提交于 2020-01-19 14:59:16

问题


I know there have been many questions asking for help converting URLs to clickable links in strings, but I haven't found quite what I'm looking for.

I want to be able to match any of the following examples and turn them into clickable links:

http://www.domain.com
https://www.domain.net
http://subdomain.domain.org
www.domain.com/folder
subdomain.domain.net
subdomain.domain.edu/folder/subfolder
domain.net
domain.com/folder

I do not want to match random.stuff.separated.with.periods.

EDIT: Please keep in mind that these URLs need to be found within larger strings of 'normal' text. For example, I want to match 'domain.net' in "Hello! Come check out domain.net!".

I think this could be accomplished with a regex that can determine whether the matching url contains .com, .net, .org, or .edu followed by either a forward slash or whitespace. Other than a user typo, I can't imagine any other case in which a valid URL would have one of those followed by anything else.

I realize there are many valid domain extensions out there, but I don't need to support them all. I can just choose which to support with something like (com|net|org|edu) in the regex. Unfortunately, I'm not skilled enough with regex yet to know how to properly implement this.

I'm hoping someone can help me find a regular expression (for use with PHP's preg_replace) that can match URLs based on just about any text connected by one or more dots and either ending with one of the specified extensions followed by whitespace OR containing one of the specified extensions followed by a slash and possibly folders.

I did several searches and so far have not found what I'm looking for. If there already exists a SO post that answers this, I apologize.

Thanks in advance.

--- EDIT 3 ---

After days of trial and error and some help from SO, here's what works:

preg_replace_callback('#(\s|^)((https?://)?(\w|-)+(\.(\w+|-)*)+(?<=\.net|org|edu|com|cc|br|jp|dk|gs|de)(\:[0-9]+)?(?:/[^\s]*)?)(?=\s|\b)#is',
                create_function('$m', 'if (!preg_match("#^(https?://)#", $m[2]))
                return $m[1]."<a href=\"http://".$m[2]."\">".$m[2]."</a>"; else return $m[1]."<a href=\"".$m[2]."\">".$m[2]."</a>";'),
                $event_desc);

This is a modified version of anubhava's code below and so far seems to do exactly what I want. Thanks!


回答1:


You can use this regex:

#(\s|^)((?:https?://)?\w+(?:\.\w+)+(?<=\.(net|org|edu|com))(?:/[^\s]*|))(?=\s|\b)#is

Code:

$arr = array(
'http://www.domain.com/?foo=bar',
'http://www.that"sallfolks.com',
'This is really cool site: https://www.domain.net/ isn\'t it?',
'http://subdomain.domain.org',
'www.domain.com/folder',
'Hello! You can visit vertigofx.com/mysite/rocks for some awesome pictures, or just go to vertigofx.com by itself',
'subdomain.domain.net',
'subdomain.domain.edu/folder/subfolder',
'Hello! Check out my site at domain.net!',
'welcome.to.computers',
'Hello.Come visit oursite.com!',
'foo.bar',
'domain.com/folder',

);
foreach($arr as $url) {   
   $link = preg_replace_callback('#(\s|^)((?:https?://)?\w+(?:\.\w+)+(?<=\.(net|org|edu|com))(?:/[^\s]*|))(?=\s|\b)#is',
           create_function('$m', 'if (!preg_match("#^(https?://)#", $m[2]))
               return $m[1]."<a href=\"http://".$m[2]."\">".$m[2]."</a>"; else return $m[1]."<a href=\"".$m[2]."\">".$m[2]."</a>";'),
           $url);
   echo $link . "\n";

OUTPUT:

<a href="http://www.domain.com/?foo=bar">http://www.domain.com/?foo=bar</a>
http://www.that"sallfolks.com
This is really cool site: <a href="https://www.domain.net">https://www.domain.net</a>/ isn't it?
<a href="http://subdomain.domain.org">http://subdomain.domain.org</a>
<a href="http://www.domain.com/folder">www.domain.com/folder</a>
Hello! You can visit <a href="http://vertigofx.com/mysite/rocks">vertigofx.com/mysite/rocks</a> for some awesome pictures, or just go to <a href="http://vertigofx.com">vertigofx.com</a> by itself
<a href="http://subdomain.domain.net">subdomain.domain.net</a>
<a href="http://subdomain.domain.edu/folder/subfolder">subdomain.domain.edu/folder/subfolder</a>
Hello! Check out my site at <a href="http://domain.net">domain.net</a>!
welcome.to.computers
Hello.Come visit <a href="http://oursite.com">oursite.com</a>!
foo.bar
<a href="http://domain.com/folder">domain.com/folder</a>

PS: This regex only supports http and https scheme in URL. So eg: if you want to support ftp also then you need to modify the regex a little.




回答2:


'/(http(s)?:\/\/)?[\w\/\.]+(\.((com)|(edu)|(net)|(org)))[\w\/]*/'

That works for your examples. You might want to add extra characters support for "-", "&", "?", ":", etc in the last bracket.

'/(http(s)?:\/\/)?[\w\/\.]+(\.((com)|(edu)|(net)|(org)))[\w\/\?=&-;]*/'

This will support parameters and port numbers.

eg.: www.foo.ca:8888/test?param1=val1&param2=val2




回答3:


Thanks a ton. I modified his final solution to allow all domains (.ca, .co.uk), not just the specified ones.

$html = preg_replace_callback('#(\s|^)((https?://)?(\w|-)+(\.[a-z]{2,3})+(\:[0-9]+)?(?:/[^\s]*)?)(?=\s|\b)#is',
    create_function('$m', 'if (!preg_match("#^(https?://)#", $m[2])) return $m[1]."<a href=\"http://".$m[2]."\" target=\"blank\">".$m[2]."</a>"; else return $m[1]."<a href=\"".$m[2]."\" target=\"blank\">".$m[2]."</a>";'),
    $url);


来源:https://stackoverflow.com/questions/10110505/php-regex-convert-urls-to-links-by-detecting-com-net-org-edu-etc

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!