RegEx expression to find a href links and add NoFollow to them

说谎 2020-12-11 22:41

I am trying to write a RegEx rule to find all a href HTML links on my webpage and add rel="nofollow" to them.

However, I have a list of URLs that must be exc

3 answers
  •  轮回少年
    2020-12-11 23:17

    An improvement to James' regex:

    (<a\s*(?!.*\brel=)[^>]*)(href="https?://)((?!(?:(?:www\.)?'.implode('|(?:www\.)?', $follow_list).'))[^"]+)"((?!.*\brel=)[^>]*)(?:[^>]*)>
    

    This regex matches only links whose host is NOT in the string array $follow_list. The strings don't need a leading 'www'. :) The advantage is that it preserves the other attributes in the tag (such as target, style, title...). If a rel attribute already exists in the tag, the regex will NOT match, so you can still force a follow on a URL that is not in $follow_list by giving that link its own rel attribute.

    Replace the match with:

    $1$2$3"$4 rel="nofollow">
    

    Full example (PHP):

    function dont_follow_links( $html ) {
     // follow these websites only!
     $follow_list = array(
      'google.com',
      'mypage.com',
      'otherpage.com',
     );
     return preg_replace(
      '%(<a\s*(?!.*\brel=)[^>]*)(href="https?://)((?!(?:(?:www\.)?'.implode('|(?:www\.)?', $follow_list).'))[^"]+)"((?!.*\brel=)[^>]*)(?:[^>]*)>%',
      '$1$2$3"$4 rel="nofollow">',
      $html);
    }
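
    To see the behaviour described above on concrete input, here is a minimal self-contained sketch. It is a variant, not the exact function above: the name add_nofollow, the sample hosts and the sample HTML are made up for illustration, the host names are passed through preg_quote so that the dot in 'google.com' only matches a literal dot, and the rel check is limited to the current tag with [^>]* instead of .* (both changes are my assumptions about what is wanted).

```php
<?php
// Hypothetical helper: adds rel="nofollow" to absolute links whose host is
// not in $follow_list and that do not already carry a rel attribute.
function add_nofollow(string $html, array $follow_list): string
{
    // Escape regex metacharacters (and the % delimiter) in every host name.
    $hosts = implode('|', array_map(function ($host) {
        return preg_quote($host, '%');
    }, $follow_list));

    $pattern = '%(<a\s(?![^>]*\brel=)[^>]*)'                  // $1: tag start, only if the tag has no rel=
             . '(href="https?://)'                            // $2: start of an absolute href
             . '((?!(?:www\.)?(?:' . $hosts . '))[^"]+)"'     // $3: URL whose host is not whitelisted
             . '([^>]*)>%i';                                  // $4: remaining attributes

    return preg_replace($pattern, '$1$2$3"$4 rel="nofollow">', $html);
}

$html = implode("\n", [
    '<a href="https://www.google.com/">whitelisted</a>',
    '<a href="http://example.org/page" target="_blank">external</a>',
    '<a rel="author" href="http://example.org/me">already has rel</a>',
]);

// Only the second link should gain rel="nofollow", after target="_blank".
echo add_nofollow($html, ['google.com', 'mypage.com']), "\n";
```

    Note that an empty $follow_list makes the lookahead match everywhere in this sketch, so no link would be changed; handle that case separately if it can occur.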
    

    If you want to overwrite rel no matter what, I would use preg_replace_callback and, inside the callback, strip any existing rel attribute before appending the new one:

    $subject = preg_replace_callback('%(<a\s[^>]*href="https?://(?:(?!(?:(?:www\.)?'.implode('|(?:www\.)?', $follow_list).'))[^"]+)"[^>]*)>%', function($m) {
        return preg_replace('%\srel\s*=\s*(["\'])(?:(?!\1).)*\1(\s|$)%', ' ', $m[1]).' rel="nofollow">';
    }, $subject);
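
    The callback approach can be wrapped into a function and tried on sample input. This is a sketch under assumptions: the name force_nofollow and the sample data are hypothetical, hosts are escaped with preg_quote, and the inner pattern simply deletes the old rel attribute instead of replacing it with a space.

```php
<?php
// Hypothetical helper: forces rel="nofollow" on every absolute link whose
// host is not in $follow_list, replacing any rel attribute already present.
function force_nofollow(string $html, array $follow_list): string
{
    $hosts = implode('|', array_map(function ($host) {
        return preg_quote($host, '%');
    }, $follow_list));

    // $m[1] is the whole opening tag without its closing ">".
    $pattern = '%(<a\s[^>]*href="https?://(?!(?:www\.)?(?:' . $hosts . '))[^"]+"[^>]*)>%i';

    return preg_replace_callback($pattern, function (array $m): string {
        // Remove an existing rel="..." or rel='...' from the captured tag.
        $tag = preg_replace('%\srel\s*=\s*(["\'])(?:(?!\1).)*\1%i', '', $m[1]);
        return $tag . ' rel="nofollow">';
    }, $html);
}

$html = '<a rel="author" href="http://example.org/">external</a> '
      . '<a href="https://mypage.com/" rel="me">whitelisted</a>';

// The first link should end up with rel="nofollow" only; the second is untouched.
echo force_nofollow($html, ['google.com', 'mypage.com']), "\n";
```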
    
