I am trying to write a RegEx rule to find all a href HTML links on my webpage and add rel="nofollow" to them.
However, I have a list of URLs that must be exc
(
would match the first part of any link that starts with http:// or https:// and doesn't contain pokerdiy.com or www.example.com/link.aspx anywhere in the href attribute. Replace that by
\1\2" rel="nofollow"
If a rel="nofollow" is already present, you'll end up with two of these. And of course, relative links or other protocols like ftp:// etc. won't be matched at all.
Explanation:
(?!\b(foo|bar)\b)[^"] matches any non-" character unless it is possible to match foo or bar at the current location. The \bs are there to make sure we don't accidentally trigger on rebar or foonly.
This whole construct is repeated ((?: ... )+), and whatever is matched is preserved in backreference \2.
Since the next token to be matched is a ", the entire regex fails if the attribute contains foo or bar anywhere.
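To make the explanation concrete, here is a minimal Python sketch of the same idea. It is not the original pattern (which is cut off above); the names ATTR_RE, LINK_RE, EXCLUDED and add_nofollow are made up for illustration, and the exact shape of the tag-matching part is an assumption.

```python
import re

# Toy version of the construct explained above: each character is consumed
# only if foo or bar (as whole words) does not start at that position.
ATTR_RE = re.compile(r'"((?:(?!\b(?:foo|bar)\b)[^"])+)"')

# Hypothetical pattern for the href case (an assumption, not the original regex).
# Group 1: the start of the tag up to and including href="
# Group 2: an http(s) URL that never reaches one of the excluded strings
EXCLUDED = r'pokerdiy\.com|www\.example\.com/link\.aspx'
LINK_RE = re.compile(
    r'(<a\s[^>]*?href=")'
    r'(https?://(?:(?!' + EXCLUDED + r')[^"])+)'
    r'"',
    re.IGNORECASE,
)


def add_nofollow(html):
    """Append rel="nofollow" after the href of every non-excluded absolute link."""
    return LINK_RE.sub(r'\1\2" rel="nofollow"', html)


if __name__ == "__main__":
    print(ATTR_RE.search('"rebar and foonly"'))  # a match object
    print(ATTR_RE.search('"a foo b"'))           # None
    print(add_nofollow('<a href="http://other.com/page">x</a>'))
    print(add_nofollow('<a href="http://www.pokerdiy.com/x">y</a>'))
```

As noted above, a link that already carries rel="nofollow" gets a second one, and relative links are left untouched. For anything beyond a one-off cleanup, an HTML parser is usually the safer tool.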