I got a collection of string and all i want for regex is to collect all started with http..
href=\"http://www.test.com/cat/1-one_piece_episodes/\"href
@ridgerunner and me would change the regex to:
href="((?:(?!\bexclude\b)[^"])*)[^#]"
It matches all href attributes as long as they don't end in # and don't contain the word exclude.
Explanation:
href=" # Match href="
( # Capture...
(?: # the following group:
(?! # Look ahead to check that the next part of the string isn't...
\b # the entire word
exclude # exclude
\b # (\b are word boundary anchors)
) # End of lookahead
[^"] # If successful, match any character except for a quote
)* # Repeat as often as possible
) # End of capturing group 1
[^#]" # Match a non-# character and the closing quote.
To allow multiple "forbidden words":
href="((?:(?!\b(?:exclude|this|too)\b)[^"])*)[^#]"