Question
I'm trying to craft a regex that returns only the hrefs of <link> tags.
Why does this regex return all hrefs, including <a> hrefs?
(?<=<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+
<link rel="stylesheet" rev="stylesheet" href="idlecore-tidied.css?T_2_5_0_228" media="screen"> <a href="anotherurl">Slash Boxes</a>
Thank you.
Answer 1:
Either
/(?<=<link\b[^<>]*?)\bhref\s*=\s*(?:"[^"]*"|'[^']*'|\S+)/
or
/<link\b[^<>]*?\b(href\s*=\s*(?:"[^"]*"|'[^']*'|\S+))/
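The main difference is [^<>]*? instead of .*?. This is because you don't want the search to continue into other tags: with .*?, any <link that appears earlier in the input satisfies the pattern, so the href of a later <a> tag matches as well.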
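The difference between .*? and [^<>]*? can be checked with a minimal Python sketch. Python is only an assumption here (the question does not name a language), and since Python's re module does not accept variable-length lookbehind, the sketch uses a lightly adapted version of the capturing form. The sample without an href on the <link> tag is made up for illustration.

```python
import re

# Capturing form with ".*?": free to run past the end of the <link> tag.
loose = re.compile(r'''<link\b.*?\b(href\s*=\s*(?:"[^"]*"|'[^']*'|\S+))''')
# Capturing form with "[^<>]*?": cannot leave the <link> tag.
strict = re.compile(r'''<link\b[^<>]*?\b(href\s*=\s*(?:"[^"]*"|'[^']*'|\S+))''')

# Hypothetical input: the <link> tag has no href, the <a> tag does.
html_without_href = ('<link rel="stylesheet" media="screen"> '
                     '<a href="anotherurl">Slash Boxes</a>')
# Input modeled on the question: the <link> tag carries an href.
html_with_href = ('<link rel="stylesheet" href="idlecore-tidied.css?T_2_5_0_228" '
                  'media="screen"> <a href="anotherurl">Slash Boxes</a>')

loose_hits = loose.findall(html_without_href)    # picks up the <a> href
strict_hits = strict.findall(html_without_href)  # finds nothing
link_hits = strict.findall(html_with_href)       # only the <link> href

print(loose_hits, strict_hits, link_hits)
```

With the loose pattern the <a> href leaks into the result; the strict pattern returns an href only when it sits inside the <link> tag itself.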
Answer 2:
Avoid lookbehind for such a simple case: just match what you need, and capture what you want to get.
I got good results with <link\s+[^>]*(href\s*=\s*(['"]).*?\2) in The Regex Coach with the s and g options.
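As a rough Python equivalent of that setup (an assumption, since the answer used The Regex Coach): re.DOTALL stands in for the s option and finditer for the g option. The sample input is the one from the question.

```python
import re

# Same pattern as in the answer; group 1 is the whole href attribute,
# group 2 is the opening quote that \2 requires again at the end.
pattern = re.compile(r'''<link\s+[^>]*(href\s*=\s*(['"]).*?\2)''', re.DOTALL)

html = ('<link rel="stylesheet" rev="stylesheet" '
        'href="idlecore-tidied.css?T_2_5_0_228" media="screen"> '
        '<a href="anotherurl">Slash Boxes</a>')

# finditer walks over every match, like a global search.
hrefs = [m.group(1) for m in pattern.finditer(html)]
print(hrefs)
```

Because [^>]* is greedy, a tag that contained several href attributes would yield the last one, which is unlikely to matter for well-formed markup.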
Answer 3:
/(?<=<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+[^>]*>/
I'm a little shaky on lookbehind myself, so I left that in there. This regex, though:
/(<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+[^>]*>/
...works in my JavaScript test. Note that it drops the lookbehind, so the match spans the whole tag from <link to the closing >.
Answer 4:
(?<=<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+
works with Expresso (I think Expresso runs on the .NET regex engine). You could even refine this a bit more to match the closing ' or ":
(?<=<link\s+.*?)href\s*=\s*([\'\"])[^\'\"]+(\1)
Perhaps your regex engine doesn't support lookbehind assertions. A workaround would be
(?:<link\s+.*?)(href\s*=\s*([\'\"])[^\'\"]+(\2))
Your match will then be in capture group 1.
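Python's re module is one engine that rejects this kind of lookbehind, so it makes a convenient place to try the workaround. The sketch below is only an illustration under that assumption; the variable names are made up, and the sample input is the one from the question.

```python
import re

LOOKBEHIND = r'''(?<=<link\s+.*?)href\s*=\s*(['"])[^'"]+(\1)'''
WORKAROUND = r'''(?:<link\s+.*?)(href\s*=\s*(['"])[^'"]+(\2))'''

# Python's re only allows fixed-width lookbehind, so compiling fails.
try:
    re.compile(LOOKBEHIND)
    lookbehind_supported = True
except re.error:
    lookbehind_supported = False

html = ('<link rel="stylesheet" rev="stylesheet" '
        'href="idlecore-tidied.css?T_2_5_0_228" media="screen"> '
        '<a href="anotherurl">Slash Boxes</a>')

# The workaround consumes the "<link ..." prefix and captures the href.
match = re.search(WORKAROUND, html)
href = match.group(1) if match else None
print(lookbehind_supported, href)
```

The workaround still uses .*? for the prefix, so a <link> tag without an href would let it run on into a following tag.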
Answer 5:
What regex flavor are you using? Perl, for one, doesn't support variable-length lookbehind. Where that's an option, I'd choose (edited to implement the very good idea from MizardX):
(?<=<link\b[^<>]*?)href\s*=\s*(['"])(?:(?!\1).)+\1
as a first approximation. That way the closing quote has to be the same character (' or ") as the opening one, and the other quote character may appear inside the value. The same for a language without support for (variable-length) lookbehind:
(?:<link\b[^<>]*?)(href\s*=\s*(['"])(?:(?!\2).)+\2)
\1 will contain your match.
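A short Python sketch of the second form, assuming Python purely for illustration (its re module has no variable-length lookbehind, so only the capturing variant applies). The sample strings are invented to show a value that contains the other quote character.

```python
import re

# Group 2 remembers the opening quote; (?:(?!\2).)+ then consumes
# characters one at a time as long as they are not that same quote.
pattern = re.compile(
    r'''(?:<link\b[^<>]*?)(href\s*=\s*(['"])(?:(?!\2).)+\2)'''
)

# Hypothetical inputs for illustration.
samples = [
    '<link rel="stylesheet" href="a.css">',
    "<link rel='stylesheet' href='b\"quoted\".css'>",
    '<a href="anotherurl">Slash Boxes</a>',
]

found = [m.group(1) for s in samples for m in pattern.finditer(s)]
for item in found:
    print(item)
```

The <a> tag contributes nothing, and the single-quoted value keeps its embedded double quotes because only a matching ' can end it.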
Source: https://stackoverflow.com/questions/268338/regex-to-return-href-attribute-of-link-tags-only