RegEx to return 'href' attribute of 'link' tags only?

Submitted by 大兔子大兔子 on 2019-11-28 12:44:49

Either

/(?<=<link\b[^<>]*?)\bhref\s*=\s*(?:"[^"]*"|'[^']*'|\S+)/

or

/<link\b[^<>]*?\b(href\s*=\s*(?:"[^"]*"|'[^']*'|\S+))/

The main difference is [^<>]*? instead of .*?. A lazy .*? can still run past the closing > of the link tag and pick up an href that belongs to a later tag, whereas [^<>]*? cannot leave the tag it started in.

Avoid lookbehind for such a simple case: just match what you need, and capture what you want to get.
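To make the [^<>]*? versus .*? point concrete, here is a minimal sketch in Python (my choice of language, since the thread does not settle on one). The sample HTML strings and the names STRICT, LOOSE, no_href and with_href are made up for illustration; the pattern is the capturing-group form from above.

```python
import re

# Capturing-group form: group 1 holds the whole href attribute.
STRICT = re.compile(r'''<link\b[^<>]*?\b(href\s*=\s*(?:"[^"]*"|'[^']*'|\S+))''')
# Same pattern with .*? in place of [^<>]*?, for comparison only.
LOOSE = re.compile(r'''<link\b.*?\b(href\s*=\s*(?:"[^"]*"|'[^']*'|\S+))''')

# Hypothetical input: the <link> tag has no href, but a later <a> tag does.
no_href = '<link rel="stylesheet"><a href="page.html">x</a>'
with_href = '<link rel="stylesheet" href="style.css">'

loose_match = LOOSE.search(no_href)    # wanders past '>' into the <a> tag
strict_match = STRICT.search(no_href)  # cannot leave the <link> tag
good_match = STRICT.search(with_href)

print(loose_match.group(1))
print(strict_match)
print(good_match.group(1))
```

The loose pattern reports the href of the a tag as if it belonged to the link tag, while the strict pattern finds nothing in that input, which is the behaviour you want here.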

I got good results with <link\s+[^>]*(href\s*=\s*(['"]).*?\2) in The Regex Coach with s and g options.

/(?<=<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+[^>]*>/

I'm a little shaky on the back-references myself, so I left that in there. This regex, though:

/(<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+[^>]*>/

...works in my JavaScript test.

(?<=<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+

works with Expresso (I think Expresso runs on the .NET regex engine). You could even refine this a bit more to require a closing ' or " that matches the opening one:

(?<=<link\s+.*?)href\s*=\s*([\'\"])[^\'\"]+(\1)

Perhaps your regex engine doesn't support lookbehind assertions. A workaround would be

(?:<link\s+.*?)(href\s*=\s*([\'\"])[^\'\"]+(\2))

Your match will then be in the captured group 1.
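As a rough illustration of reading the result from group 1, here is a small Python sketch using the workaround pattern as written above. The sample markup and the names PATTERN, html and hrefs are assumptions made for the example, not part of the original question.

```python
import re

# Workaround pattern from above: no lookbehind, href attribute in group 1.
# Group 2 is the opening quote; \2 requires the same quote to close the value.
PATTERN = re.compile(r'''(?:<link\s+.*?)(href\s*=\s*(['"])[^'"]+(\2))''')

# Hypothetical sample markup with one double-quoted and one single-quoted href.
html = (
    '<link rel="stylesheet" href="style.css">\n'
    "<link rel='icon' href='favicon.ico'>"
)

# finditer yields one match per <link> tag; group(1) is the part we want.
hrefs = [m.group(1) for m in PATTERN.finditer(html)]
print(hrefs)
```

Using finditer with group(1) avoids the tuple-per-match result that findall returns when a pattern has several groups.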

What regex flavor are you using? Perl, for one, doesn't support variable-length lookbehind. Where that's an option, I'd choose (edited to implement the very good idea from MizardX):

(?<=<link\b[^<>]*?)href\s*=\s*(['"])(?:(?!\1).)+\1

as a first approximation. That way the choice of quote character (' or ") will be matched. The same for a language without support for (variable-length) lookbehind:

(?:<link\b[^<>]*?)(href\s*=\s*(['"])(?:(?!\2).)+\2)

Capture group 1 (\1) will contain your match.
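Python's standard re module is one example of an engine without variable-length lookbehind, so it can serve as a quick check of both variants. The sketch below assumes CPython's re; the helper compiles() and the sample string are hypothetical, and the two patterns are copied from above.

```python
import re

# Lookbehind variant and capturing-group variant from above.
LOOKBEHIND = r'''(?<=<link\b[^<>]*?)href\s*=\s*(['"])(?:(?!\1).)+\1'''
CAPTURE = r'''(?:<link\b[^<>]*?)(href\s*=\s*(['"])(?:(?!\2).)+\2)'''

def compiles(pattern):
    """Return True if this engine accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

lookbehind_ok = compiles(LOOKBEHIND)  # re requires fixed-width lookbehind
capture_ok = compiles(CAPTURE)

# Hypothetical input whose value contains the *other* quote character.
html = '<link rel="stylesheet" href="it\'s.css">'
match = re.search(CAPTURE, html)
href = match.group(1) if match else None
print(lookbehind_ok, capture_ok, href)
```

The (?:(?!\2).)+ part lets the value contain the other kind of quote, which the earlier [^'"]+ versions would reject.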
