PHP regex, skip <link> tags when rel="canonical"

Submitted by 生来就可爱ヽ(ⅴ<●) on 2021-02-08 07:58:25

Question


I run a PHP script in WordPress that removes the http: and https: protocols from all the links using the following regex:

$links = preg_replace( '/<input\b[^<]*\bvalue=[\"\']https?:\/\/(*SKIP)(*F)|https?:\/\//', '//', $links );

The first part, <input\b[^<]*\bvalue=[\"\']https?:\/\/(*SKIP)(*F), skips any <input> tag whose value starts with http: or https:, such as:

<input type="url" value="http://example.com">

Additionally, I'd like it to skip any <link> tags that have a rel="canonical" attribute:

<link rel="canonical" href="http://example.com/remove-http/" />

Using a regex tester, I've been trying to update the logic. This is what I've come up with so far:

<(input|link)\b[^<]*\(value|rel)=[\"\'](https?:\/\/|canonical)(*SKIP)(*F)|https?:\/\/

But this hasn't worked for me.


Answer 1:


The (*SKIP)(*F) verbs discard the text matched so far and make the engine resume searching for the next match from the position it had reached in the input when it hit these verbs, that is, right after the text consumed by the pattern that precedes them.

So, to match word1 or word2, drop them and go on to look for word3, you need to use

'~(?:word1|word2)(*SKIP)(*F)|word3~'

The (?:...) non-capturing group will group the alternatives that must be dropped.
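To see the idiom in action, here is a minimal sketch. The sample string and the words foobar, foobaz and foo are made up for illustration (they stand in for word1, word2 and word3); the point is that foo is only replaced where it is not part of one of the dropped alternatives.

```php
<?php
// Hypothetical sample text: "foobar" and "foobaz" play the role of
// word1 and word2 (to be dropped), "foo" plays the role of word3.
$text = 'foo foobar foobaz foo';

// When foobar or foobaz matches, (*SKIP)(*F) forces a failure and moves
// the next search position past the dropped text, so the "foo" inside
// those words is never tried by the second alternative.
$result = preg_replace('~(?:foobar|foobaz)(*SKIP)(*F)|foo~', 'X', $text);

echo $result, PHP_EOL;
```

Without the first alternative, every occurrence of foo would be replaced, including the ones inside foobar and foobaz.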

In your case, the whole <link...> should be matched, not just up to the attribute. Thus, you need something like link\b[^>]*?\brel=[\'\"]canonical[\'\"][^>]*> instead of word2 in the above regex.
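Putting the two pieces together might look like the sketch below. It is only one way to combine them: the sample markup is invented, ~ is used as the delimiter so the slashes need no escaping, and a leading < is added before link so that the match starts at the tag itself. The first alternative is the <input> part from the question, the second is the full canonical <link> tag.

```php
<?php
// Hypothetical sample markup for illustration only.
$links = <<<'HTML'
<a href="http://example.com/page/">Page</a>
<input type="url" value="http://example.com">
<link rel="canonical" href="http://example.com/remove-http/" />
<img src="https://example.com/logo.png">
HTML;

// Alternative 1: an <input> tag up to and including the protocol in its value.
// Alternative 2: a complete <link ... rel="canonical" ...> tag.
// Both are dropped by (*SKIP)(*F); anything else that looks like
// http:// or https:// is replaced with //.
$pattern = '~(?:<input\b[^<]*\bvalue=["\']https?://'
         . '|<link\b[^>]*?\brel=["\']canonical["\'][^>]*>)(*SKIP)(*F)'
         . '|https?://~';

$links = preg_replace($pattern, '//', $links);

echo $links, PHP_EOL;
```

With this sample the <a> and <img> URLs become protocol-relative, while the <input> value and the canonical <link> keep their original protocol.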

However, you should think about using an HTML parser that is compatible with your environment (I saw your note that the DOMDocument malfunctions there).




Answer 2:


You should consider using PHP's built-in DOM extension.

http://php.net/manual/en/book.dom.php

HTML is a very rich language, and regular expressions are not powerful enough to parse it reliably. Please never parse HTML using regex.
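As a rough idea of what the DOM approach could look like, here is a sketch. It assumes the DOM extension is available and behaves normally in your environment, and it uses an invented sample document. Unlike the regex, it only touches href and src attributes, so the <input> value is left alone simply because it is never selected.

```php
<?php
// Hypothetical sample document for illustration only.
$html = '<!DOCTYPE html><html><head>'
      . '<link rel="canonical" href="http://example.com/remove-http/">'
      . '<link rel="stylesheet" href="http://example.com/style.css">'
      . '</head><body>'
      . '<a href="https://example.com/page/">Page</a>'
      . '<input type="url" value="http://example.com">'
      . '</body></html>';

// Collect libxml warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// All href attributes except those on <link rel="canonical">, plus all src attributes.
$attrs = $xpath->query('//@href[not(parent::link[@rel="canonical"])] | //@src');

foreach ($attrs as $attr) {
    // Drop a leading "http:" or "https:" so the URL becomes protocol-relative.
    $newValue = preg_replace('~^https?:(?=//)~', '', $attr->value);
    $attr->ownerElement->setAttribute($attr->nodeName, $newValue);
}

$result = $doc->saveHTML();
echo $result;
```

Note that saveHTML() re-serializes the whole document, so whitespace and some markup details may differ from the input.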

Parsing HTML using regex drives SO users insane, as this answer shows: https://stackoverflow.com/a/1732454/5909136



Source: https://stackoverflow.com/questions/43066317/php-regex-skip-link-tags-when-rel-canonical
