Regular Expression Negative Lookahead/Lookbehind to Exclude HTML from Find-and-Replace

匆匆过客 submitted on 2019-12-13 07:06:37

Question


I have a feature on my site where the search query is highlighted in the search results. However, some of the fields that the site searches through have HTML in them. For example, let's say I had a search result consisting of <span>Hello all</span>. If the user searched for the letter a, I want the code to return <span>Hello <mark>a</mark>ll</span> instead of the messy <sp<mark>a</mark>n>Hello <mark>a</mark>ll</sp<mark>a</mark>n> that it returns now.

I know that I can use negative lookbehinds and lookaheads in preg_replace() to exclude any instances where the a is between a < and >. But how do I do that? Regular expressions are one of my weaknesses and I can't seem to come up with any that work.

So far, what I've got is this:

$return = preg_replace("/(?<!\<[a-z\s]+?)$match(?!\>[a-z\s]+?)/i", '<mark>'.$match.'</mark>', $result);

But it doesn't seem to work. Any help?


Answer 1:


It's considered bad practice to use regex to parse a complex language like HTML. With sufficient skill and patience, and an advanced regex engine, it may be possible, but the potential pitfalls are huge and the performance is unlikely to be good.

A better solution is to use a DOM parser such as PHP's built-in DOMDocument class.

A good example of this can be found here in the answer to this related SO question.
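As a rough sketch of the DOM approach (this is not the linked example, just an illustration), the idea is to load the fragment, walk only the text nodes, and wrap each match in a <mark> element, so tag names and attributes are never touched. The function name highlight_text_nodes, the wrapper div with id "wrap", and the UTF-8 input are all assumptions made for this example.

```php
<?php
// Hypothetical helper: wraps matches in <mark>, touching text nodes only.
function highlight_text_nodes(string $html, string $match): string
{
    if ($match === '') {
        return $html; // nothing to highlight
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate tags libxml does not know
    // Wrap the fragment in a known container so it can be extracted again.
    // The leading XML declaration is a common trick to make libxml read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    $pattern = '/(' . preg_quote($match, '/') . ')/iu';

    // Copy the node list first, because the loop modifies the tree.
    foreach (iterator_to_array($xpath->query('.//text()', $wrap)) as $node) {
        // With PREG_SPLIT_DELIM_CAPTURE, odd-indexed pieces are the matched text.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($parts === false || count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $mark = $doc->createElement('mark');
                $mark->appendChild($doc->createTextNode($part));
                $fragment->appendChild($mark);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo highlight_text_nodes('<span>Hello all</span>', 'a'), "\n";
```

Because the replacement happens on text nodes, the a in span is left alone without any look-around. The serialized output may differ slightly from the input in whitespace or entity encoding, since it is re-generated by the parser.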

Hope that helps.




Answer 2:


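If you do want to use regular expressions, a simple negative look-ahead is all that is required (assuming well-formed markup with no < or > within or between the tags). The look-ahead (?![^<>]*>) rejects a match when the next angle bracket after it is a >, which means the match sits inside a tag: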

$return = preg_replace("/$match(?![^<>]*>)/i", '<mark>$0</mark>', $result);

Any special regular expression characters in $match will need to be properly escaped.
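For instance, preg_quote() can do that escaping. The sketch below wraps the expression above in a function; the name highlight_outside_tags and the empty-string guard are additions for illustration, and the same assumption about well-formed markup applies.

```php
<?php
// Hypothetical helper name; not from the original answer.
function highlight_outside_tags(string $html, string $match): string
{
    if ($match === '') {
        return $html; // an empty pattern would match at every position
    }
    // Escape regex metacharacters and the "/" delimiter in the search term.
    $quoted = preg_quote($match, '/');
    // Skip any occurrence that is followed by ">" before the next "<" or ">",
    // i.e. an occurrence that lies inside a tag.
    $result = preg_replace('/' . $quoted . '(?![^<>]*>)/i', '<mark>$0</mark>', $html);
    return $result ?? $html; // preg_replace() returns null on error
}

echo highlight_outside_tags('<span>Hello all</span>', 'a'), "\n";
// prints: <span>Hello <mark>a</mark>ll</span>
```

Without preg_quote(), a search term such as a.b would also match axb, and a term containing / would break the pattern delimiter.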



Source: https://stackoverflow.com/questions/15526781/regular-expression-negative-lookahead-lookbehind-to-exclude-html-from-find-and-r
