Question
I'm attempting to match only the content between opening/closing paragraph tags. Playing around with it on RegExr, I can get <p.*?> to match an opening paragraph tag that may or may not have any additional attributes such as class and/or ID.
However, when I attempt to add that pattern to a positive look behind, it breaks and I'm not sure why. I've tried escaping the < and > symbols, but that doesn't seem to help. The look ahead, however, works perfectly.
Here's an example of the entire pattern:
(?<=\<p.*?\>).*?(?=</p>)
I'd like to be able to match only the content within the paragraph tags, and not include the tags themselves. Hence why I was attempting to use look aheads and look behinds.
Answer 1:
Problem
The problem with lookbehinds is that many regex engines do not allow variable-length repetition inside them.
(?<=.*)
This is invalid because of the * quantifier. If it were {8} instead, it would be fine, since that is a fixed width.
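As a quick check of the fixed-width rule, here is a minimal sketch using Python's re module, which is one of the engines with this restriction (the asker's engine may behave differently). The helper name compiles is made up for this illustration.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Python rejects lookbehinds whose width is not fixed
        return False

# The pattern from the question: the lookbehind contains .*?, so its width varies
variable_width = compiles(r"(?<=<p.*?>).*?(?=</p>)")

# A lookbehind with a fixed repeat count is accepted
fixed_width = compiles(r"(?<=.{8})")

print(variable_width, fixed_width)
```

The lookahead is not the issue here; only the lookbehind is subject to the width restriction.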
Solution
My advice is to match everything, and use capture groups and backreferences to process your data.
Example
<p.*?>(.*?)<\/p>
So, $1 or \1 would contain the data you want.
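For illustration, a small Python sketch of the capture-group approach. The sample HTML string is invented, and the re.DOTALL flag is an assumption added so that a paragraph spanning several lines would still match.

```python
import re

# Hypothetical sample input; the tags carry optional attributes as in the question
html = '<p class="intro">first</p> outside <p id="x">second</p>'

# Group 1 captures only the text between the opening and closing tags
pattern = re.compile(r"<p.*?>(.*?)</p>", re.DOTALL)

# With exactly one capture group, findall returns the group contents
paragraphs = pattern.findall(html)
print(paragraphs)
```

Note that <p.*?> would also match other tags whose names start with p, such as <pre>, so a stricter opening-tag pattern may be needed for real input.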
Answer 2:
You should not use regex for this kind of task; parsing HTML with it runs into many problems.
See this post: Should I use regex or just DOM/string manipulation?
Use DOMDocument instead; it is very simple.
Example:
$str = "<p>tetsd</p> doutside <p> 232323234</p>";
$doc = new DOMDocument();
$doc->loadHTML($str);
// Print the text content of each <p> element on its own line
foreach ($doc->getElementsByTagName('p') as $para) {
    echo $para->textContent, PHP_EOL;
}
Source: https://stackoverflow.com/questions/22133501/regex-match-text-between-paragraph-tags