Regex match text between paragraph tags

Posted by 拥有回忆 on 2019-12-24 05:05:09

Question


I'm attempting to match only the content between opening/closing paragraph tags. Playing around with it on RegExr, I can get <p.*?> to match an opening paragraph tag that may or may not have any additional attributes such as class and/or ID.

However, when I add that pattern to a positive lookbehind, it breaks and I'm not sure why. I've tried escaping the < and > symbols, but that doesn't seem to help. The lookahead, however, works perfectly.

Here's an example of the entire pattern:

(?<=\<p.*?\>).*?(?=</p>)

I'd like to match only the content within the paragraph tags, not the tags themselves, which is why I was attempting to use lookaheads and lookbehinds.


Answer 1:


Problem

The problem with lookbehinds is that in most regex engines they must have a fixed (or at least bounded) width, so unbounded repetition such as * or + is not allowed inside them.

(?<=.*)

This is invalid because of the * quantifier. If the quantifier were {8} instead, i.e. (?<=.{8}), it would be okay, since that is fixed-width.
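
To make the restriction concrete, here is a minimal PHP sketch, assuming the PCRE engine behind preg_match(). The sample strings and variable names are made up for illustration. It tries the pattern from the question and then a fixed-width variant; the exact warning text differs between PCRE versions, so the code only checks the return value.

```php
<?php
$html = '<p class="intro">Hello</p>';

// Unbounded quantifier inside the lookbehind: PCRE refuses to compile the
// pattern, so preg_match() returns false (the @ hides the compile warning).
$variable = @preg_match('/(?<=<p.*?>).*?(?=<\/p>)/', $html, $m1);

// Fixed-width lookbehind: this compiles, but it only works for a bare <p>
// tag without attributes.
$fixed = preg_match('/(?<=<p>).*?(?=<\/p>)/', '<p>Hello</p>', $m2);

var_dump($variable);          // false, the pattern did not compile
var_dump($fixed, $m2[0]);     // 1 match, containing "Hello"
```

The fixed-width version shows why a lookbehind is a poor fit here: it cannot allow for optional attributes such as class or id.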

Solution

My advice is to match everything, and use capture groups and backreferences to process your data.

Example

<p.*?>(.*?)<\/p>

So, $1 or \1 would contain the data you want.
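
As a rough sketch of how the capture group could be used in PHP (the sample HTML is invented for illustration), preg_match_all() collects group 1 of every match into $matches[1]. The "s" modifier is an addition not in the answer above; it lets "." span newlines inside a paragraph.

```php
<?php
$html = '<p class="a">first</p> outside <p id="b">second</p>';

// Group 1 holds the text between the opening and closing tags.
preg_match_all('/<p.*?>(.*?)<\/p>/s', $html, $matches);

print_r($matches[1]); // the captured paragraph contents: "first" and "second"
```

Note that <p.*?> also matches other tags that start with "<p", such as <pre>. A stricter opening-tag pattern like <p\b[^>]*> avoids that, though it is a suggestion rather than part of the original answer.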




Answer 2:


You should not use regex for this kind of task, as it can run into many issues. See this post: Should I use regex or just DOM/string manipulation?

Use DOMDocument; it is very simple.

Example:

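$str = "<p>tetsd</p> doutside <p> 232323234</p>";
$doc = new DOMDocument();
// Parse the HTML fragment; libxml wraps it in <html><body> automatically.
$doc->loadHTML($str);
// Print the text content of every <p> element, one per line.
foreach ($doc->getElementsByTagName('p') as $para) {
    echo $para->textContent, PHP_EOL;
}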
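
The same approach also copes with attributes and nested inline tags, where a regex gets awkward. The following is a small sketch under assumed input: the HTML string is made up, and libxml_use_internal_errors() is used only to keep any parser warnings out of the output.

```php
<?php
$html = '<p class="x">a <b>bold</b> word</p><div>skip</div><p>second</p>';

libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// textContent flattens nested markup, so the <b> tag disappears from the result.
$texts = [];
foreach ($doc->getElementsByTagName('p') as $para) {
    $texts[] = $para->textContent;
}

print_r($texts); // "a bold word" and "second"; the <div> is ignored
```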




Source: https://stackoverflow.com/questions/22133501/regex-match-text-between-paragraph-tags
