Regular expression to match “>”, “<”, “&” chars that appear inside XML nodes

栀梦 2020-12-19 02:42

I'm trying to write a regular expression using the PCRE library in PHP.

I need a regex to match only &, > and < cha

7 answers
  •  青春惊慌失措
    2020-12-19 02:47

    I'm reasonably certain it's simply not possible. You need something that keeps track of nesting, and there's no way to get a regular expression to track nesting. Your choices are to fix the text first (when you probably can use an RE) or use something that's at least vaguely like an XML parser, specifically to the extent of keeping track of how the tags are nested.
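For the "fix the text first" route, the ampersand is the easy part, because deciding whether an & already starts an entity needs no nesting information. A minimal sketch in Python, assuming entity names consist only of ASCII letters and digits (a simplification); the function name is made up for illustration, and since the pattern relies only on a negative lookahead, it should carry over to PCRE:

```python
import re

# Matches "&" unless it already begins an entity reference such as
# &amp; (named), &#60; (decimal) or &#x3C; (hexadecimal).
# Assumption: entity names are ASCII letters/digits only.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text: str) -> str:
    """Replace every "&" that does not start an entity with "&amp;"."""
    return BARE_AMP.sub("&amp;", text)


if __name__ == "__main__":
    print(escape_bare_ampersands("Tom & Jerry &amp; friends &#60;"))
```

Existing entities are left alone, so running the function twice gives the same result as running it once. The < and > characters are the hard part and are not touched here.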

    There's a reason XML demands that these characters be escaped though -- without that, you can only guess about whether something is really a tag or not. For example, given something like:

        Text containing < and > characters
    

    You and I can probably guess that the result should be: ...containing &lt; and &gt;... But I'm pretty sure the XML specification allows the extra whitespace, so officially "< and >" should be treated as a tag. You could, I suppose, assume that anything that looks like an unmatched tag really isn't intended to be a tag, but that's going to take some work too.
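One way to sketch such a guess in Python: treat a span as a tag only when it has a strict tag shape (a name immediately after the "<"), and escape every other < or >. This is a heuristic, not a parser. It does not check that tags are matched or nested, and the `<p>` wrapper in the usage line is a hypothetical stand-in, since the tags of the example above are not shown.

```python
import re

# Heuristic tag shape: "<", optional "/", a name, optional attribute text,
# optional "/", then ">". No whitespace may follow "<", so "< and >" is text.
TAG = re.compile(r"</?[A-Za-z_][\w.\-]*(?:\s+[^<>]*)?/?>")


def escape_chunk(chunk: str) -> str:
    """Escape angle brackets in a piece of text known not to be a tag."""
    return chunk.replace("<", "&lt;").replace(">", "&gt;")


def escape_stray_brackets(text: str) -> str:
    """Keep tag-shaped spans verbatim and escape < and > everywhere else."""
    out = []
    pos = 0
    for m in TAG.finditer(text):
        out.append(escape_chunk(text[pos:m.start()]))  # text before the tag
        out.append(m.group(0))                         # the tag itself
        pos = m.end()
    out.append(escape_chunk(text[pos:]))               # trailing text
    return "".join(out)


if __name__ == "__main__":
    print(escape_stray_brackets("<p>Text containing < and > characters</p>"))
```

The guess still fails on input such as "a<b and c>d", where "<b and c>" has a tag shape and is left alone. Ruling that out would need the nesting check described above.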
