Removing anything between XML tags and their content

无人及你 2021-01-07 04:40

I need to remove anything between XML tags, especially whitespace and newlines.

For example, removing whitespace and newlines from:
\\n<

5 answers
  •  粉色の甜心
    2021-01-07 05:03

    It is generally not a good idea to parse XML using regular expressions. One of the major benefits of XML is that there are dozens of well-tested parsers out there for any language/framework that you might ever want. There are some tricky rules within XML that prevent any regular expression from being able to properly parse XML.
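
    As a rough illustration of the parser route, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The function name strip_whitespace_nodes and the sample document are made up for this example, and it assumes the goal is to drop only whitespace-only text between tags while leaving real content alone.

```python
import xml.etree.ElementTree as ET


def strip_whitespace_nodes(xml_text: str) -> str:
    """Drop whitespace-only text between tags; keep real text content."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # elem.text is the text right after the opening tag.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        # elem.tail is the text right after the closing tag.
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample input, not taken from the question.
sample = "<root>\n  <item>value</item>\n  <item>  padded  </item>\n</root>"
print(strip_whitespace_nodes(sample))
```

    Text that contains anything other than whitespace is left untouched, including its own leading and trailing spaces.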

    That said, something like:

    s/>.*?

    (that is Perl syntax) might do what you want. It takes everything from a greater-than sign up to the next less-than sign and strips it away. The "g" at the end performs the substitution as many times as needed, and the "s" makes "." match all characters, including newlines. Without "s", "." stops at a newline, so the pattern would have to be run once per line and would not cover text between tags that spans multiple lines.
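
    For comparison, here is a hedged Python sketch of the same idea. It assumes the substitution is meant to replace everything from ">" to the next "<" with "><"; re.DOTALL plays the role of the "s" flag, and re.sub already replaces every match, like "g". The function names and the sample string are invented for this example. A second, narrower pattern is included because the question asks mainly about whitespace and newlines.

```python
import re


def strip_between_tags(xml_text: str) -> str:
    """Remove everything between '>' and the next '<', text content included."""
    # re.DOTALL lets '.' match newlines; '.*?' is non-greedy.
    return re.sub(r">.*?<", "><", xml_text, flags=re.DOTALL)


def strip_whitespace_between_tags(xml_text: str) -> str:
    """Remove only runs of whitespace (including newlines) between tags."""
    return re.sub(r">\s+<", "><", xml_text)


# Hypothetical sample input, not taken from the question.
sample = "<a>\n  <b>text</b>\n</a>"
print(strip_between_tags(sample))
print(strip_whitespace_between_tags(sample))
```

    Note that the first function also deletes real text content such as "text" above, while the second only collapses the gaps between tags. Neither understands comments, CDATA sections, or ">" inside attribute values, which is the kind of case a real parser handles.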
