Regular expression: replace whitespace inside a tag

陌路散爱 submitted on 2021-01-28 12:38:46

Question


I have a malformed XML string and would like to turn it into a well-formed one:

<root val="yyy">
    <fol der val="attribute 1">myfolder</folder>
</root>

The problem is the whitespace inside the tag name: "fol der". Is it possible to replace whitespace inside < > tags (but not inside attributes) using a regular expression in Java? Thanks to all.


Answer 1:


If your XML looks like this:

<root val="yyy">
    <fo l der val="attribute 1">myfol d er</folder>
</root>

The following should work:

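import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Match the tag name: the text after '<' up to '>' (or '/>') or the first "name=" attribute
final Pattern p = Pattern.compile("(?s)(?<=<).*?(?=/?>|\\s*\\w+\\s*=)");
Matcher m = p.matcher(data); // your XML
StringBuffer sb = new StringBuffer();
while (m.find()) {
    // Remove the spaces from the matched tag name; quoteReplacement guards against '$' and '\'
    m.appendReplacement(sb, Matcher.quoteReplacement(m.group().replace(" ", "")));
}
m.appendTail(sb);
data = sb.toString();
System.out.println(data);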

OUTPUT:

<root val="yyy">
    <folder val="attribute 1">myfol d er</folder>
</root>

Live Demo: http://ideone.com/TIrsQR
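On Java 9 or later, the appendReplacement/appendTail loop in this answer can be collapsed into a single call to Matcher.replaceAll(Function<MatchResult, String>). The following is a minimal sketch that reuses the same pattern; the class name TagNameFixer and the method fixTagNames are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagNameFixer {
    // Same pattern as the answer: text after '<' up to '>' (or '/>') or the first "name=" attribute.
    private static final Pattern TAG_NAME =
            Pattern.compile("(?s)(?<=<).*?(?=/?>|\\s*\\w+\\s*=)");

    // Requires Java 9+ for Matcher.replaceAll(Function<MatchResult, String>).
    public static String fixTagNames(String xml) {
        return TAG_NAME.matcher(xml)
                .replaceAll(mr -> Matcher.quoteReplacement(mr.group().replace(" ", "")));
    }

    public static void main(String[] args) {
        String data = "<root val=\"yyy\">\n"
                + "    <fo l der val=\"attribute 1\">myfol d er</folder>\n"
                + "</root>";
        System.out.println(fixTagNames(data));
    }
}
```

Running main prints the same three lines as the OUTPUT shown for this answer: the tag name becomes "folder", while the attribute value and the element text keep their spaces.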




Answer 2:


I would have used (?<=[<]\w*)\s+ but Java's regex engine doesn't support unbounded quantifiers such as * inside a lookbehind.

If a tag name can contain several spaces, you have to loop like this:

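import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Each pass removes one whitespace run per tag name, unless it precedes an attribute ("name=")
Matcher m = Pattern.compile("(?<=<)(/?)\\s*(\\w*)\\s+(?!\\w+\\s*=)").matcher(xml);
while (m.find()) {
    xml = m.replaceAll("$1$2");
    m.reset(xml); // rescan the updated string until no match remains
}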

Note that this does not handle spaces inside attribute names.
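The lookbehind idea from the start of this answer can still be approximated in Java by replacing the unbounded * with a bounded repetition, which Java's lookbehind does accept. This is a sketch under stated assumptions: tag names are at most 30 word characters long (an arbitrary limit), and an attribute is recognised by a following name=. The class BoundedLookbehindFix is a hypothetical name. Like the loop in this answer, it removes only one gap per tag name per pass, so it repeats until the string stops changing.

```java
import java.util.regex.Pattern;

public class BoundedLookbehindFix {
    // Whitespace right after "<name" or "</name" (name: up to 30 word chars, an assumed bound),
    // unless it is followed by an attribute "name=". The possessive \s++ stops the engine from
    // backtracking into a partial whitespace run to dodge the negative lookahead.
    private static final Pattern TAG_SPACE =
            Pattern.compile("(?<=</?\\w{0,30})\\s++(?!\\w+\\s*=)");

    public static String fixTagNames(String xml) {
        String previous;
        do {
            previous = xml;
            xml = TAG_SPACE.matcher(xml).replaceAll("");
        } while (!xml.equals(previous)); // stop once a pass changes nothing
        return xml;
    }

    public static void main(String[] args) {
        String data = "<root val=\"yyy\">\n"
                + "    <fo l der val=\"attribute 1\">myfol d er</folder>\n"
                + "</root>";
        System.out.println(fixTagNames(data));
    }
}
```

The bound is the trade-off here: a tag name with more than 30 characters before a space would simply not be fixed.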




Answer 3:


Probably not what you want to hear, but this is the wrong tool for the wrong problem. The rule of thumb is: do not try to parse or process XML yourself with regular expressions. If you received malformed XML, that is the problem to solve with whoever provided it, rather than patching it afterwards.
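If you follow this advice, a cheap first step is to detect malformed input and reject or report it instead of repairing it. A minimal sketch using the JDK's built-in DOM parser (javax.xml.parsers); the class WellFormedCheck and the method isWellFormed are hypothetical names, not part of this answer.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    // Returns true if the string parses as well-formed XML, false otherwise.
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. <fol der ...>: attribute "der" is not followed by '='
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String bad = "<root val=\"yyy\">\n    <fol der val=\"attribute 1\">myfolder</folder>\n</root>";
        String good = "<root val=\"yyy\">\n    <folder val=\"attribute 1\">myfolder</folder>\n</root>";
        System.out.println(isWellFormed(bad) + " " + isWellFormed(good));
    }
}
```

The boolean result makes it easy to reject the document and report it back to its producer before anything else processes it.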



Source: https://stackoverflow.com/questions/19095106/regular-expression-replace-whitespaces-inside-tag
