Question
I have a malformed XML string and I would like to build a correct one from it:
<root val="yyy">
<fol der val="attribute 1">myfolder</folder>
</root>
The problem is the whitespace inside the tag name "fol der". Is it possible to remove whitespace inside < > tags (but not inside attribute values) using a regular expression in Java? Thanks to all.
Answer 1:
If your XML looks like this:
<root val="yyy">
<fo l der val="attribute 1">myfol d er</folder>
</root>
The following should work:
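import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "<root val=\"yyy\">\n<fo l der val=\"attribute 1\">myfol d er</folder>\n</root>"; // your XML
// Match from just after '<' up to ">" or "/>", or up to the first attribute name followed by '='
final Pattern p = Pattern.compile("(?s)(?<=<).*?(?=/?>|\\s*\\w+\\s*=)");
Matcher m = p.matcher(data);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    // Strip the spaces from the tag name; quoteReplacement keeps any '$' or '\' literal
    m.appendReplacement(sb, Matcher.quoteReplacement(m.group().replace(" ", "")));
}
m.appendTail(sb);
data = sb.toString();
System.out.println(data);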
OUTPUT:
<root val="yyy">
<folder val="attribute 1">myfol d er</folder>
</root>
Live Demo: http://ideone.com/TIrsQR
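A possibly tidier variant of the same idea, offered as a sketch: it reuses the pattern above unchanged, but relies on Matcher.replaceAll(Function<MatchResult, String>) (Java 9 or later) instead of the appendReplacement loop, and strips any whitespace (\s+), not just plain spaces, from each matched tag name. The class name TagNameFixer and the method fixTagNames are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagNameFixer {
    // Same pattern as in the answer: from just after '<' up to ">" or "/>",
    // or up to the first "name=" that starts an attribute.
    private static final Pattern TAG_NAME =
            Pattern.compile("(?s)(?<=<).*?(?=/?>|\\s*\\w+\\s*=)");

    // Removes all whitespace from each matched tag name (Java 9+ replaceAll(Function)).
    // quoteReplacement keeps any '$' or '\' in the tag name literal.
    static String fixTagNames(String xml) {
        return TAG_NAME.matcher(xml)
                .replaceAll(mr -> Matcher.quoteReplacement(mr.group().replaceAll("\\s+", "")));
    }

    public static void main(String[] args) {
        String data = "<root val=\"yyy\">\n"
                + "<fo l der val=\"attribute 1\">myfol d er</folder>\n"
                + "</root>";
        System.out.println(fixTagNames(data));
    }
}
```

Like the original, this only looks at text that starts right after <, so attribute values and element content are left alone. It will, however, also squeeze the whitespace out of comments such as <!-- a comment -->, which may or may not matter for your input.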
Answer 2:
I would have used (?<=[<]\w*)\s+, but Java's regex engine only supports lookbehinds with a bounded maximum length, so an unbounded quantifier such as * is not allowed there.
Instead, you have to loop like this to handle multiple spaces:
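import java.util.regex.Matcher;
import java.util.regex.Pattern;

// xml holds your XML string; each replaceAll pass removes one whitespace run per tag name
Matcher m = Pattern.compile("(?<=[<])(/?)\\s*(\\w*)\\s+(?!\\w+\\s*=)").matcher(xml);
while (m.find()) {
    xml = m.replaceAll("$1$2"); // keep the optional '/' and the name part before the space
    m.reset(xml);               // rescan the updated string
}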
Note that this does not fix whitespace inside attribute names, though.
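Regarding the lookbehind idea: Java does accept a lookbehind with a bounded repetition, so one workaround is to replace \w* with a capped class such as [\w\s]{0,100}. This is a sketch under the assumption that no tag name (including its stray whitespace) is longer than 100 characters; that limit is arbitrary. Because the lookbehind also accepts whitespace, every space in the tag name matches in a single replaceAll call, so no loop is needed, and the /? lets closing tags such as </fol der> match as well. The class name BoundedLookbehindFixer is hypothetical, and, like the answer, it does not touch attribute names.

```java
import java.util.regex.Pattern;

class BoundedLookbehindFixer {
    // (?<=</?[\w\s]{0,100})  preceded by "<" or "</" plus only word chars/whitespace,
    //                         i.e. still inside the tag name (100 is an arbitrary cap)
    // \s+                     the whitespace to delete
    // (?!\s*\w+\s*=)          unless it separates the tag name from an attribute
    private static final Pattern SPACE_IN_TAG_NAME =
            Pattern.compile("(?<=</?[\\w\\s]{0,100})\\s+(?!\\s*\\w+\\s*=)");

    static String fixTagNames(String xml) {
        return SPACE_IN_TAG_NAME.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml = "<root val=\"yyy\">\n"
                + "<fo l der val=\"attribute 1\">myfol d er</fo l der>\n"
                + "</root>";
        System.out.println(fixTagNames(xml));
    }
}
```

Whitespace inside attribute values and element text is never preceded by an unbroken run of word characters back to <, so the lookbehind rejects it.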
Answer 3:
Probably not what you want to hear, but this is the wrong tool for the wrong problem. As a rule of thumb, do not try to parse or process XML yourself with regular expressions. If you received malformed XML, that is the problem to solve with whoever provided it, rather than trying to fix it yourself.
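If you follow this advice, a real XML parser can at least tell you reliably that the input is broken, so you can reject it and report it back to its producer. A minimal sketch using the JDK's built-in DOM parser (javax.xml.parsers.DocumentBuilder); the class name WellFormedCheck and the method isWellFormed are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class WellFormedCheck {
    // Returns true if the string parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler (which prints to stderr) with one that just rethrows
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed input: reject it instead of patching it
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String broken = "<root val=\"yyy\">\n"
                + "<fol der val=\"attribute 1\">myfolder</folder>\n"
                + "</root>";
        System.out.println(isWellFormed(broken)); // prints false
    }
}
```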
Source: https://stackoverflow.com/questions/19095106/regular-expression-replace-whitespaces-inside-tag