Why is Apache Xerces/Xalan adding additional carriage returns to my serialized output?

Posted by 陌路散爱 on 2020-01-13 09:01:53

Question


I'm using Apache Xerces 2.11.0 and Apache Xalan 2.7.1 and I'm having problems with additional carriage return characters in the serialized XML.

I have this (pseudo) code:

String myString = ...;   // base64 data containing \r\n line breaks
Document doc = ...;

Element item = doc.createElement("item");
item.appendChild(doc.createCDATASection(myString));

Transformer transformer = ...;
ByteArrayOutputStream stream = new ByteArrayOutputStream();
Result result = new StreamResult(stream);
// Serialize the same document the element was created from
transformer.transform(new DOMSource(doc), result);

myString contains line breaks (\r\n); it is actually base64-encoded data. When I look at the serialized output, however, there are additional \r characters.

Input:

Line 1 \r\n
Line 2 \r\n
Line 3 \r\n

Output:

Line 1 \r\r\n
Line 2 \r\r\n
Line 3 \r\r\n

If I use createTextNode instead of createCDATASection the output becomes even more interesting:

Line 1 &#13;\r\n
Line 2 &#13;\r\n
Line 3 &#13;\r\n

The additional character seems to be introduced during serialization; the DOM tree itself appears to be correct (according to getTextContent()).

Why is this happening? What can I do to fix this?


Answer 1:


I guess you are seeing this problem on Windows and not on Linux/Solaris/Mac. The Xalan serializer (org.apache.xml.serializer.ToStream) gets the line separator from System.getProperty("line.separator"). When the serializer writes \r\n, it treats the \n as the end-of-line sequence and writes \r followed by the line separator, which on Windows gives \r\r\n. Although this sounds strange, it is not a bug; see [1]. But since it was frequently reported as a bug, a Xalan extension property was added [2]. So you can set it programmatically:

transformer.setOutputProperty("{http://xml.apache.org/xalan}line-separator","\n");

or

<xsl:output xalan:line-separator="&#10;" />

where xalan is a prefix bound to the namespace URI "http://xml.apache.org/xalan".
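For context, here is a minimal sketch of where the programmatic call fits in the question's code. It assumes that the TransformerFactory found on the classpath is a Xalan version that includes the property from [2]; other JAXP implementations may accept the namespace-qualified key but ignore it. The class name LineSeparatorDemo and the helper serialize are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class LineSeparatorDemo {
    // Xalan extension property named in the answer; only Xalan-based
    // serializers are expected to act on it (assumption, see lead-in).
    static final String XALAN_LINE_SEPARATOR =
            "{http://xml.apache.org/xalan}line-separator";

    // Hypothetical helper: wraps the payload in <item><![CDATA[...]]></item>
    // and serializes the document with the line separator forced to "\n".
    static String serialize(String payload) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element item = doc.createElement("item");
        item.appendChild(doc.createCDATASection(payload));
        doc.appendChild(item);

        // Identity transformer, used here purely as a serializer.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(XALAN_LINE_SEPARATOR, "\n");

        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(stream));
        return new String(stream.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String xml = serialize("Line 1\r\nLine 2\r\nLine 3\r\n");
        // Make control characters visible before printing.
        System.out.println(xml.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

Printing the result with the control characters escaped makes it easy to check whether any \r\r\n sequences remain on the platform in question.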

[1] https://issues.apache.org/jira/browse/XALANJ-1660

[2] https://issues.apache.org/jira/browse/XALANJ-2093
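
To make the doubling concrete, the following is a small simulation of the behavior described in this answer. It is not Xalan's actual code: writeWithSeparator is a hypothetical stand-in that replaces every \n with the configured line separator and passes all other characters, including \r, through unchanged.

```java
public class LineSeparatorSimulation {
    // Hypothetical model of the described behavior: each '\n' is written as
    // the line separator; '\r' is treated like any other character.
    static String writeWithSeparator(String text, String lineSeparator) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\n') {
                out.append(lineSeparator);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Escapes CR and LF so the result can be read on one line.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String input = "Line 1\r\nLine 2\r\n";
        // Windows-style separator: the existing \r is kept and \n becomes \r\n.
        System.out.println(visible(writeWithSeparator(input, "\r\n")));
        // prints Line 1\r\r\nLine 2\r\r\n

        // Separator forced to "\n": the input comes out unchanged.
        System.out.println(visible(writeWithSeparator(input, "\n")));
        // prints Line 1\r\nLine 2\r\n
    }
}
```

With the separator set to "\n", the \r already present in the data is the only one written, which is why the output property above removes the extra character.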




Answer 2:


Odd, but try doing transformer.setOutputProperty(javax.xml.transform.OutputKeys.INDENT, "no"); immediately after creating the transformer and see what happens.




Answer 3:


Try using Xerces 2.9.0, which is tested with Xalan 2.7.1 (2.9.0 ships with the Xalan package).

I did the same after having problems with Xerces 2.11.0.



Source: https://stackoverflow.com/questions/6317273/why-is-apache-xerces-xalan-adding-additional-carriage-returns-to-my-serialized-o
