transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8") is NOT working

拈花ヽ惹草 submitted on 2019-11-28 10:57:20
Vyrx

I had the same problem on Android when serializing emoji characters. With UTF-8 encoding set on the transformer, the emoji came out as character entities, one per UTF-16 surrogate of the pair, which subsequently broke other parsers that read the data.
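
To make the surrogate-pair point concrete, here is a small sketch (the class and method names are mine, not from the original answer). It shows how one emoji is stored as two UTF-16 code units in a Java String, what its UTF-8 bytes look like, and what "one entity per surrogate" would look like next to a single reference built from the code point. I am assuming that is the shape of the broken output described above.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateDemo {
    /** Returns the UTF-16 code units of s as hex, separated by spaces. */
    public static String utf16Units(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    /** Returns the UTF-8 bytes of s as hex, separated by spaces. */
    public static String utf8Bytes(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F600 built from its code point, so the source file encoding does not matter
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(utf16Units(emoji)); // D83D DE00
        System.out.println(utf8Bytes(emoji));  // F0 9F 98 80
        // One numeric reference per surrogate (assumed shape of the broken output)
        System.out.println("&#" + (int) emoji.charAt(0) + ";&#" + (int) emoji.charAt(1) + ";");
        // One reference for the whole code point, which XML parsers accept
        System.out.println("&#" + emoji.codePointAt(0) + ";");
    }
}
```

A reference to a lone surrogate is not a legal XML character, which would explain why downstream parsers choke on the first form.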

This is how I ended up solving it:

StringWriter sw = new StringWriter();
sw.write("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>");
Transformer t = TransformerFactory.newInstance().newTransformer();

// UTF-16 works here because the result is a Java String (already UTF-16),
// not an encoded byte stream
t.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.transform(new DOMSource(elementNode), new StreamResult(sw));

return IOUtils.toInputStream(sw.toString(), Charset.forName("UTF-8"));

To answer the question: the following code works for me. It takes data in an input encoding and converts it to an output encoding.

        ByteArrayInputStream inStreamXMLElement = new ByteArrayInputStream(strXMLElement.getBytes(input_encoding));
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder(); 
        Document docRepeat = db.parse(new InputSource(new InputStreamReader(inStreamXMLElement, input_encoding)));
        Node elementNode = docRepeat.getElementsByTagName(strRepeat).item(0);

        DOMSource domSourceRepeat = new DOMSource(elementNode);
        TransformerFactory tFactory = TransformerFactory.newInstance();
        Transformer transformer = tFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, output_encoding);

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        StreamResult sr = new StreamResult(new OutputStreamWriter(bos, output_encoding));


        transformer.transform(domSourceRepeat, sr);
        byte[] outputBytes = bos.toByteArray();
        strRepeatString = new String(outputBytes, output_encoding);

I spent a significant amount of time debugging this issue because it worked well on my machine (Ubuntu 14 + Java 1.8.0_45) but didn't work properly in production (Alpine Linux + Java 1.7).

Contrary to my expectation, the following approach from the above-mentioned answer didn't help:

ByteArrayOutputStream bos = new ByteArrayOutputStream();
StreamResult sr = new StreamResult(new OutputStreamWriter(bos, "UTF-8"));

But this one (Scala) worked as expected:

val out = new StringWriter()
val result = new StreamResult(out)
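
For anyone who wants the same StringWriter idea in Java, this is a rough sketch rather than the exact code from my project. The class name, the parse helper, and the sample text are made up for illustration. The transformer writes characters into a StringWriter and the String is encoded to UTF-8 bytes only afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class StringWriterSerializer {
    /** Serializes doc to a String first, then encodes that String as UTF-8 bytes. */
    public static byte[] toUtf8Bytes(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        // Keeps the XML declaration consistent with the bytes produced below
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Parses XML bytes into a DOM Document (helper for the demo only). */
    public static Document parse(byte[] xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<msg>h\u00e9llo w\u00f6rld</msg>".getBytes(StandardCharsets.UTF_8));
        byte[] bytes = toUtf8Bytes(doc);
        // Round trip: the re-parsed text should equal the original text
        System.out.println(parse(bytes).getDocumentElement().getTextContent());
    }
}
```

The sample deliberately uses only characters from the Basic Multilingual Plane. How emoji are escaped seems to depend on the transformer implementation and Java version, so test that case on the runtime you actually deploy to.
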
Kintan K

What about this?

public static String documentToString(Document doc) throws Exception {
    return documentToString(doc, "UTF-8");
}

public static String documentToString(Document doc, String encoding) throws Exception {
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = null;

    // validateNullString is a helper that is not shown here; fall back to UTF-8
    // when it yields an empty string
    if ("".equals(validateNullString(encoding))) encoding = "UTF-8";
    try {
        transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
    } catch (javax.xml.transform.TransformerConfigurationException error) {
        return null;
    }

    Source source = new DOMSource(doc);
    StringWriter writer = new StringWriter();
    Result result = new StreamResult(writer);

    try {
        transformer.transform(source, result);
    } catch (javax.xml.transform.TransformerException error) {
        return null;
    }
    return writer.toString();
} // documentToString

I worked around the problem by wrapping the Document object passed to the DOMSource constructor. The getXmlEncoding method of my wrapper always returns null; all other methods are delegated to the wrapped Document object.
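
A hand-written wrapper has to delegate every method of Document, which is long, so here is a shorter sketch of the same idea using java.lang.reflect.Proxy instead. This is a swapped-in technique, not the wrapper I described, and the class name is hypothetical. I have only checked the wrapper's own behaviour here; whether a particular transformer implementation accepts a proxy in a DOMSource is an assumption you would need to verify.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class NoEncodingDocument {
    /** Returns a Document view of target whose getXmlEncoding() always yields null. */
    public static Document wrap(Document target) {
        return (Document) Proxy.newProxyInstance(
                NoEncodingDocument.class.getClassLoader(),
                new Class<?>[] { Document.class },
                (proxy, method, args) -> {
                    if ("getXmlEncoding".equals(method.getName())) {
                        return null; // hide the encoding recorded at parse time
                    }
                    try {
                        return method.invoke(target, args); // delegate everything else
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the wrapped Document's own exception
                    }
                });
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        Document wrapped = wrap(doc);
        System.out.println(wrapped.getXmlEncoding());                  // null
        System.out.println(wrapped.getDocumentElement().getTagName()); // a
    }
}
```

Note that child nodes returned by the proxy are the original nodes, so their getOwnerDocument() still points at the unwrapped Document.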

I'm taking a wild shot here, but you mention that you are reading files for the test data. Can you make sure that you read the files using the proper encoding, so that the data is already correctly decoded by the time you write it to your OutputStream?

So use something like new InputStreamReader(new FileInputStream(fileDir), "UTF-8").

Don't forget that the single-argument constructors of FileReader always use the platform default encoding. From the documentation: "The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate."
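
As a minimal sketch of reading with an explicit charset (the class name, method name, and temp file are invented for the example), something along these lines avoids depending on the platform default:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetRead {
    /** Reads the whole file as UTF-8, independent of the platform default encoding. */
    public static String readUtf8(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file.toFile()), StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".xml"); // hypothetical test file
        Files.write(tmp, "<msg>h\u00e9llo</msg>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readUtf8(tmp).equals("<msg>h\u00e9llo</msg>")); // true
        Files.delete(tmp);
    }
}
```

Passing StandardCharsets.UTF_8 instead of a charset name string also removes the checked UnsupportedEncodingException.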

Try setting the encoding on your StreamResult specifically:

StreamResult result = new StreamResult(new OutputStreamWriter(out, "UTF-8"));

This way, it should only be able to write out in UTF-8.
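
A fuller sketch of that suggestion, with assumed names (Utf8StreamSerializer and the parse helper are not from the question): the ENCODING output property and the writer's charset are set to the same value, and the writer is closed before the bytes are read so nothing stays in its buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class Utf8StreamSerializer {
    /** Serializes doc through a UTF-8 OutputStreamWriter and returns the encoded bytes. */
    public static byte[] serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        // Declared encoding and actual byte encoding must agree
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            t.transform(new DOMSource(doc), new StreamResult(w));
        } // closing the writer flushes any buffered characters into out
        return out.toByteArray();
    }

    /** Parses XML bytes into a DOM Document (helper for the demo only). */
    public static Document parse(byte[] xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<msg>gr\u00fc\u00dfe</msg>".getBytes(StandardCharsets.UTF_8));
        byte[] bytes = serialize(doc);
        // Decoding as UTF-8 shows the declaration and the text as written
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Another answer here reports that this writer-based variant did not help on one Java 1.7 setup, so treat it as something to try rather than a guaranteed fix.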
