Writing emoji to XML file in Java

Submitted by 笑着哭i on 2021-01-28 18:49:24

Question


Short question: Given a String str = "😭"; output an XML file containing <tag>😭</tag> instead of <tag>&#128557;</tag>

I am trying to create an XML file in Java that may contain normal text or emoji within a tag. The file is UTF-8 encoded, so that when it is opened in Notepad++ both the normal text and the emoji are visible within the tag. While testing my code, the emoji was instead written as a numeric character reference of the form &#xxxxxx;.

Sample code:

String str = "😭";
Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
Element root = document.createElement("tag");
root.appendChild(document.createTextNode(str));
document.appendChild(root);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(new DOMSource(document), new StreamResult(new File("test.xml")));
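
For context on the &#128557; form mentioned above: the number is the emoji's Unicode code point (U+1F62D) in decimal. The following is a minimal sketch, not part of the original question; the class name EmojiCodePoint and its two helper methods are hypothetical. It shows that the Java String holds the emoji as a surrogate pair and that UTF-8 can encode the character directly in four bytes.

```java
import java.nio.charset.StandardCharsets;

public class EmojiCodePoint {

    // Formats the first code point of s as a decimal numeric character reference.
    public static String toNumericReference(String s) {
        return "&#" + s.codePointAt(0) + ";";
    }

    // Renders the UTF-8 encoding of s as space-separated uppercase hex bytes.
    public static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The emoji from the question, written as its UTF-16 surrogate pair
        // so the source file compiles regardless of its own encoding.
        String str = "\uD83D\uDE2D";

        System.out.println(str.length());                        // 2 (UTF-16 code units)
        System.out.println(str.codePointCount(0, str.length())); // 1 (one character)
        System.out.println(toNumericReference(str));             // &#128557;
        System.out.println(utf8Hex(str));                        // F0 9F 98 AD
    }
}
```

Both the reference and the raw character are valid in a UTF-8 XML document and parse to the same text; the difference only matters when the file is read by a person or by a tool that does not resolve references.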

Answer 1:


By default, emoji are written as numeric character references such as &#128557;. You can prevent this by embedding a processing instruction that disables output escaping before the text, and another that re-enables it afterwards. Here's an example using your code with just two extra lines, each calling the Document method createProcessingInstruction():

package com.unthreading.emojitoxml;

import java.io.File;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.OutputKeys;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;

public class App {

    public static void main(String[] args) throws ParserConfigurationException, TransformerException {

        String str = "😭";
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = document.createElement("tag");
        document.appendChild(document.createProcessingInstruction(StreamResult.PI_DISABLE_OUTPUT_ESCAPING, "")); // <=== ADD THIS LINE
        root.appendChild(document.createTextNode(str));
        document.appendChild(root);
        document.appendChild(document.createProcessingInstruction(StreamResult.PI_ENABLE_OUTPUT_ESCAPING, "")); // <=== ADD THIS LINE
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.transform(new DOMSource(document), new StreamResult(new File("test.xml")));
    }
}

This is the content of test.xml after running that code:

<?xml version="1.0" encoding="UTF-8" standalone="no"?><tag>😭</tag>
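
One way to confirm the result without opening an editor is to read the file back and look for the raw character. This is only a sketch under stated assumptions: the class CheckEmojiOutput and its method isWrittenRaw are hypothetical names, the file is assumed to be test.xml in the working directory unless a path is passed as an argument, and the check only covers the plain decimal and hexadecimal reference forms.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

public class CheckEmojiOutput {

    // Returns true if xml contains the character itself and neither a decimal
    // nor a hexadecimal numeric character reference to its first code point.
    public static boolean isWrittenRaw(String xml, String emoji) {
        int cp = emoji.codePointAt(0);
        String decimalRef = "&#" + cp + ";";
        String hexRef = "&#x" + Integer.toHexString(cp) + ";"; // lowercase hex digits
        return xml.contains(emoji)
                && !xml.contains(decimalRef)
                && !xml.toLowerCase(Locale.ROOT).contains(hexRef);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the file name used in the example above.
        String path = args.length > 0 ? args[0] : "test.xml";
        String xml = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);

        // "\uD83D\uDE2D" is the emoji from the question as a surrogate pair.
        System.out.println(isWrittenRaw(xml, "\uD83D\uDE2D") ? "raw emoji" : "escaped or missing");
    }
}
```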

Notes:

  • It doesn't seem to matter what is in the second String parameter passed to document.createProcessingInstruction(). In my example I just pass an empty string.
  • See the answers to the SO question What is the use of static fields PI_ENABLE_OUTPUT_ESCAPING & PI_DISABLE_OUTPUT_ESCAPING and how can we use them? for more information on the desirability of using this approach.


Source: https://stackoverflow.com/questions/59608657/writing-emoji-to-xml-file-in-java
