Android UTF-8 file parsing

Posted by 孤人 on 2019-12-11 05:03:36

Question


I have some .xml files encoded in UTF-8. Whenever I try to parse them on my tablet (Lenovo IdeaPad, Android 3.1), I get the same error:

org.xml.SAXParseException: Unexpected token (position: TEXT @1:2 in 
java.io.StringReader@40bdaef8).

These are the lines that throw the exception:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource inputSource = new InputSource();
inputSource.setCharacterStream(new StringReader(xmlData));
Document doc = db.parse(inputSource); // This line throws exception

Here is my input:

public String getFromFile(ASerializer aserializer) {
    String filename = aserializer.toLocalResource();
    String data = new String();
    try {
        InputStream stream = _context.getResources().getAssets().open(filename);
        BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
        StringBuilder str = new StringBuilder();
        String line = null;
        while ((line = reader.readLine()) != null) {
            str.append(line);
        }
        stream.close();
        data = str.toString();
    } catch (Exception e) {
    }
    return data;
}

XML File:

<Results>
    <Result title="08/07/2011">
        <Field title="Company one" value="030589674"/>
        <Field title="Company two" value="081357852"/>
        <Field title="Company three" value="093587125"/>
        <Field title="Company four" value="095608977"/>
    </Result>
    <Result title="11/07/2011">
        <Field title="Company one" value="030589674"/>
        <Field title="Company two" value="081357852"/>
    </Result>
</Results>

I don't want to convert the files to ANSI. Is there any way to make db.parse() work?


Answer 1:


At this line:

BufferedReader reader = new BufferedReader(new InputStreamReader(stream));

You're reading from the stream using the platform default encoding, which is almost certainly not what you want. You would need to check the XML for its actual encoding, and doing that correctly is somewhat complicated.

Luckily, every sane XML parser (including the Java/Android one) can do that on its own. To make the XML parser do that, simply pass in the stream itself instead of trying to read it manually.

InputSource inputSource = new InputSource(stream);
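As a minimal sketch of that suggestion, the snippet below hands the raw bytes to the parser and lets it work out the encoding. The class name XmlStreamParsing and the parse helper are hypothetical, and a ByteArrayInputStream stands in for the asset stream from the question so the example runs on a plain JVM. It assumes the file starts with a UTF-8 BOM, which the question does not confirm.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; the name is not from the original post.
final class XmlStreamParsing {

    // Parses XML from a byte stream so the parser can detect the encoding
    // (from a BOM or the XML declaration) instead of relying on a Reader.
    static Document parse(InputStream stream)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        try {
            return db.parse(new InputSource(stream));
        } finally {
            stream.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for _context.getResources().getAssets().open(filename):
        // UTF-8 encoded XML preceded by the UTF-8 BOM bytes EF BB BF.
        byte[] body = "<Results><Result title=\"08/07/2011\"/></Results>"
                .getBytes(Charset.forName("UTF-8"));
        byte[] withBom = new byte[body.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(body, 0, withBom, 3, body.length);

        Document doc = parse(new ByteArrayInputStream(withBom));
        System.out.println(doc.getDocumentElement().getTagName()); // prints "Results"
    }
}
```

In the question's code, this would mean returning the InputStream from the asset manager rather than building a String first.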



Answer 2:


You are quite likely using an XML file that starts with a byte order mark (BOM).

Either use an API that detects the encoding from the BOM:

  • Java : How to determine the correct charset encoding of a stream

Alternatively, preprocess the file so that no BOM is present.
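If editing the files is not an option, a runtime variation on the same idea is to drop the BOM after decoding. The sketch below assumes the bytes were decoded as UTF-8, in which case the BOM appears as a single leading U+FEFF character in the String; that would also be consistent with the parser rejecting the text at position 1:2. The class name BomStripping and the stripBom helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is not from the original post.
final class BomStripping {

    // Removes one leading U+FEFF character, which is what a UTF-8 BOM
    // becomes once the bytes have been decoded as UTF-8.
    static String stripBom(String text) {
        if (text != null && text.length() > 0 && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        // Simulates xmlData as read from a file that begins with a BOM.
        String xmlData = "\uFEFF<Results><Result title=\"08/07/2011\"/></Results>";

        InputSource source = new InputSource(new StringReader(stripBom(xmlData)));
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(source);
        System.out.println(doc.getDocumentElement().getTagName()); // prints "Results"
    }
}
```

This only helps when the Reader's charset matches the file. If the stream was decoded with a different charset, the BOM bytes turn into other characters and this check will not catch them.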




Answer 3:


Your Java string is UTF-16 encoded by default. If you can't use the InputStream as @Joachim Sauer suggested, then try this:

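// Encode explicitly as UTF-8; getBytes() with no argument uses the platform default charset.
Document doc = db.parse(new ByteArrayInputStream(xmlData.getBytes("UTF-8")));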


Source: https://stackoverflow.com/questions/7885962/android-utf-8-file-parsing
