Question
I have some .xml files that are encoded in UTF-8. But whenever I try to parse them on my tablet (Lenovo IdeaPad, Android 3.1), I get the same error:
org.xml.sax.SAXParseException: Unexpected token (position: TEXT @1:2 in
java.io.StringReader@40bdaef8).
These are the lines that throw the exception:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource inputSource = new InputSource();
inputSource.setCharacterStream(new StringReader(xmlData));
Document doc = db.parse(inputSource); // This line throws exception
Here is my input:
public String getFromFile(ASerializer aserializer) {
    String filename = aserializer.toLocalResource();
    String data = new String();
    try {
        InputStream stream = _context.getResources().getAssets().open(filename);
        BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
        StringBuilder str = new StringBuilder();
        String line = null;
        while((line = reader.readLine()) != null) {
            str.append(line);
        }
        stream.close();
        data = str.toString();
    }
    catch(Exception e) {
    }
    return data;
}
XML File:
<Results>
    <Result title="08/07/2011">
        <Field title="Company one" value="030589674"/>
        <Field title="Company two" value="081357852"/>
        <Field title="Company three" value="093587125"/>
        <Field title="Company four" value="095608977"/>
    </Result>
    <Result title="11/07/2011">
        <Field title="Company one" value="030589674"/>
        <Field title="Company two" value="081357852"/>
    </Result>
</Results>
I don't want to convert them to ANSI, so is there any way to make db.parse() work?
Answer 1:
At this line:
BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
You're reading from stream using the platform default encoding. That's almost certainly not what you want. You'd need to check the XML for the actual encoding, and the correct way to do that is somewhat complicated.
Luckily, every sane XML parser (including the Java/Android one) can do that on its own. To make the XML parser do that, simply pass in the stream itself instead of trying to read it manually:
InputSource inputSource = new InputSource(stream);
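As a rough, self-contained sketch of that idea, the example below hands raw bytes to the parser and lets it work out the encoding. The class name StreamParseExample and the helper parseXml are hypothetical, and a ByteArrayInputStream stands in for the asset stream from the question. It assumes a Java 7+ runtime (try-with-resources, StandardCharsets), which Android 3.1 itself does not offer.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class StreamParseExample {

    // Hypothetical helper: passes the byte stream straight to the parser,
    // so no Reader (and no platform default charset) is involved.
    static Document parseXml(InputStream stream) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(new InputSource(stream));
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the asset file: UTF-8 bytes of a small document.
        String xml = "<Results><Result title=\"08/07/2011\"/></Results>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);

        try (InputStream in = new ByteArrayInputStream(bytes)) {
            Document doc = parseXml(in);
            System.out.println(doc.getDocumentElement().getTagName());
        }
    }
}
```

In the question's code, the stream returned by open(filename) would presumably take the place of the ByteArrayInputStream, and getFromFile would no longer need to build a String at all.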
Answer 2:
You are quite likely using an XML file with a BOM (byte order mark).
Either use an API that detects the encoding from the BOM (see "Java: How to determine the correct charset encoding of a stream"), or preprocess the file so that no BOM is present.
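If the data has already been read into a String, one possible form of that preprocessing is to drop a leading U+FEFF character before parsing. The sketch below is only an illustration: the class name BomStripExample and the helper stripBom are hypothetical, and it assumes the file was decoded as UTF-8, so that the BOM shows up as a single U+FEFF character at index 0.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class BomStripExample {

    // Hypothetical helper: removes one leading U+FEFF (the decoded BOM), if present.
    static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        // Simulates file content that was decoded with its BOM still attached.
        String xmlData = "\uFEFF<Results><Result title=\"08/07/2011\"/></Results>";

        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputSource inputSource = new InputSource(new StringReader(stripBom(xmlData)));
        Document doc = db.parse(inputSource);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

This only helps when the text was decoded with the right charset in the first place; if the bytes were decoded with a different default encoding, the BOM appears as other characters and this check will not match.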
Answer 3:
A Java String is UTF-16 encoded by default. If you can't pass the InputStream directly, as @Joachim Sauer suggested, then try this:
Document doc = db.parse(new ByteArrayInputStream(xmlData.getBytes()));
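Note that getBytes() without an argument encodes with the platform default charset. A slightly more defensive variant, sketched below, names the charset explicitly. The class name StringBytesParseExample and the helper parseString are hypothetical, and StandardCharsets assumes Java 7 or later (on older Android versions, getBytes("UTF-8") would be the closest equivalent). The sketch also assumes the XML either has no encoding declaration or declares UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class StringBytesParseExample {

    // Hypothetical helper: re-encodes the String as UTF-8 bytes and lets the
    // parser read them as a byte stream.
    static Document parseString(String xmlData) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        byte[] bytes = xmlData.getBytes(StandardCharsets.UTF_8);
        return db.parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        // Non-ASCII characters are written as escapes to keep the source ASCII-only.
        String xmlData = "<Field title=\"Soci\u00e9t\u00e9\" value=\"030589674\"/>";
        Document doc = parseString(xmlData);
        System.out.println(doc.getDocumentElement().getAttribute("value"));
    }
}
```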
Source: https://stackoverflow.com/questions/7885962/android-utf-8-file-parsing