Does the SAXParser detect XML encoding?

Submitted by ﹥>﹥吖頭↗ on 2019-12-25 00:34:16

Question


I have an HTML file that starts with these lines:

<?xml version="1.0" encoding="windows-1252"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xml:lang="fi" lang="fi" xmlns="http://www.w3.org/1999/xhtml">
<head>

When I parse the HTML file with a SAXParser using the following code, an exception is thrown saying that the character at a specific line and column is invalid:

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
InputSource is = new InputSource(new FileInputStream(file));  
parser.parse(is, this);

If I specify the encoding with is.setEncoding("ISO-8859-1");, the exception does not occur.
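Another way to sidestep the parser's own byte decoding is to decode the bytes yourself and hand the parser a character stream. The sketch below is a hedged illustration, not the original poster's code: the class and method names (DecodeFirstExample, parseAndCollectText, sampleWindows1252Document) are hypothetical, and it assumes the parser honours an InputSource character stream as the SAX InputSource documentation describes, which is worth verifying on the specific parser in use. It also shows why the exact encoding name matters: ISO-8859-1 and windows-1252 decode bytes 0x80 to 0x9F differently (0x80 is the euro sign in windows-1252 but a control character in ISO-8859-1).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DecodeFirstExample {

    /** Decodes the bytes with a Java charset, then parses the resulting characters. */
    public static String parseAndCollectText(byte[] xml, String encoding) throws Exception {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(xml), Charset.forName(encoding))) {
            // Per the InputSource docs, a character stream is read directly and the
            // encoding declaration inside it is disregarded.
            InputSource source = new InputSource(reader);
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(source, new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length); // collect all text content
                }
            });
        }
        return text.toString();
    }

    /** Hypothetical test document: a windows-1252 declaration and a single 0x80 byte. */
    public static byte[] sampleWindows1252Document() {
        byte[] head = "<?xml version=\"1.0\" encoding=\"windows-1252\"?><p>"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] tail = "</p>".getBytes(StandardCharsets.US_ASCII);
        byte[] doc = new byte[head.length + 1 + tail.length];
        System.arraycopy(head, 0, doc, 0, head.length);
        doc[head.length] = (byte) 0x80; // euro sign in windows-1252
        System.arraycopy(tail, 0, doc, head.length + 1, tail.length);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String text = parseAndCollectText(sampleWindows1252Document(), "windows-1252");
        System.out.println("U+" + Integer.toHexString(text.charAt(0)).toUpperCase()); // prints U+20AC
    }
}
```

Decoding the same bytes as "ISO-8859-1" instead yields U+0080, which parses without error but is not the character the file's declaration intends.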

Why do I have to tell the SAXParser explicitly which encoding to use? Can't it detect the encoding from the byte stream, or from the XML declaration at the beginning of the HTML file?

Also, the InputSource documentation says:

"If there is no character stream, but there is a byte stream, the parser will use that byte stream, using the encoding specified in the InputSource or else (if no encoding is specified) autodetecting the character encoding using an algorithm such as the one in the XML specification"

But this is not true! Looking further into the Java source, I see this:

/*
 * TODO: Let Expat try to guess the encoding instead of defaulting.
 * Unfortunately, I don't know how to tell which encoding Expat picked,
 * so I won't know how to encode "<externalEntity>" below. The solution
 * I think is to fix Expat to not require the "<externalEntity>"
 * workaround.
 */
this.encoding = encoding == null ? DEFAULT_ENCODING : encoding;
this.pointer = initialize(
    this.encoding,
    processNamespaces
);

Is there no algorithm for detecting XML encoding?
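The XML 1.0 specification describes such an algorithm in its non-normative Appendix F ("Autodetection of Character Encodings"): check for a byte order mark, otherwise look at how the first four bytes encode "<?", and for ASCII-compatible input read the encoding declaration itself. The following is a simplified sketch of that idea, not any parser's actual implementation; the class XmlEncodingSniffer is hypothetical, and the UTF-32 and EBCDIC cases from the appendix are deliberately omitted.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlEncodingSniffer {

    // Matches the encoding pseudo-attribute of a leading XML declaration, e.g.
    // <?xml version="1.0" encoding="windows-1252"?>
    private static final Pattern DECL_ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Guesses the encoding from the first bytes, loosely following XML 1.0 Appendix F. */
    public static String detect(byte[] head) {
        int b0 = byteAt(head, 0), b1 = byteAt(head, 1), b2 = byteAt(head, 2), b3 = byteAt(head, 3);

        // 1. A byte order mark takes precedence.
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";

        // 2. No BOM: look at how "<?" is laid out in the first four bytes.
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";

        // 3. ASCII-compatible "<?xm": read the declaration's encoding attribute.
        if (b0 == 0x3C && b1 == 0x3F && b2 == 0x78 && b3 == 0x6D) {
            // ISO-8859-1 maps each byte to one char, so the ASCII declaration survives intact.
            Matcher m = DECL_ENCODING.matcher(new String(head, StandardCharsets.ISO_8859_1));
            if (m.find()) return m.group(1);
        }

        // 4. No BOM and no encoding declaration: XML defaults to UTF-8.
        return "UTF-8";
    }

    private static int byteAt(byte[] bytes, int i) {
        return i < bytes.length ? bytes[i] & 0xFF : -1; // -1 when the input is too short
    }

    public static void main(String[] args) {
        byte[] head = "<?xml version=\"1.0\" encoding=\"windows-1252\"?>\n<html/>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(head)); // prints windows-1252
    }
}
```

Passing the detected name to is.setEncoding(...) would mirror the manual fix described in the question, although a given parser can still reject an encoding name it does not support.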

Source: https://stackoverflow.com/questions/54711709/does-the-saxparser-detect-xml-encoding
