SAX Parser doesn't recognize windows-1255 encoding

孤街醉人 submitted on 2019-12-24 10:55:54

Question


I'm working on an RSS parser on Android (upgrading a parser I found on the internet). As far as I know, the SAX parser detects the encoding automatically from the XML declaration, but when I try to parse a feed that declares windows-1255 encoding, it fails to parse it and throws an exception. I tried a few things:

  1. Wrapping the stream in a Reader:

    final InputSource source = new InputSource(feed);
    Reader isr = new InputStreamReader(feed);
    source.setCharacterStream(isr);
    
  2. I even tried telling it the specific encoding:

    source.setEncoding("Windows-1255");
    
  3. Tried to look at the locator:

    @Override
    public void setDocumentLocator(Locator locator) {
    }
    

And it recognizes the encoding as UTF-16.

Please help me solve this annoying problem! Sorry for the mess with the code snippets; the code button refuses to work for some reason.


Answer 1:


Chances are the platform itself doesn't know about the "windows-1255" encoding. After all, it's a Windows-based encoding - I wouldn't want to rely on it being available on any other platforms, particularly mobile ones where things are generally cut down to the "must-have" options.
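
One way to test this guess is to ask the runtime directly whether it can decode the charset. The sketch below uses the standard `java.nio.charset.Charset.isSupported` call; the class name `CharsetCheck` and the helper `isAvailable` are made up for illustration, and the result for "windows-1255" will depend on the platform it runs on.

```java
import java.nio.charset.Charset;

class CharsetCheck {
    /** Returns true if this runtime can decode the named charset (hypothetical helper). */
    static boolean isAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown when the name is null or not a syntactically legal charset name.
            return false;
        }
    }

    public static void main(String[] args) {
        // The first result varies by platform; UTF-8 is required on every Java runtime.
        System.out.println("windows-1255 supported: " + isAvailable("windows-1255"));
        System.out.println("UTF-8 supported: " + isAvailable("UTF-8"));
    }
}
```

If the first line prints `false`, no amount of parser configuration will help, and the feed would have to be transcoded by other means.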




Answer 2:


You need to set the encoding on the InputStreamReader.

Reader isr = new InputStreamReader(feed, "windows-1255");
final InputSource source = new InputSource(isr);

From the Javadoc, the logic for reading from an InputSource goes something like this:

  • Is there a character stream? If so, use it. (This is what happens when you pass a Reader such as InputStreamReader.)

Otherwise:

  • No character stream? Use the byte stream (InputStream).
  • Is an encoding set on the InputSource? Use that.
  • No encoding set? Try to detect the encoding from the XML file itself.
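
As a rough, self-contained illustration of the first rule, the sketch below decodes the bytes with an explicitly named charset before the parser sees them. The class name `ExplicitEncodingParse` and the method `parseText` are hypothetical, and the sample document uses ISO-8859-1 only because every Java runtime is guaranteed to support it; for the feed in the question you would pass "windows-1255" instead, assuming the platform provides that charset.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ExplicitEncodingParse {
    /** Parses the stream using the given charset and returns all character data (hypothetical helper). */
    static String parseText(InputStream in, String charsetName) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Collect every chunk of text content the parser reports.
                text.append(ch, start, length);
            }
        };
        // Decoding happens in the Reader, so the parser receives characters, not bytes.
        Reader reader = new InputStreamReader(in, charsetName);
        InputSource source = new InputSource(reader);
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample document; \u00e9 is a non-ASCII character that needs the right charset to decode.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><title>caf\u00e9</title>";
        InputStream in = new ByteArrayInputStream(xml.getBytes("ISO-8859-1"));
        System.out.println(parseText(in, "ISO-8859-1"));
    }
}
```

Because the InputSource is built from a Reader, the choice of charset rests entirely with the caller, so a wrong name here produces garbled text rather than a parser error.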


Source: https://stackoverflow.com/questions/9931024/sax-parser-doesnt-recognize-windows-1255-encoding
