Set response encoding with HttpClient 3.1

Posted by 穿精又带淫゛_ on 2019-12-19 09:47:06

Question


I'm using org.apache.commons.httpclient.HttpClient and need to set the response encoding myself, because the server returns an incorrect charset in its Content-Type header. My current approach is to read the response as raw bytes and convert them to a String with the desired encoding. Is there a better way to do this (e.g. by configuring HttpClient)? Thanks for any suggestions.
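For reference, the byte-based approach described in the question can be sketched with the standard library alone. The class name `ForcedCharsetDecoder` and the sample text are hypothetical; in HttpClient 3.x the byte array would presumably come from `HttpMethod.getResponseBody()`, which is replaced here by a hard-coded array so the sketch runs on its own.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ForcedCharsetDecoder {

    // Decode the raw response bytes with the charset the caller trusts,
    // ignoring whatever the Content-Type header claimed.
    public static String decode(byte[] rawBody, Charset trusted) {
        if (rawBody == null) {
            // Assumption: treat a missing body as an empty string.
            return "";
        }
        return new String(rawBody, trusted);
    }

    public static void main(String[] args) {
        // Stand-in for method.getResponseBody(): sample text encoded as UTF-8.
        byte[] rawBody = "\u017c\u00f3\u0142w".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(rawBody, StandardCharsets.UTF_8));
    }
}
```

This only works if the caller really knows which charset the server used; the header is simply not consulted.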


Answer 1:


I don't think there's a better answer using HttpClient 3.x APIs.

The HTTP 1.1 spec says clearly that a client "must" respect the character set specified in the response header, and use ISO-8859-1 if no character set is specified. The HttpClient APIs are designed on the assumption that the programmer wants to conform to the HTTP specs. Obviously, you need to break the rules in the spec so that you can talk to the non-compliant server. Nevertheless, this is not a use case that the API designers saw a need to support explicitly.
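The rule described above (use the declared charset, otherwise fall back to ISO-8859-1) can be illustrated with a simplified sketch. This is not HttpClient's own parsing code; the class `ContentTypeCharset` and its `resolve` method are hypothetical names, and real Content-Type parsing handles more edge cases than this does.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    // Return the charset named in a Content-Type value, or ISO-8859-1 when the
    // header is missing, has no charset parameter, or names an unknown charset.
    public static Charset resolve(String contentType) {
        if (contentType == null) {
            return StandardCharsets.ISO_8859_1;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the default.
                    return StandardCharsets.ISO_8859_1;
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }
}
```

A server that sends the wrong charset parameter defeats this logic, which is why the question ends up bypassing it.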

If you were using HttpClient 4.x, you could write your own ResponseHandler that reads the body from the response's HttpEntity and decodes it with the charset you choose, ignoring the response message's notional character set.




Answer 2:


A few notes:

  1. The server serves the data, so it is up to the server to serve it in an appropriate format. The response encoding is therefore set by the server, not the client. However, the client can suggest to the server what format it would like via the Accept and Accept-Charset headers:

    Accept: text/plain
    Accept-Charset: utf-8
    

    However, HTTP servers usually do not convert between formats.

  2. If option 1 does not work, you should look at the configuration of the server.

  3. When a String is sent as raw bytes (and it always is, because bytes are what networks transmit), an encoding is always involved. Since the server produces these raw bytes, the server defines the encoding. So you cannot take the raw bytes and create a String with an encoding of your choice. You must use the encoding that was used when the String was converted to bytes.
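Point 3 can be shown with a small standard-library sketch: the same bytes give different text depending on the charset used to decode them. The class and method names are hypothetical and only serve the illustration.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Encode text as UTF-8 (what the "server" does), then decode the same
    // bytes as ISO-8859-1 (what a client does if it picks the wrong charset).
    public static String decodeUtf8BytesAsLatin1(String text) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Encode and decode with the same charset: the text survives unchanged.
    public static String roundTripUtf8(String text) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // e with acute accent
        // U+00E9 is two bytes in UTF-8 (0xC3 0xA9); read as ISO-8859-1 they
        // become the two characters U+00C3 and U+00A9.
        System.out.println(decodeUtf8BytesAsLatin1(original).length()); // prints 2
        System.out.println(roundTripUtf8(original).equals(original));   // prints true
    }
}
```

In the question's scenario the header is what is wrong, not the bytes, so decoding with the charset the server actually used is consistent with this point.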




Answer 3:


Disclaimer: I don't really know HttpClient; I have only read the API.

I would use the execute method that returns an HttpResponse, then call .getEntity().getContent(). This is a pure byte stream, so if you want to ignore the encoding declared by the server, you can simply wrap your own InputStreamReader around it.


Okay, it looks like I had the wrong version (obviously, there are too many HttpClient classes out there).

The same idea applies, just on other classes: HttpMethod has a getResponseBodyAsStream() method, around which you can wrap your own InputStreamReader. (Or get the whole byte array at once, if it is not too big, and convert it to a String, as you wrote.)
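A minimal sketch of the stream variant, using only the standard library. The class `StreamDecoder` and method `readAll` are hypothetical names, and a `ByteArrayInputStream` stands in for the stream that `getResponseBodyAsStream()` would return.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamDecoder {

    // Read the whole stream as text using the charset chosen by the caller,
    // not the one announced in the response headers. Closes the stream.
    public static String readAll(InputStream body, Charset charset) throws IOException {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(body, charset)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for method.getResponseBodyAsStream(): UTF-8 encoded sample text.
        byte[] wire = "gr\u00fc\u00df dich".getBytes(StandardCharsets.UTF_8);
        InputStream body = new ByteArrayInputStream(wire);
        System.out.println(readAll(body, StandardCharsets.UTF_8));
    }
}
```

Compared with converting a byte array, the reader decodes incrementally, so the raw bytes of a large body never need to be held in a single array.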

I think trying to change the response and then letting HttpClient interpret it is not the right way here.


I suggest sending a message to the server administrator/webmaster about the wrong charset, though.




Answer 4:


Greetings folks,

Just in case someone finds this post while googling how to set HttpClient to write in UTF-8.

This line of code should be handy...

response.setContentType("text/html; charset=UTF-8");

Best



Source: https://stackoverflow.com/questions/5142794/set-response-encoding-with-httpclient-3-1
