Failed to decode response content using IdHttp

淺唱寂寞╮ submitted on 2020-02-16 05:48:50

Question


I use TIdHTTP to fetch web content. The response header declares the charset as UTF-8. I want to print the content to the console as CP936 (Simplified Chinese), but the decoded text is not readable.

Result := TEncoding.Utf8.GetString(ResponseBuffer);

I do the same thing in Python (using httplib2) without any problems.

import httplib2

def python_try():
    http = httplib2.Http()
    # httplib2 transparently decompresses gzip/deflate response bodies
    response, content = http.request(...)
    print(content.decode('utf8'))  # readable in console

UPDATE 1

I debugged the raw response and noticed that the content is gzipped.

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=UTF-8
Transfer-Encoding: chunked
Content-Encoding: gzip
Vary: Accept-Encoding
Date: Mon, 24 Dec 2012 15:27:44 GMT
Connection: Keep-Alive

I tried assigning a TIdCompressorZLib instance to the TIdHTTP instance. Unfortunately, the application crashes while decompressing the gzipped content. The test address is "http://www.baidu.com" (encoding=gb2312).
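For comparison, the two steps that the raw response above implies (undo the gzip Content-Encoding, then decode the bytes with the declared charset) can be sketched in Python using only the standard library. This is a minimal sketch, not what Indy does internally: the function name decode_body and the sample text are made up for illustration, and the body is assumed to have already been de-chunked by the HTTP client.

```python
import gzip

def decode_body(raw: bytes, content_encoding: str = "", charset: str = "utf-8") -> str:
    """Undo 'Content-Encoding: gzip' if present, then decode with the declared charset.

    decode_body is a hypothetical helper; 'raw' is assumed to be the complete
    body after the client has removed any chunked transfer framing.
    """
    if content_encoding.strip().lower() == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode(charset)

# Simulate a gzipped UTF-8 response body (sample text is illustrative only).
sample = "简体中文"
compressed = gzip.compress(sample.encode("utf-8"))
text = decode_body(compressed, "gzip", "utf-8")
```

Decoding the still-compressed bytes as UTF-8 is what produces unreadable output, so the order of the two steps matters.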


UPDATE 2

I also tried downloading a gzipped jQuery script file, which contains only ASCII characters. This time it works, which suggests the problem is in the Indy library. If I am right about that, I should close the question.


Answer 1:


TIdHTTP handles the gzip decompression for you, if you have a TIdCompressorZLib component assigned to the TIdHTTP.Compressor property. Otherwise, you will have to decompress it manually (TIdHTTP will not send an Accept-Encoding header by default if the Compressor property is not assigned).

As for the UTF-8 encoding, TIdHTTP handles that for you as well, if you call the overloaded version of the TIdHTTP.Get() or TIdHTTP.Post() method that returns a String value instead of filling a TStream object. It will decode the UTF-8 to UTF-16 for you. To convert that to CP936, you can let the RTL do the conversion for you:

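type
  // AnsiString bound to code page 936 (Simplified Chinese)
  Cp936String = type AnsiString(936);
var
  S: Cp936String;
begin
  // Get() returns a UTF-16 string; the cast lets the RTL convert it to CP936
  S := Cp936String(IdHTTP1.Get(...));
end;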
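The same conversion can be illustrated in the Python the question uses for comparison: decode the UTF-8 bytes to a Unicode string, then encode that string as CP936. This is only a sketch of the idea, not a translation of the Delphi RTL behavior; the helper name utf8_to_cp936 is hypothetical, and replacing unmappable characters with "?" is an assumption about what is acceptable for console output.

```python
def utf8_to_cp936(utf8_bytes: bytes) -> bytes:
    """Transcode UTF-8 bytes to CP936 (hypothetical helper for illustration)."""
    # Step 1: bytes -> Unicode text, using the charset the server declared.
    text = utf8_bytes.decode("utf-8")
    # Step 2: Unicode text -> CP936 bytes. Characters that CP936 cannot
    # represent are replaced with "?" instead of raising UnicodeEncodeError.
    return text.encode("cp936", errors="replace")

cp936_bytes = utf8_to_cp936("中文".encode("utf-8"))
```

In Python 3 this explicit step is usually unnecessary for printing, since print() encodes text with the console's own encoding; the sketch only makes the two-stage conversion visible.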



Answer 2:


Do not rely on automatic encoding detection; it cannot be done reliably. Simply trust the Content-Type header.

Result := TEncoding.Utf8.GetString(ResponseBuffer);

If the Content-Type header is missing or lying, then you need to detect the encoding. Although I would not use any algorithm that would misdetect UTF-8 as CP936...
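As a rough illustration of trusting the header, the charset can be read from the Content-Type value with Python's standard library. This is a sketch under assumptions: the helper name charset_from_content_type is hypothetical, and falling back to a fixed default when the header carries no charset is a simplification, not a detection strategy.

```python
from email.message import Message

def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset parameter of a Content-Type header value.

    Hypothetical helper: 'default' is an assumed fallback used only when
    the header is absent or has no charset parameter.
    """
    if not header_value:
        return default
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset in lower case, or None.
    return msg.get_content_charset() or default

declared = charset_from_content_type("text/html;charset=UTF-8")
```

The returned name can then be passed directly to bytes.decode(), which accepts "utf-8", "gb2312", and "cp936" among others.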



Source: https://stackoverflow.com/questions/14017186/failed-to-decode-response-content-using-idhttp
