Node.js unicode issue with HTTP response body

前端 未结 3 950
轻奢々
轻奢々 2021-01-05 11:21

The response body of HTTP requests using the native \'http\' module, displays question mark characters for unicode chars, instead of their actual value. Here\'s the basic sn

3条回答
  •  半阙折子戏
    2021-01-05 11:34

    Reason maybe that, if we do not specify a "googleKnownAsUTF8OK" user-agent on request header, google would response a html doc with content-type of ISO-8859-1(for old browsers,bots?i dont know), so decode the response buffer by "binary" is correct.

    But, if we decode a buffer encoded in ISO-8859-1 by utf8, then the byte 0xe0(à) implies "form a character by 3bytes in a row", it is a malformed character in our case, so a few unexpected characters(depending on the environment) was displayed.

    We may try "Mozilla/5.0" as value of user-agent. Good luck.

提交回复
热议问题