Node.js unicode issue with HTTP response body

轻奢々 2021-01-05 11:21

The response body of HTTP requests made with the native 'http' module displays question marks in place of Unicode characters instead of their actual values. Here's the basic sn

3 answers
  • 2021-01-05 11:32

    I set response.setEncoding('binary'); and it works. No idea why though.

    Reference: http://groups.google.com/group/nodejs/browse_thread/thread/3bd3935b1f42a5f4?pli=1

    In my case I got some wrong characters because an old web page used the windows-1252 charset.

    I just used encode: 'binary' in the request options and it worked!
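
    A likely explanation, offered as a sketch rather than a confirmed diagnosis: in Node.js, 'binary' is an alias for the 'latin1' encoding, which maps every byte to the code point with the same value. That matches ISO-8859-1, and windows-1252 for most characters. Instead of calling setEncoding, the example below collects the raw Buffer chunks and decodes them once at the end. The helper names decodeBody and fetchText are hypothetical.

    ```javascript
    const http = require('http');

    // Join the raw chunks first, then decode once with the chosen encoding.
    // 'latin1' (alias: 'binary') maps each byte to the code point of the same value.
    function decodeBody(chunks, encoding) {
      return Buffer.concat(chunks).toString(encoding);
    }

    // Hypothetical helper: fetches a URL and decodes the whole body as `encoding`.
    function fetchText(url, encoding, callback) {
      http.get(url, (res) => {
        const chunks = [];
        // Chunks arrive as Buffers because setEncoding() is never called.
        res.on('data', (chunk) => chunks.push(chunk));
        res.on('end', () => callback(null, decodeBody(chunks, encoding)));
      }).on('error', callback);
    }

    // "voilà" as ISO-8859-1 bytes, split across two chunks; 0xE0 is "à".
    const body = [Buffer.from([0x76, 0x6f, 0x69]), Buffer.from([0x6c, 0xe0])];
    console.log(decodeBody(body, 'latin1')); // voilà
    ```

    Decoding only after all chunks are joined also avoids splitting a multi-byte character across two chunks when the body really is UTF-8.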

  • 2021-01-05 11:34

    The reason may be that, if we do not send a "googleKnownAsUTF8OK" user-agent in the request header, Google responds with an HTML document whose content-type is ISO-8859-1 (for old browsers or bots? I don't know), so decoding the response buffer as "binary" is correct.

    But if we decode an ISO-8859-1 buffer as utf8, the byte 0xe0 (à) signals "the start of a 3-byte character". In our case that sequence is malformed, so a few unexpected characters (depending on the environment) were displayed.

    We could try "Mozilla/5.0" as the user-agent value. Good luck.
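
    The effect described above can be reproduced without any network access. This is a minimal sketch: the sample bytes are made up for illustration, and the host and path in the options object are placeholders, not values from the question.

    ```javascript
    // Bytes of "à la" as an ISO-8859-1 server would send them.
    const bytes = Buffer.from([0xe0, 0x20, 0x6c, 0x61]);

    // Correct: each byte maps to the character with the same code point.
    const asLatin1 = bytes.toString('latin1');

    // Wrong: 0xE0 announces a 3-byte UTF-8 sequence, but 0x20 is not a
    // continuation byte, so the decoder substitutes U+FFFD for the bad byte.
    const asUtf8 = bytes.toString('utf8');

    console.log(JSON.stringify(asLatin1));
    console.log(asUtf8.includes('\ufffd'));

    // Request options carrying the User-Agent suggested above
    // (host and path are placeholders).
    const options = {
      host: 'www.example.com',
      path: '/',
      headers: { 'User-Agent': 'Mozilla/5.0' },
    };
    ```

    Passing an options object like this to http.get is one way to test whether the server changes its declared charset based on the user-agent.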

  • 2021-01-05 11:38

    I set response.setEncoding('binary'); and it works. No idea why though.

    Reference: http://groups.google.com/group/nodejs/browse_thread/thread/3bd3935b1f42a5f4?pli=1
