Node.js unicode issue with HTTP response body

前端未结

关注

 3  950

轻奢々 2021-01-05 11:21

The response body of HTTP requests using the native \'http\' module, displays question mark characters for unicode chars, instead of their actual value. Here\'s the basic sn

3条回答

半阙折子戏 (楼主)

2021-01-05 11:34

Reason maybe that, if we do not specify a "googleKnownAsUTF8OK" user-agent on request header, google would response a html doc with content-type of ISO-8859-1(for old browsers,bots?i dont know), so decode the response buffer by "binary" is correct.

But, if we decode a buffer encoded in ISO-8859-1 by utf8, then the byte 0xe0(à) implies "form a character by 3bytes in a row", it is a malformed character in our case, so a few unexpected characters(depending on the environment) was displayed.

We may try "Mozilla/5.0" as value of user-agent. Good luck.

0 讨论(0)

查看其它3个回答
发布评论:

提交评论
- 加载中...