Node.js unicode issue with HTTP response body


The response body of HTTP requests made with the native 'http' module displays question mark characters for Unicode characters instead of their actual values. Here's the basic sn


3 Answers

抹茶落季

2021-01-05 11:32

I set response.setEncoding('binary'); and it works. No idea why though.

Reference: http://groups.google.com/group/nodejs/browse_thread/thread/3bd3935b1f42a5f4?pli=1

In my case, I got some wrong characters because an old web page used the windows-1252 charset.

I just used encode: 'binary' in the request options and it worked!
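A minimal sketch of why 'binary' can help, assuming the body really is in a single-byte encoding such as ISO-8859-1. In Node.js, 'binary' is an alias for 'latin1', which maps each byte to the code point with the same value. The helper name decodeBody and the sample bytes are hypothetical, chosen only for illustration. Note that windows-1252 differs from ISO-8859-1 in the 0x80-0x9F range, so this is not an exact windows-1252 decoder.

```javascript
// Hypothetical helper: join raw chunks and decode them with one encoding.
function decodeBody(chunks, encoding = 'latin1') {
  return Buffer.concat(chunks).toString(encoding);
}

// Bytes for "café" in ISO-8859-1, split across two chunks as a stream might deliver them.
const chunks = [Buffer.from([0x63, 0x61, 0x66]), Buffer.from([0xe9])];

const asLatin1 = decodeBody(chunks, 'latin1');
const asBinary = decodeBody(chunks, 'binary'); // 'binary' is an alias for 'latin1'
const asUtf8 = decodeBody(chunks, 'utf8');     // a lone 0xE9 is not valid UTF-8

console.log(asLatin1, asBinary, asUtf8);
```

Collecting Buffers and decoding once at the end also avoids splitting a multi-byte character across chunk boundaries when the body turns out to be UTF-8 after all.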

半阙折子戏

2021-01-05 11:34

The reason may be that, if we do not specify a "googleKnownAsUTF8OK" user-agent in the request header, Google responds with an HTML document whose content type is ISO-8859-1 (for old browsers or bots? I don't know), so decoding the response buffer as "binary" is correct.

But if we decode a buffer encoded in ISO-8859-1 as UTF-8, the byte 0xE0 (à) is read as "the start of a 3-byte character". In our case the bytes that follow do not complete that sequence, so the character is malformed, and a few unexpected characters (depending on the environment) are displayed.
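The point about 0xE0 can be checked with a small sketch. The function utf8SequenceLength is a hypothetical helper that looks only at the bit pattern of a lead byte, and the sample bytes ("à la" in ISO-8859-1) are an assumption for illustration.

```javascript
// Sequence length implied by the bit pattern of a UTF-8 lead byte
// (0 means the byte cannot start a sequence).
function utf8SequenceLength(byte) {
  if ((byte & 0x80) === 0x00) return 1; // 0xxxxxxx
  if ((byte & 0xe0) === 0xc0) return 2; // 110xxxxx
  if ((byte & 0xf0) === 0xe0) return 3; // 1110xxxx
  if ((byte & 0xf8) === 0xf0) return 4; // 11110xxx
  return 0;                             // continuation byte or invalid
}

const bytes = Buffer.from([0xe0, 0x20, 0x6c, 0x61]); // "à la" in ISO-8859-1

const leadLength = utf8SequenceLength(bytes[0]); // 0xE0 announces a 3-byte sequence
const decodedLatin1 = bytes.toString('latin1');  // each byte becomes one character
const decodedUtf8 = bytes.toString('utf8');      // 0x20 is not a continuation byte

console.log(leadLength, decodedLatin1, JSON.stringify(decodedUtf8));
```

Because 0x20 does not have the 10xxxxxx form of a continuation byte, the UTF-8 decoder substitutes a replacement character for the broken sequence, which many terminals render as a question mark.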

We may try "Mozilla/5.0" as the value of user-agent. Good luck.
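A hedged sketch of that suggestion with the native 'http' module. The helper names buildRequestOptions and fetchBody are hypothetical, and whether a given server actually switches to UTF-8 for this user-agent is this answer's guess, not something the code guarantees.

```javascript
const http = require('http');

// Hypothetical helper: request options carrying a browser-like User-Agent header.
function buildRequestOptions(hostname, path = '/') {
  return {
    hostname,
    path,
    method: 'GET',
    headers: { 'User-Agent': 'Mozilla/5.0' },
  };
}

// Hypothetical helper: collect the raw body and report the declared content type,
// so the caller can pick the decoding from the charset the server announces.
function fetchBody(options, callback) {
  const req = http.request(options, (res) => {
    const parts = [];
    res.on('data', (chunk) => parts.push(chunk));
    res.on('end', () => {
      callback(null, res.headers['content-type'], Buffer.concat(parts));
    });
  });
  req.on('error', (err) => callback(err));
  req.end();
}

const requestOptions = buildRequestOptions('www.google.com');
// Usage (performs a real network request, so it is left commented out):
// fetchBody(requestOptions, (err, contentType, body) => {
//   if (err) throw err;
//   console.log(contentType, body.length);
// });
```

Inspecting the content-type header before decoding shows which charset the server chose, instead of assuming UTF-8.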

闹比i

2021-01-05 11:38

I set response.setEncoding('binary'); and it works. No idea why though.

Reference: http://groups.google.com/group/nodejs/browse_thread/thread/3bd3935b1f42a5f4?pli=1
