The response body of HTTP requests made with the native 'http' module displays question mark characters for Unicode characters instead of their actual values. Here's the basic sn
I set response.setEncoding('binary'); and it works. No idea why though.
Reference: http://groups.google.com/group/nodejs/browse_thread/thread/3bd3935b1f42a5f4?pli=1
In my case, I got some wrong characters because an old webpage used the windows-1252 charset.
I just used encode: 'binary' in the request options and it worked!
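A caveat worth checking, sketched below with made-up bytes: in Node, 'binary' is an alias for 'latin1' (ISO-8859-1), which matches windows-1252 for most characters, including accented letters, but not for the 0x80-0x9F range, where windows-1252 puts curly quotes, the euro sign and so on. The sample buffer is hypothetical, and the TextDecoder line assumes a Node build with full ICU support.

```javascript
// Hypothetical bytes from an old page: curly quotes around "café" in windows-1252.
const cp1252Body = Buffer.from([0x93, 0x63, 0x61, 0x66, 0xe9, 0x94]);

// 'binary' is an alias for 'latin1': each byte becomes the code point of the same value.
const viaBinary = cp1252Body.toString('binary');

// TextDecoder applies the actual windows-1252 table
// (assumption: this Node build ships full ICU, otherwise this line throws).
const viaCp1252 = new TextDecoder('windows-1252').decode(cp1252Body);

// JSON.stringify makes invisible control characters easier to spot.
console.log(JSON.stringify(viaBinary));
console.log(JSON.stringify(viaCp1252));
```

If the page only contains letters such as é or à, both decodings agree, which is probably why 'binary' was enough here.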
The reason may be that, if we do not specify a "googleKnownAsUTF8OK" user-agent in the request header, Google responds with an HTML document whose content-type is ISO-8859-1 (for old browsers or bots? I don't know), so decoding the response buffer as "binary" is correct.
But if we decode a buffer encoded in ISO-8859-1 as utf8, the byte 0xe0 (à) means "form a character from 3 bytes in a row". That is a malformed character in our case, so a few unexpected characters (depending on the environment) were displayed.
We may try "Mozilla/5.0" as the user-agent value. Good luck.
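To see the effect without hitting the network, here is a minimal sketch that decodes the same bytes both ways. The sample buffer is made up for illustration (it stands in for a response chunk from an ISO-8859-1 page), and the behaviour shown is what I would expect from Node's Buffer, not something taken from the original thread.

```javascript
// "voilà" as an ISO-8859-1 server would send it: à is the single byte 0xE0.
const latin1Body = Buffer.from([0x76, 0x6f, 0x69, 0x6c, 0xe0]);

// Decoding as UTF-8: 0xE0 announces a 3-byte sequence that never arrives,
// so the decoder substitutes the replacement character U+FFFD.
const asUtf8 = latin1Body.toString('utf8');

// Decoding as 'binary' (alias for 'latin1'): every byte maps to one character.
const asBinary = latin1Body.toString('binary');

console.log(asUtf8);
console.log(asBinary);
```

Calling response.setEncoding('binary') makes the 'data' events deliver strings decoded the second way, which would explain why it fixes the output for ISO-8859-1 responses.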