I'm using the Request module ("Simplified HTTP request method") to scrape a webpage with accented characters such as á é ó ú ê ã.
I've already tried e
Since the `binary` encoding is deprecated, it seems like a better idea to use iconv-lite and handle the decoding correctly:
```javascript
var request = require("request"), iconv = require("iconv-lite");
var requestOptions = { encoding: null, method: "GET", uri: "http://something.com" };
request(requestOptions, function (error, response, body) {
  if (error) return console.error(error);
  // With encoding: null, body is already a raw Buffer, so decode it directly from ISO-8859-1
  var utf8String = iconv.decode(body, "ISO-8859-1");
  console.log(utf8String);
});
```
The important part is to set `encoding: null` on the HTTP request, so that `body` arrives as a raw Buffer instead of a string that has already been decoded as UTF-8.
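As a rough illustration of why the raw Buffer matters, here is a small sketch that uses only Node's built-in `Buffer` instead of request or iconv-lite. It assumes Node 6.4 or later, where `"latin1"` is the built-in name for ISO-8859-1, and the byte values are a hypothetical server response for the word "ação".

```javascript
// Hypothetical response body: "ação" encoded as ISO-8859-1 (one byte per character)
const latin1Bytes = Buffer.from([0x61, 0xe7, 0xe3, 0x6f]);

// Decoding these bytes as UTF-8 (what request does by default when encoding is not null)
// fails: 0xE7 and 0xE3 do not form valid UTF-8 sequences here, so they become
// U+FFFD replacement characters and the original bytes are lost
const wrong = latin1Bytes.toString("utf8");

// Decoding with the charset the page actually uses recovers the accents
const right = latin1Bytes.toString("latin1");

console.log(wrong);
console.log(right); // ação
```

Keeping `encoding: null` means the body still looks like `latin1Bytes`, so a later decode step (iconv-lite, or `toString("latin1")` for plain ISO-8859-1) has the original bytes to work with.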
Specify the encoding as `utf8`, not `utf-8`. Here is a list of possible encodings for a Buffer from the Node.js documentation:
- `ascii`: for 7-bit ASCII data only. This encoding method is very fast, and will strip the high bit if set.
- `utf8`: Unicode characters. Many web pages and other document formats use UTF-8.
- `base64`: Base64 string encoding.
- `binary`: a way of encoding raw binary data into strings by using only the first 8 bits of each character. This encoding method is deprecated and should be avoided in favor of Buffer objects where possible. This encoding will be removed in future versions of Node.

I tried this and it works (with Shift_JIS):
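```javascript
var concat = require('concat-stream'),
    Iconv = require('iconv').Iconv,
    request = require('request');

// Convert the Shift_JIS response bytes to UTF-8 as they stream in
var conv = new Iconv('Shift_JIS', 'utf8'),
    req = request('http://www.alc.co.jp/');

req.pipe(conv);

req.on('error', function () {
  console.log('an error occurred');
});

// Collect every converted chunk into one Buffer, then turn it into a string once
conv.pipe(concat(function (body) {
  console.log(body.toString());
}));
```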
https://github.com/request/request/issues/1080#issuecomment-56172161
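The Shift_JIS answer uses concat-stream so that `toString()` runs once on the complete body rather than on each chunk. A minimal sketch of why that matters, assuming Node 12+ (for `Readable.from`) and using UTF-8 instead of Shift_JIS so it needs no extra packages; the chunk split and the `collect` helper are hypothetical, standing in for what concat-stream does:

```javascript
const { Readable } = require("stream");

// Hypothetical chunks: "café" in UTF-8 is 63 61 66 C3 A9, split in the middle of "é"
const chunks = [Buffer.from([0x63, 0x61, 0x66, 0xc3]), Buffer.from([0xa9])];

// Naive approach: decode each chunk on its own, which breaks the two-byte "é"
const naive = chunks.map((c) => c.toString("utf8")).join("");

// concat-stream style: gather every chunk, join them once, then decode
function collect(stream) {
  return new Promise((resolve, reject) => {
    const parts = [];
    stream.on("data", (chunk) => parts.push(chunk));
    stream.on("error", reject);
    stream.on("end", () => resolve(Buffer.concat(parts)));
  });
}

const decoded = collect(Readable.from(chunks)).then((body) => body.toString("utf8"));

console.log(naive); // contains U+FFFD replacement characters
decoded.then((text) => console.log(text)); // café
```

The same boundary problem applies to multi-byte charsets like Shift_JIS, which is why the answer either converts through a stream-aware converter (the Iconv stream) or decodes only after the whole body has been collected.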