Module request: how to properly retrieve accented characters?

孤城傲影 2020-12-01 12:25

I'm using the Request module (Simplified HTTP request method) to scrape a webpage with accented characters: á é ó ú ê ã, etc.

I've already tried e

3 answers
  • 2020-12-01 12:57

    Since the binary encoding is deprecated, it is better to request the raw bytes and decode them explicitly with iconv-lite:

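    var request = require("request"),
        iconv = require("iconv-lite");

    // encoding: null makes request return the body as a raw Buffer
    var requestOptions = { encoding: null, method: "GET", uri: "http://something.com" };

    request(requestOptions, function (error, response, body) {
        if (error) {
            return console.error(error);
        }
        // body is already a Buffer, so decode it directly
        var utf8String = iconv.decode(body, "ISO-8859-1");
        console.log(utf8String);
    });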
    

    The important part is to set encoding: null in the request options, so that the body comes back as a raw Buffer instead of a string that has already been decoded as UTF-8.
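
    To see why the raw bytes matter, here is a minimal sketch that uses only Node's built-in Buffer, with no network request. The sample bytes (the word "ação" in ISO-8859-1) are an assumption chosen for illustration, and Node's built-in latin1 encoding stands in for iconv-lite:

```javascript
// Hypothetical sample: the word "ação" encoded as ISO-8859-1,
// one byte per character.
const latin1Bytes = Buffer.from([0x61, 0xe7, 0xe3, 0x6f]);

// Decoding these bytes as UTF-8 (what happens when the body is turned
// into a string too early) yields U+FFFD replacement characters.
const wrong = latin1Bytes.toString('utf8');

// Decoding with the encoding the page actually uses restores the text.
const right = latin1Bytes.toString('latin1');

console.log(JSON.stringify(wrong));
console.log(right);
```

    Once the body has been decoded with the wrong encoding, the original bytes are lost, which is why the decoding has to happen on the Buffer.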

  • 2020-12-01 13:00

    Specify the encoding as utf8, not utf-8. Here is a list of possible encodings for a Buffer, from the Node.js documentation:

    • ascii - for 7 bit ASCII data only. This encoding method is very fast, and will strip the high bit if set.
    • utf8 - Unicode characters. Many web pages and other document formats use UTF-8.
    • base64 - Base64 string encoding.
    • binary - A way of encoding raw binary data into strings by using only the first 8 bits of each character. This encoding method is deprecated and should be avoided in favor of Buffer objects where possible. This encoding will be removed in future versions of Node.
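
    A quick way to check which encoding names a given Node.js version accepts is Buffer.isEncoding. The sketch below is illustrative and the list of names tried is an assumption. On current Node.js releases both utf8 and utf-8 are accepted, while page encodings such as Shift_JIS are not built in and need a library such as iconv-lite or iconv:

```javascript
// Encoding names to probe; this list is only an example.
const names = ['utf8', 'utf-8', 'ascii', 'base64', 'binary', 'latin1', 'shift_jis'];

// Map each name to whether Buffer supports it natively.
const support = {};
for (const name of names) {
  support[name] = Buffer.isEncoding(name);
}

console.log(support);
```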
  • 2020-12-01 13:07

    I tried this and it worked (Shift_JIS):

    var concat  = require('concat-stream'),
        Iconv   = require('iconv').Iconv,
        request = require('request');
    
    // Convert the Shift_JIS response stream to UTF-8 as it arrives
    var conv = new Iconv('Shift_JIS', 'utf8'),
        req  = request('http://www.alc.co.jp/');
    
    req.pipe(conv);
    
    req.on('error', function(err) {
        console.log('an error occurred', err);
    });
    
    // Collect the converted chunks and print the full body
    conv.pipe(concat(function(body) {
        console.log(body.toString());
    }));
    

    https://github.com/request/request/issues/1080#issuecomment-56172161
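
    Piping through a converter, rather than decoding each chunk separately, matters because a multi-byte character can be split across two chunks. The sketch below illustrates this with Node's built-in TextDecoder instead of the iconv module, assuming a Node.js build with full ICU data so that the shift_jis label is available. The chunk boundaries and the sample text are made up for the example:

```javascript
// "日本" in Shift_JIS is 0x93 0xFA 0x96 0x7B. The chunks below split
// both characters in the middle, as a network stream might.
const chunks = [
  Buffer.from([0x93]),
  Buffer.from([0xfa, 0x96]),
  Buffer.from([0x7b]),
];

// One decoder instance keeps the pending bytes between calls.
const decoder = new TextDecoder('shift_jis');

let text = '';
for (const chunk of chunks) {
  // stream: true tells the decoder that more input may follow.
  text += decoder.decode(chunk, { stream: true });
}
// A final call without input flushes any remaining state.
text += decoder.decode();

console.log(text);
```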
