Node.js scrape encoding?

天涯浪人 2020-12-24 15:10

I'm fetching this page with this request library in Node.js, and parsing the body using cheerio.

Calling $.html() on the parsed response body reve

2 Answers
  •  清酒与你
    2020-12-24 16:07

    The page appears to be encoded as ISO-8859-1. You'll need to tell request to hand you back the raw, undecoded buffer by passing encoding: null, and then use something like node-iconv to convert it.
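
To see why the raw buffer matters, here is a minimal sketch using only Node's built-in Buffer. The sample bytes are an assumption chosen for illustration (they are not taken from the page in question). Unless told otherwise, request decodes the body as UTF-8, which cannot represent a lone ISO-8859-1 byte such as 0xE9:

```javascript
// "café" as ISO-8859-1 bytes: 0xE9 is "é" in that encoding,
// but a lone 0xE9 is not a valid UTF-8 sequence.
const raw = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// Decoding as UTF-8 replaces the invalid byte with U+FFFD (the replacement character).
const asUtf8 = raw.toString('utf8');

// Decoding as latin1 (Node's name for ISO-8859-1) maps each byte to one character.
const asLatin1 = raw.toString('latin1');

console.log(JSON.stringify(asUtf8));   // "caf\ufffd" shown as a replacement glyph
console.log(JSON.stringify(asLatin1)); // "café"
```

Once the body has been decoded with the wrong encoding the original byte is lost, which is why the conversion has to start from the undecoded buffer.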

    If you're writing a generalized crawler, you'll have to detect the encoding of each page you encounter in order to decode it correctly. Otherwise, the following should work for your case:

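    var request = require('request');
    var iconv = require('iconv');

    request.get({
      url: 'http://www.relaisentrecote.fr',
      encoding: null, // return the body as a raw Buffer instead of a decoded string
    }, function(err, res, body) {
      if (err) {
        console.error(err);
        return;
      }
      // Convert the ISO-8859-1 bytes to UTF-8, then decode them into a JavaScript string
      var ic = new iconv.Iconv('iso-8859-1', 'utf-8');
      var buf = ic.convert(body);
      var utf8String = buf.toString('utf-8');
      // .. do something with utf8String ..
    });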
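
For the generalized crawler case mentioned above, one possible approach is to look for a declared charset first in the Content-Type header and then in a meta tag near the top of the body. The sketch below is only an illustration: detectCharset is a hypothetical helper (not part of request, cheerio, or iconv), the 1024-byte window and the utf-8 fallback are assumptions, and pages that declare no charset or declare the wrong one are not handled.

```javascript
// Hypothetical helper: guess a page's charset from the Content-Type header,
// falling back to a <meta> tag found in the first bytes of the body.
function detectCharset(contentType, bodyBuffer, fallback = 'utf-8') {
  // 1. HTTP header, e.g. "text/html; charset=ISO-8859-1"
  const fromHeader = /charset\s*=\s*["']?([\w-]+)/i.exec(contentType || '');
  if (fromHeader) return fromHeader[1].toLowerCase();

  // 2. <meta charset="..."> or <meta http-equiv="Content-Type" content="...; charset=...">
  //    Read the head as latin1 so that every byte maps to exactly one character.
  const head = bodyBuffer.subarray(0, 1024).toString('latin1');
  const fromMeta = /<meta[^>]+charset\s*=\s*["']?([\w-]+)/i.exec(head);
  if (fromMeta) return fromMeta[1].toLowerCase();

  // 3. Nothing declared: use the assumed default.
  return fallback;
}

// Example with made-up inputs:
const sample = Buffer.from('<html><head><meta charset="windows-1252"></head></html>');
console.log(detectCharset('text/html', sample));
```

In the request callback, the header would be available as res.headers['content-type'], and the detected name could be passed to iconv.Iconv in place of the hard-coded 'iso-8859-1', assuming iconv recognizes that name.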
    
