http.get and ISO-8859-1 encoded responses

半阙折子戏 2021-01-12 13:50

I'm writing an RSS feed fetcher and am stuck on some charset problems.

Loading and parsing the feed was quite easy compared to handling the encoding. I'm loading the f

2 answers
  •  难免孤独
    2021-01-12 14:09

    I think the issue is probably with the way that you are storing the data before you are passing it to feedparser. It is hard to say without seeing your data event handler, but I'm going to guess that you are doing something like this:

    values = '';
    stream.on('data', function(chunk){
      values += chunk;
    });
    

    Is that right?

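    The issue is that in this case, chunk is a Buffer, and using '+' to append the chunks implicitly converts each Buffer to a string. That conversion decodes the bytes as UTF-8, so any ISO-8859-1 byte above 0x7F is decoded incorrectly before iconv ever sees it.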
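
    To see the effect in isolation, here is a minimal sketch. It assumes a Node version that provides Buffer.from, and the sample bytes are made up for illustration; they are not taken from the feed in the question.

```javascript
// "café" encoded as ISO-8859-1: 'é' is the single byte 0xE9.
var chunk = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// '+=' calls chunk.toString(), which decodes the bytes as UTF-8 by default.
var values = '';
values += chunk;

// 0xE9 on its own is not valid UTF-8, so it is replaced with U+FFFD.
console.log(values === 'caf\ufffd'); // true
console.log(values === 'caf\u00e9'); // false
```

    Once the replacement character is in the string, the original 0xE9 byte is gone, so no later conversion can recover it.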

    Looking into it further, you should really be doing the iconv conversion on the whole feed, before running it through feedparser, because feedparser is likely not aware of other encodings.

    Try something like this:

    var Iconv = require('iconv').Iconv;

    // Converts ISO-8859-1 bytes to UTF-8 bytes.
    var iconv = new Iconv('ISO-8859-1', 'UTF-8');
    var chunks = [];
    var totallength = 0;
    stream.on('data', function(chunk) {
      // Keep each chunk as a raw Buffer; do not append it to a string.
      chunks.push(chunk);
      totallength += chunk.length;
    });
    stream.on('end', function() {
      // Join all chunks into one Buffer before converting.
      var results = Buffer.concat(chunks, totallength);
      var converted = iconv.convert(results);
      parser.parseString(converted.toString('utf8'));
    });
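
    If installing iconv is not an option, the same collect-then-decode idea can be sketched with Node built-ins only. This swaps iconv for Buffer's 'latin1' encoding, which is Node's name for ISO-8859-1, so it only covers this one charset. The decodeChunks helper and the PassThrough stream standing in for the HTTP response are hypothetical names used for illustration.

```javascript
var PassThrough = require('stream').PassThrough;

// Hypothetical helper: join the raw chunks, then decode them in one step.
// 'latin1' maps each byte directly to the code point of the same value.
function decodeChunks(chunks) {
  return Buffer.concat(chunks).toString('latin1');
}

// Stand-in for the HTTP response stream (made-up sample data).
var stream = new PassThrough();
var chunks = [];

stream.on('data', function (chunk) {
  chunks.push(chunk); // keep raw Buffers, never string-append them
});
stream.on('end', function () {
  var text = decodeChunks(chunks);
  // At this point text is a normal JavaScript string and can go to the parser.
  console.log(text);
});

// "café" in ISO-8859-1, delivered in two chunks.
stream.write(Buffer.from([0x63, 0x61]));
stream.end(Buffer.from([0x66, 0xe9]));
```

    For charsets other than ISO-8859-1 this shortcut does not apply, and a conversion library such as iconv is still needed.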
    
