http.get and ISO-8859-1 encoded responses

半阙折子戏 2021-01-12 13:50

I'm about to write an RSS feed fetcher and am stuck on some charset problems.

Loading and parsing the feed was quite easy compared to the encoding. I'm loading the f

2 answers
  •  爱一瞬间的悲伤
    2021-01-12 14:09

    You are probably hitting the same problem described on https://groups.google.com/group/nodejs/browse_thread/thread/b2603afa31aada9c.

    The solution seems to be to set the response encoding to binary before processing the Buffer with Iconv.

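    The relevant bit from that thread is:

    "Set response.setEncoding('binary') and aggregate the chunks into a buffer before calling Iconv.convert(). Note that encoding=binary means your data callback will receive Buffer objects, not strings."

    In current Node.js releases this note no longer holds: 'binary' is an alias for 'latin1', and calling setEncoding() with it makes the data callback receive strings with one character per byte. To receive Buffer objects, do not call setEncoding() at all.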
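The aggregate-then-convert approach can be sketched roughly as follows. This is a minimal sketch, not the code from the linked thread: it assumes a current Node.js release, where omitting setEncoding() yields Buffer chunks, and it swaps Iconv for the built-in 'latin1' decoding of Buffer, which covers ISO-8859-1. The names decodeLatin1Chunks and fetchLatin1 are hypothetical.

```javascript
const http = require('http');

// Join the raw chunks first, then decode the whole body once.
// 'latin1' is Node's built-in name for ISO-8859-1 (one byte per character).
function decodeLatin1Chunks(chunks) {
  return Buffer.concat(chunks).toString('latin1');
}

// Hypothetical helper: fetch `url` and pass the decoded body to `callback`.
function fetchLatin1(url, callback) {
  http.get(url, (res) => {
    const chunks = [];
    // No res.setEncoding() call here, so every chunk arrives as a Buffer.
    res.on('data', (chunk) => chunks.push(chunk));
    res.on('end', () => callback(null, decodeLatin1Chunks(chunks)));
    res.on('error', callback);
  }).on('error', callback);
}
```

For other source charsets, the Buffer.concat(chunks) result would be handed to a converter such as Iconv instead of calling toString('latin1').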


    Updated: this was my initial response

    Are you sure that the feed you are receiving has been encoded correctly?

    I can see two possible errors:

    1. the feed is being sent with Latin-1-encoded data, but with a Content-Type that states charset=UTF-8.
    2. the feed is being sent with UTF-8-encoded data but the Content-Type header does not state anything, defaulting to ASCII.

    You should check the content of your feed and the sent headers with some utility like Wireshark or cURL.
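The same two checks can also be done from Node itself. The sketch below is only an illustration under stated assumptions: charsetFromContentType and isValidUtf8 are hypothetical helper names, and the global TextDecoder is assumed to be available (Node.js 11 or later). The first helper reads the declared charset from a Content-Type header value; the second reports whether the raw body bytes are well-formed UTF-8.

```javascript
// Extract the declared charset from a Content-Type header value,
// lowercased, or null when the header names no charset.
function charsetFromContentType(header) {
  const match = /charset\s*=\s*"?([^\s;"]+)"?/i.exec(header || '');
  return match ? match[1].toLowerCase() : null;
}

// Return true if the bytes are well-formed UTF-8.
// With { fatal: true }, TextDecoder throws on any invalid sequence.
function isValidUtf8(buffer) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(buffer);
    return true;
  } catch (err) {
    return false;
  }
}
```

A body that fails isValidUtf8 while the header declares charset=UTF-8 points to the first error case; a body that passes (and contains non-ASCII bytes) while the header declares nothing points to the second.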
