A resilient, actually working CSV implementation for non-ASCII?

萌比男神i 2020-12-30 06:30

[Update] Appreciate the answers and input all around, but working code would be most welcome. If you can supply code that can read the sample files you are

4 answers
  •  独厮守ぢ
    2020-12-30 07:03

    What you are asking is impossible. There is no way to write a program, in any language, that will accept input in an unknown encoding and correctly convert it to a Unicode internal representation.
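
    To illustrate why, here is a minimal sketch showing that the same bytes decode without error under several single-byte encodings and give a different character each time. The helper name `decode_candidates` and the three candidate encodings are assumptions chosen for the demonstration, not something from the question.

```python
def decode_candidates(data, encodings=("latin-1", "cp1251", "cp1253")):
    """Decode the same bytes with each candidate encoding (hypothetical helper)."""
    return {enc: data.decode(enc) for enc in encodings}


# One byte string, written by someone who meant "café" in Latin-1.
raw = "café".encode("latin-1")

# Every candidate decodes cleanly, so nothing in the bytes themselves
# says which reading the author intended.
results = decode_candidates(raw)
for name, text in results.items():
    print(name, repr(text))
```

    None of the decodes raises an exception, which is why a program cannot tell from the bytes alone which result is correct.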

    You have to find a way to tell the application which encoding to use.
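
    As a rough sketch of that approach, the reader below takes the encoding as a required argument and uses only the standard `csv` module. The function name `read_csv_rows` is hypothetical, and where the encoding comes from (a command-line option, a config entry, a per-source setting) is left to the caller.

```python
import csv


def read_csv_rows(path, encoding):
    """Read a CSV file whose encoding the caller states explicitly."""
    # newline="" lets the csv module handle line endings itself,
    # including newlines embedded inside quoted fields.
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.reader(handle))
```

    A call such as `read_csv_rows("export.csv", "cp1251")` (the file name is only an example) raises `UnicodeDecodeError` when the bytes are invalid in the stated encoding, which is usually preferable to silently producing mojibake.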

    It is possible to recognize many, but not all, encodings (for example with chardet), but it really depends on what the content of the files is and whether there are enough data points. This is similar to the issue of correctly decoding filenames on network servers. When a file is created on a network server, there is no way to tell the server which encoding is used, so if you have a folder with names in multiple encodings, they are guaranteed to look odd to some, if not all, users, and different files will look odd to different users.

    However, don't give up. Try the chardet encoding detector mentioned in this question: https://serverfault.com/questions/82821/how-to-tell-the-language-encoding-of-a-filename-on-linux and if you are lucky, you won't get many failures.
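
    A minimal sketch of how that could look, assuming the third-party `chardet` package may or may not be installed. The helper names `guess_encoding` and `parse_csv_bytes`, the 0.8 confidence threshold, and the fallback list are all assumptions made for illustration; the result is a guess, not a guarantee.

```python
import csv
import io

try:
    import chardet  # third-party detector; optional in this sketch
except ImportError:
    chardet = None


def guess_encoding(raw, min_confidence=0.8, fallbacks=("utf-8-sig", "cp1252")):
    """Return a best-guess encoding name for raw bytes, or None."""
    if chardet is not None:
        guess = chardet.detect(raw)  # dict with 'encoding' and 'confidence'
        if guess.get("encoding") and (guess.get("confidence") or 0.0) >= min_confidence:
            return guess["encoding"]
    # Without a confident detection, try a short list of strict decodes.
    for enc in fallbacks:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


def parse_csv_bytes(raw, encoding=None):
    """Parse CSV bytes, using an explicit encoding when the caller knows it."""
    encoding = encoding or guess_encoding(raw)
    if encoding is None:
        raise ValueError("could not guess the encoding; pass one explicitly")
    text = raw.decode(encoding)
    # newline="" keeps the original line endings for the csv module.
    return list(csv.reader(io.StringIO(text, newline="")))
```

    Keeping the `encoding` parameter means a wrong guess can always be overridden, and short or mostly-ASCII files are exactly the cases where the guess is least reliable.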
