A resilient, actually working CSV implementation for non-ascii?

萌比男神i 2020-12-30 06:30

[Update] Appreciate the answers and input all around, but working code would be most welcome. If you can supply code that can read the sample files you are

4 Answers
  •  天命终不由人
    2020-12-30 07:09

    You are doing the wrong thing in your code by calling .encode('utf-8') on a byte string; you should be decoding it instead. And by the way, unicode(bytestr, 'utf-8') == bytestr.decode('utf-8').
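
    A minimal sketch of the difference, assuming a utf-8 encoded byte string read from a file (the sample value is hypothetical):

    ```python
    # Python 2.x: a byte string as it would come out of a file (assumed utf-8 here)
    bytestr = '\xd0\x9c\xd0\xbe\xd1\x81\xd0\xba\xd0\xb2\xd0\xb0'  # "Москва" encoded as utf-8

    # Wrong: calling .encode('utf-8') on a byte string makes Python 2 first decode
    # it implicitly as ASCII, which fails on any byte > 0x7F.
    # bytestr.encode('utf-8')  # raises UnicodeDecodeError

    # Right: decode the bytes into a unicode object.
    text = bytestr.decode('utf-8')
    assert text == unicode(bytestr, 'utf-8')  # the two spellings are equivalent
    ```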

    But most importantly, WHY are you trying to decode the strings?

    It may sound a bit absurd, but you can actually work with those CSV files without caring whether they are cp1251, cp1252 or utf-8. The beauty of it is that the regional characters are all >0x7F, and utf-8, too, uses sequences of bytes >0x7F to represent non-ASCII symbols.

    Since the separators CSV cares about (be it , or ; or \n) are within the ASCII range, parsing won't be affected by the encoding used (as long as it is a single-byte encoding or utf-8!), as the sketch below illustrates.

    An important thing to note is that you should give the Python 2.x csv module files opened in binary mode, that is, either 'rb' or 'wb', because of the peculiar way it was implemented.
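
    Putting those two points together, here is a minimal sketch of byte-level CSV reading in Python 2.x; the file name and delimiter are assumptions, adjust them to your data:

    ```python
    import csv

    # Python 2.x: open the file in binary mode ('rb'), as the csv module expects.
    # Rows come back as lists of byte strings; the parser only looks at the ASCII
    # delimiter and newline bytes, so cp1251/cp1252/utf-8 content passes through
    # untouched.
    with open('sample.csv', 'rb') as f:        # 'sample.csv' is a placeholder name
        reader = csv.reader(f, delimiter=';')  # delimiter assumed; change as needed
        for row in reader:
            # Only decode if you actually need unicode objects, e.g. for display:
            # fields = [cell.decode('utf-8') for cell in row]
            print row
    ```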
