A resilient, actually working CSV implementation for non-ascii?

萌比男神i 2020-12-30 06:30

[Update] Appreciate the answers and input all around, but working code would be most welcome. If you can supply code that can read the sample files you are

4 Answers
  •  天命终不由人
    2020-12-30 07:09

    You are doing the wrong thing in your code by calling .encode('utf-8') on a byte string; you should be decoding it instead. And by the way, unicode(bytestr, 'utf-8') == bytestr.decode('utf-8').
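
    A minimal sketch of the difference, assuming a utf-8 encoded byte string read from a file (the sample value is hypothetical):

    ```python
    # Python 2.x: a byte string as it would come out of a file (assumed utf-8 here)
    bytestr = '\xd0\x9c\xd0\xbe\xd1\x81\xd0\xba\xd0\xb2\xd0\xb0'  # "Москва" encoded as utf-8

    # Wrong: calling .encode('utf-8') on a byte string makes Python 2 first decode
    # it implicitly as ASCII, which fails on any byte > 0x7F.
    # bytestr.encode('utf-8')  # raises UnicodeDecodeError

    # Right: decode the bytes into a unicode object.
    text = bytestr.decode('utf-8')
    assert text == unicode(bytestr, 'utf-8')  # the two spellings are equivalent
    ```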

    But most importantly, WHY are you trying to decode the strings?

    It may sound a bit absurd, but you can actually work with those CSV files without caring whether they are cp1251, cp1252 or utf-8. The beauty of it is that the regional characters are all >0x7F, and utf-8, too, uses sequences of bytes >0x7F to represent non-ASCII symbols.

    Since the separators CSV cares about (be it , or ; or \n) are within the ASCII range, parsing won't be affected by the encoding used (as long as it is a single-byte encoding or utf-8!), as the sketch below illustrates.

    An important thing to note is that you should give the Python 2.x csv module files opened in binary mode, that is, either 'rb' or 'wb', because of the peculiar way it was implemented.
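
    Putting those two points together, here is a minimal sketch of byte-level CSV reading in Python 2.x; the file name and delimiter are assumptions, adjust them to your data:

    ```python
    import csv

    # Python 2.x: open the file in binary mode ('rb'), as the csv module expects.
    # Rows come back as lists of byte strings; the parser only looks at the ASCII
    # delimiter and newline bytes, so cp1251/cp1252/utf-8 content passes through
    # untouched.
    with open('sample.csv', 'rb') as f:        # 'sample.csv' is a placeholder name
        reader = csv.reader(f, delimiter=';')  # delimiter assumed; change as needed
        for row in reader:
            # Only decode if you actually need unicode objects, e.g. for display:
            # fields = [cell.decode('utf-8') for cell in row]
            print row
    ```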
