Encoding error in Python with Chinese characters
问题 I'm a beginner having trouble decoding several dozen CSV file with numbers + (Simplified) Chinese characters to UTF-8 in Python 2.7. I do not know the encoding of the input files so I have tried all the possible encodings I am aware of -- GB18030, UTF-7, UTF-8, UTF-16 & UTF-32 (LE & BE). Also, for good measure, GBK and GB3212, though these should be a subset of GB18030. The UTF ones all stop when they get to the first Chinese characters. The other encodings stop somewhere in the first line