Re-encode Unicode stream as ASCII ignoring errors

一生所求 2021-01-24 15:12

I'm trying to take a Unicode file stream, which contains odd characters, and wrap it with a stream reader that will convert it to ASCII, ignoring or replacing all characters th

2 Answers
  •  野性不改
    2021-01-24 16:07

    I'm a little late to the party with this, but here's an alternate solution, using codecs.StreamRecoder:

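    import io
    from codecs import getencoder, getdecoder, getreader, getwriter, StreamRecoder

    # self.csv_path and detectedEncoding are assumed to be defined elsewhere.
    with io.open(self.csv_path, 'rb') as f:
        csv_ascii_stream = StreamRecoder(f,
                                         getencoder('ascii'),          # frontend: read() returns ASCII bytes
                                         getdecoder('ascii'),          # frontend: write() accepts ASCII bytes
                                         getreader(detectedEncoding),  # backend: decodes the file's bytes
                                         getwriter(detectedEncoding),  # backend: encodes on write
                                         errors='ignore')              # drop characters that cannot be converted

        print(csv_ascii_stream.read())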
    

    I guess you may want to use this if you need the flexibility to be able to call read()/readlines()/seek()/tell() etc. on the stream that gets returned. If you just need to iterate over the stream, the generator expression abarnert provided is a bit more concise.
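
For anyone who wants to try both approaches without a file on disk, here is a minimal sketch that runs against an in-memory stream. The sample bytes, the `source_encoding` value (standing in for `detectedEncoding`) and the `ascii_recoder` helper are my own assumptions for illustration. The second option is only my guess at what a generator-style solution might look like, not necessarily what the other answer posted.

```python
import io
from codecs import StreamRecoder, getdecoder, getencoder, getreader, getwriter

# Hypothetical sample data standing in for the CSV file: UTF-8 bytes that
# contain characters with no ASCII equivalent.
raw = "café ☃ ok\nsecond line\n".encode("utf-8")
source_encoding = "utf-8"  # assumed; stands in for detectedEncoding


def ascii_recoder(byte_stream, encoding):
    """Wrap a binary stream so that reads return ASCII-only bytes."""
    return StreamRecoder(
        byte_stream,
        getencoder("ascii"),   # frontend: what read() hands back
        getdecoder("ascii"),   # frontend: what write() would accept
        getreader(encoding),   # backend: how the wrapped stream is read
        getwriter(encoding),   # backend: how the wrapped stream is written
        errors="ignore",       # silently drop unconvertible characters
    )


# Option 1: a file-like wrapper you can call read() on.
recoded = ascii_recoder(io.BytesIO(raw), source_encoding).read()

# Option 2: plain iteration, decoding each line and re-encoding it as ASCII.
with io.TextIOWrapper(io.BytesIO(raw), encoding=source_encoding) as text:
    lines = [line.encode("ascii", "ignore") for line in text]

print(recoded)
print(lines)
```

Both options drop the same characters here. The wrapper is the one to reach for when some other API expects a file-like object, while the loop is enough if you only ever iterate line by line.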
